
Issue with Overlapping Variants in TOPMed Cohorts Using WGS Data

Hello BDC Forum Members,

I hope you are doing well. I’m currently working with the JHS and FHS cohorts from the TOPMed whole-genome sequencing (WGS) dataset, and I have encountered an issue that I would greatly appreciate your help with.

Problem: No Overlapping SNPs Between JHS and FHS Cohorts

I have been investigating the genetic variants in both the JHS and FHS cohorts. For one chromosome, I merged the VCF files from both cohorts using BCFtools (on the Seven Bridges platform). After merging, I used PLINK to filter samples and variants by missingness (--mind and --geno). Even with very permissive thresholds, no variants shared by both cohorts remained.

Additionally, after performing QC on each cohort separately and comparing the list of SNPs that passed filtering, I noticed that there are no overlapping variants between the two cohorts. This was surprising because I expected to find mutual variants given the large number of variants typically captured by WGS.

Questions:

Is this expected behavior? Could there truly be no overlapping variants between the JHS and FHS cohorts, or might I have made a mistake during the merging and QC process?

Why might there be no (or very few) mutual SNPs? WGS data is typically extensive and includes millions of variants. Shouldn’t there be a substantial overlap of SNPs across different TOPMed cohorts like JHS and FHS?

Steps Taken:

I merged the VCF files for one chromosome from both JHS and FHS using BCFtools on Seven Bridges.

I used PLINK for filtering based on missingness thresholds (--mind and --geno), but after filtering, no mutual SNPs remained.

I also compared the preprocessed SNP lists for both cohorts and found no overlap.
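
To make the SNP-list comparison concrete, here is a minimal Python sketch of an overlap check keyed on chromosome, position, REF, and ALT rather than on the ID column. It is only an illustration, not the exact commands that were run. The function names (variant_keys, shared_variants) and the example records are hypothetical, and it assumes uncompressed, tab-separated VCF text. It also assumes, without having confirmed it for these files, that differences in ID values or in "chr" contig prefixes could be one reason two lists appear not to overlap.

```python
from typing import Iterable, Set, Tuple

# (chrom, pos, ref, alt): a key that does not depend on the VCF ID column.
VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect (chrom, pos, ref, alt) keys from uncompressed VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." and "#CHROM ...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete variant record
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumption: contig names may differ only by a "chr" prefix.
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        # Split multi-allelic records so each ALT allele gets its own key.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def shared_variants(a: Iterable[str], b: Iterable[str]) -> Set[VariantKey]:
    """Return the variant keys present in both inputs."""
    return variant_keys(a) & variant_keys(b)


# Made-up example records (not real JHS or FHS data).
cohort_a = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\trs1\tC\tT,G\t.\tPASS\t.",
]
cohort_b = [
    "1\t100\tvarX\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tC\tG\t.\tPASS\t.",
    "1\t300\t.\tT\tC\t.\tPASS\t.",
]

print(sorted(shared_variants(cohort_a, cohort_b)))
# [('1', 100, 'A', 'G'), ('1', 200, 'C', 'G')]
```

In this toy example the ID columns of the two inputs never match, yet two variants are shared once they are compared by position and alleles. For a real file, the lines could be read with open(path) for plain-text VCF or gzip.open(path, "rt") for .vcf.gz.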

Any guidance or insights into why I might not be seeing overlapping SNPs and suggestions for how to resolve this issue would be greatly appreciated.

Thank you in advance for your help!

Best regards,
Yasaman Soofi
