Issue with Overlapping Variants in TOPMed Cohorts Using WGS Data
Yasaman Jamalipour Soofi
started a topic
3 months ago
Hello BDC Forum Members,
I hope you are doing well. I’m currently working with the JHS and FHS cohorts from the TOPMed whole-genome sequencing (WGS) dataset, and I have encountered an issue that I would greatly appreciate your help with.
Problem: No Overlapping SNPs Between JHS and FHS Cohorts
I have been investigating the genetic variants in both the JHS and FHS cohorts. For one chromosome, I tried merging the VCF files from both cohorts using BCFtools (on the Seven Bridges platform). After merging, I applied PLINK missingness filters (--mind for samples and --geno for variants). Even with very permissive thresholds, I ended up with no mutual variants between the cohorts.
Additionally, after performing QC on each cohort separately and comparing the list of SNPs that passed filtering, I noticed that there are no overlapping variants between the two cohorts. This was surprising because I expected to find mutual variants given the large number of variants typically captured by WGS.
Questions:
Is this expected behavior? Could there truly be no overlapping variants between the JHS and FHS cohorts, or might I have made a mistake during the merging and QC process?
Why might there be no mutual SNPs? WGS data is typically extensive and includes millions of variants. Shouldn’t there be a substantial overlap of SNPs across different TOPMed cohorts like JHS and FHS?
Steps Taken:
I merged the VCF files for one chromosome from both JHS and FHS using BCFtools on Seven Bridges.
I used PLINK for filtering based on missingness thresholds (--mind and --geno), but after filtering, no mutual SNPs remained.
I also compared the preprocessed SNP lists for both cohorts and found no overlap.
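For reference, the SNP list comparison in the last step can be sketched roughly as below. This is a minimal sketch under assumptions, not my exact pipeline: it assumes each cohort's QC output is available as a PLINK .bim file (the file names in the usage comment are hypothetical), and it matches variants on chromosome, position, and alleles rather than on the ID column, in case IDs are missing (".") or the chromosome labels are written differently ("chr1" vs. "1") between the two files.

```python
def normalize_chrom(chrom):
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def read_bim_keys(path):
    """Return a set of (chrom, pos, alleles) keys from a PLINK .bim file.

    .bim columns: chromosome, variant ID, cM position, bp position,
    allele 1, allele 2. The ID column is deliberately ignored.
    """
    keys = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            chrom, _variant_id, _cm, pos, a1, a2 = fields[:6]
            # Sort the alleles so that a swapped A1/A2 order still matches.
            alleles = tuple(sorted((a1.upper(), a2.upper())))
            keys.add((normalize_chrom(chrom), int(pos), alleles))
    return keys


def shared_variants(bim_a, bim_b):
    """Variants present in both files, matched by position and alleles."""
    return read_bim_keys(bim_a) & read_bim_keys(bim_b)


# Example usage (hypothetical file names):
# shared = shared_variants("jhs_chr22_qc.bim", "fhs_chr22_qc.bim")
# print(len(shared), "variants shared by position and alleles")
```

If this comparison finds shared variants while a comparison on the ID column does not, that would suggest the empty overlap comes from how the variants are labeled rather than from the data itself.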
Any guidance or insights into why I might not be seeing overlapping SNPs and suggestions for how to resolve this issue would be greatly appreciated.
Thank you in advance for your help!
Best regards,
Yasaman Soofi