We should chat about this! I have been semi-vibe-analyzing my genome based in part on the January 2025 blog post Calculating Polygenic Risk Scores from Whole Genome Sequencing Data and have replicated some of the same conclusions as you.
The most immediately actionable / least ambiguous things are SNPs with well-established effects.
I recommend checking your VCF for variants in the following sets of genes:
The American College of Medical Genetics and Genomics (ACMG) Secondary Findings Working Group (SFWG) list of recommended genes for opportunistic screening, most recently updated in 2025 (v3.3). This is a list of 84 genes that doctors recommend taking a look at if you happen to already have genomic data. You still need to cross-reference ClinVar: the ACMG list has per-gene reporting thresholds, which for most genes are “all P and LP” (i.e. all variants annotated as “Pathogenic” or “Likely Pathogenic”).
Nutrigenomics: I ended up asking Claude to write up a list of these, since I couldn’t find a single canonical source, but these are genes like FUT2 (vitamin B12 absorption) and MTHFR (folate metabolism) that have common variants which may affect your ability to acquire certain nutrients from certain dietary sources.
Pharmacogenomics: I agree that PharmCAT is the right tool for this.
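For the gene-list check, here is a minimal sketch of what the ClinVar cross-reference could look like. It assumes your VCF has already been annotated with ClinVar's `CLNSIG` and `GENEINFO` INFO fields, and that every record in it is a variant you actually carry. The gene set is a tiny illustrative subset of the ACMG list, the function names are my own, and the demo records are made up.

```python
# Illustrative subset only; the real ACMG SF list is much longer.
ACMG_SUBSET = {"BRCA1", "BRCA2", "LDLR", "MYH7"}

# ClinVar significance terms treated as "P or LP" in this sketch.
PLP_TERMS = {"Pathogenic", "Likely_pathogenic", "Pathogenic/Likely_pathogenic"}


def parse_info(info_field):
    """Turn a VCF INFO string like 'A=1;B=x;FLAG' into a dict."""
    out = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def plp_hits(vcf_lines, genes):
    """Return records in `genes` whose ClinVar annotation is P or LP."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = parse_info(fields[7])
        # GENEINFO looks like 'BRCA1:672' or 'GENE1:1|GENE2:2'.
        symbols = {g.split(":")[0] for g in info.get("GENEINFO", "").split("|") if g}
        # CLNSIG may carry extra terms separated by '|' or ','.
        terms = info.get("CLNSIG", "").replace(",", "|").split("|")
        matched = sorted(symbols & genes)
        if matched and any(t.strip("_") in PLP_TERMS for t in terms):
            hits.append((chrom, int(pos), ref, alt, ",".join(matched), info["CLNSIG"]))
    return hits


# Made-up records, for illustration only.
DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "17\t43045712\t.\tG\tA\t.\t.\tGENEINFO=BRCA1:672;CLNSIG=Pathogenic",
    "19\t11100000\t.\tC\tT\t.\t.\tGENEINFO=LDLR:3949;CLNSIG=Uncertain_significance",
    "1\t11796321\t.\tG\tA\t.\t.\tGENEINFO=MTHFR:4524;CLNSIG=Benign",
    "13\t32300000\t.\tT\tC\t.\t.\tGENEINFO=BRCA2:675;CLNSIG=Likely_pathogenic",
]

for hit in plp_hits(DEMO_VCF, ACMG_SUBSET):
    print(hit)
```

This only applies the blanket “all P and LP” rule; genes with a different reporting threshold would need their own handling.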
Polygenic risk scores vary a lot in quality and are more difficult to calculate.
I’m still forming opinions on this, but a few important caveats:
Published PRS are of highly variable quality, and the best ones are proprietary
Papers will often publish only a list of significant SNPs, which is much less information than you want and only kind of a PRS. I have ended up trying some of my own GWAS --> PRS conversions, which has its own minefields: more advanced methods go beyond “Bayesian correction for linkage disequilibrium”, use larger datasets, combine multiple GWAS, and adjust for family relatedness in ways that are beyond my sophistication
Your VCF is not enough to calculate your PRS, because the reference (rather than variant) allele is the “effect allele” for many scores of interest, and sites where you match the reference are typically not in the VCF at all. The default behaviour of tools like PRSKB CLI and pgsc_calc is to impute the missing alleles based on your VCF, but this is going to be wrong; you likely want to re-call variants from your CRAM (this is discussed at length in the blog post I linked earlier)
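To make the effect-allele caveat concrete, here is a toy sketch with made-up sites and weights (not taken from any published score). It sums beta times effect-allele dosage and skips sites with no genotype, then compares a variants-only call set against one with explicit genotypes at every scoring site, which is what re-calling from the CRAM would give you.

```python
def dosage(genotype, effect_allele):
    """Count copies of effect_allele in a genotype tuple such as ('A', 'G')."""
    return sum(1 for allele in genotype if allele == effect_allele)


def score(weights, genotypes):
    """Sum beta * dosage over scoring sites; sites with no genotype are skipped.

    Returns (score, list of skipped sites).
    """
    total = 0.0
    skipped = []
    for site, (effect_allele, beta) in weights.items():
        if site not in genotypes:
            skipped.append(site)
            continue
        total += beta * dosage(genotypes[site], effect_allele)
    return total, skipped


# Hypothetical score file: site -> (effect allele, weight).
# At 1:1000 the effect allele 'A' is the REFERENCE allele.
weights = {"1:1000": ("A", 0.25), "2:2000": ("G", 0.5)}

# Variants-only VCF: the person is A/A (hom-ref) at 1:1000, so the site is absent.
variants_only = {"2:2000": ("C", "G")}

# Genotypes at every scoring site, as you would get from re-calling.
recalled = {"1:1000": ("A", "A"), "2:2000": ("C", "G")}

print(score(weights, variants_only))  # misses the two copies of 'A'
print(score(weights, recalled))
```

Note that an absent site could also be a no-call rather than hom-ref, which is why blindly filling in the reference allele is not a safe fix either.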
CNVs are fairly tractable to calculate from CRAMs
You probably should do this analysis, since there are some CNVs that have very high effect sizes for psychiatric conditions.
Thanks! Sounds good. Yeah, I’ll check for those variants.
Regarding PRS quality, indeed. There’s a table in a collapsible section with an analysis of the quality of PRSs used. Interesting regarding your own conversion from GWAS to PRS.
Re the VCF not being enough for PRS: ah, cool, yes. Interestingly, Claude/GPT left a comment in its code mentioning exactly this problem, then punted on it, and I didn’t notice.
Re CNVs: we attempted this and it failed because of a contig mismatch with the reference on the CRAM. Going back to it, we could have just downloaded the appropriate one? (DRAGEN/Illumina?) Another thing not done for no good reason that I didn’t catch. (Other things I did catch, but not this.)
Cool, that gives me some things to do.
Oh, huh, DRAGEN is new Illumina software that appears to be using human pangenome references; do you know what reference genome your CRAM was aligned to?
Since it’s already aligned to a reference, your better bet is to remap the coordinates; liftover (for example via the bcftools liftover plugin) is a normal way to remap variant coordinates from one reference to another. I used Manta for calling CNVs, but it seems like maybe DRAGEN is better software?
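Before remapping anything, it may be worth checking exactly how the contigs differ. Here is a rough sketch that compares the `@SQ` lines of the CRAM header (the text you get from `samtools view -H`) against a reference `.fai` index (from `samtools faidx`). The function names are my own, and the demo header and index are hand-written examples rather than real files.

```python
def contigs_from_sam_header(header_text):
    """Return {name: length} from the @SQ lines of a SAM/CRAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after '@SQ' is TAG:value, e.g. SN:chr1 or LN:248956422.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def contigs_from_fai(fai_text):
    """Return {name: length} from the first two columns of a .fai index."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def compare(cram_contigs, ref_contigs):
    """List CRAM contigs absent from the reference, and those with a different length."""
    missing = sorted(n for n in cram_contigs if n not in ref_contigs)
    wrong_length = sorted(
        n for n in cram_contigs
        if n in ref_contigs and cram_contigs[n] != ref_contigs[n]
    )
    return missing, wrong_length


# Hand-written example: CRAM uses 'chr' names, reference index does not.
demo_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
demo_fai = "1\t249250621\t52\t60\t61\nMT\t16569\t253404903\t70\t71\n"

missing, wrong_length = compare(
    contigs_from_sam_header(demo_header), contigs_from_fai(demo_fai)
)
print("not in reference:", missing)
print("length differs:", wrong_length)
```

If only the names differ (`chr1` vs `1`) but the lengths agree, it is probably the same build with a different naming scheme; if the lengths differ too, it is a different build and you want the matching reference.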