There are a couple of major problems with naively intervening to edit sites associated with some phenotype in a GWAS or polygenic risk score.
The SNP itself is (usually) not causal
Genotyping arrays select SNPs whose genotype is correlated with that of the surrounding region. The SNP is said to be in linkage (more precisely, linkage disequilibrium) with this region because the region tends to be inherited as a block, rarely split up by recombination during meiosis. This is a matter of degree, and linkage scores (e.g. r²) allow thresholds to be set for how informative a SNP is about the genotype of a given region.
If the effect is caused not by the SNP itself but by something in linkage with it, there is no reason to expect editing the SNP to affect the trait in question.
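A toy simulation can make the tag-SNP point concrete. This is a minimal sketch under strong simplifying assumptions: haploid genomes, a single causal variant, and LD modelled crudely by a hypothetical parameter, `ld_agreement`. That parameter is the probability that the tag allele matches the causal allele, and it is not an r² value. The tag SNP shows a clear GWAS-style association, yet editing it changes nothing, because only the causal variant enters the phenotype.

```python
import random
import statistics


def simulate_population(n=20_000, ld_agreement=0.9, seed=0):
    """Toy haploid population with one causal variant and one tag SNP.

    ld_agreement (hypothetical) is the probability that the tag allele matches
    the causal allele; it is a crude stand-in for LD, not an r^2 value.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        causal = 1 if rng.random() < 0.5 else 0
        tag = causal if rng.random() < ld_agreement else 1 - causal
        people.append({"tag": tag, "causal": causal, "noise": rng.gauss(0.0, 1.0)})
    return people


def phenotype(person, causal_effect=1.0):
    # Only the causal variant affects the trait; the tag SNP has no effect of its own.
    return causal_effect * person["causal"] + person["noise"]


def tag_association(people):
    """What a GWAS sees: mean trait difference between tag carriers and non-carriers."""
    carriers = [phenotype(p) for p in people if p["tag"] == 1]
    others = [phenotype(p) for p in people if p["tag"] == 0]
    return statistics.mean(carriers) - statistics.mean(others)


def effect_of_editing_tag(people):
    """Set everyone's tag SNP to 1 and measure the change in mean trait."""
    before = statistics.mean(phenotype(p) for p in people)
    edited = [dict(p, tag=1) for p in people]
    after = statistics.mean(phenotype(p) for p in edited)
    return after - before


people = simulate_population()
print(f"GWAS-style association of the tag SNP: {tag_association(people):.2f}")
print(f"Change in mean trait after editing it: {effect_of_editing_tag(people):.2f}")
```

With `ld_agreement=0.9` the association should come out near 0.9 − 0.1 = 0.8, while the edit effect is exactly zero.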
It is not trivial to work out which variant in linkage with a SNP is actually causing an effect.
Mendelian randomisation (explainer: https://www.bmj.com/content/362/bmj.k601) is a method for identifying causal relationships from observational genetic data, and it can help to overcome this issue.
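For readers unfamiliar with how Mendelian randomisation turns genetic associations into a causal estimate, here is a minimal sketch of two textbook estimators. The Wald ratio handles a single variant, and the fixed-effect inverse-variance-weighted (IVW) estimate combines several. Both assume the variants are valid instruments: associated with the exposure, independent of confounders, and affecting the outcome only through the exposure. The summary statistics in the example are made up for illustration.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect of exposure on outcome implied by a single genetic instrument."""
    return beta_outcome / beta_exposure


def ivw_estimate(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) estimate across instruments.

    Equivalent to a weighted average of per-variant Wald ratios, with weights
    beta_exposure^2 / se_outcome^2 (first-order weights).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(betas_exposure, betas_outcome, ses_outcome):
        numerator += bx * by / se**2
        denominator += bx**2 / se**2
    return numerator / denominator


# Made-up summary statistics for three hypothetical instruments.
bx = [0.10, 0.20, 0.05]  # variant -> exposure effects
by = [0.05, 0.11, 0.02]  # variant -> outcome effects
se = [0.01, 0.02, 0.01]  # standard errors of the outcome effects

print([round(wald_ratio(x, y), 3) for x, y in zip(bx, by)])
print(round(ivw_estimate(bx, by, se), 3))
```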
In practice, epistatic interactions between QTLs matter for effect sizes. You cannot naively add up the effect sizes of all the QTLs for a trait and expect the total to reflect the real effect size, even if more than 50% of the effect is additive.
Terminology:
epistasis: when the effect of a genetic variant depends on the genotype at one or more other genes.
QTL: quantitative trait locus, a location in the genome where genotype is correlated with a quantitative phenotype, e.g. height.
A hypothetical example of how epistasis can lead to non-additivity in QTLs:
SNPs linked with genes A, B and C are associated with some trait.
Variant A is a more active version of kinase A, which phosphorylates and activates C. Variant B is likewise a more active version of kinase B, which also phosphorylates and activates C.
Phosphorylation of C is effectively binary: it does not matter whether A or B does it, so editing A, B, or both has the same effect.
Variant C is active even when not phosphorylated, so once C is edited, editing A and/or B has no further effect, except perhaps side effects from the more active kinases now phosphorylating something else.
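The kinase example can be written down as a tiny model. This is an illustrative sketch only. Edits are treated as on/off switches, and the trait gain of 1.0 when C is active is a hypothetical value. The "single-edit effects" are measured against an unedited baseline rather than estimated from a population GWAS. The model shows how summing per-locus effects overshoots the effect of making all three edits together.

```python
from itertools import product

TRAIT_GAIN_IF_C_ACTIVE = 1.0  # hypothetical effect size


def c_is_active(edit_a, edit_b, edit_c):
    # Phosphorylation of C is all-or-nothing: either hyperactive kinase suffices.
    phosphorylated = edit_a or edit_b
    # The edited C variant is active even when it is not phosphorylated.
    return edit_c or phosphorylated


def trait_gain(edit_a, edit_b, edit_c):
    return TRAIT_GAIN_IF_C_ACTIVE if c_is_active(edit_a, edit_b, edit_c) else 0.0


# Effect of each edit alone, relative to an unedited baseline.
single_edit_effects = {
    "A": trait_gain(True, False, False),
    "B": trait_gain(False, True, False),
    "C": trait_gain(False, False, True),
}
naive_additive_prediction = sum(single_edit_effects.values())
actual_gain_all_three = trait_gain(True, True, True)

for combo in product([False, True], repeat=3):
    print(combo, trait_gain(*combo))
print("naive additive prediction:", naive_additive_prediction)
print("actual gain with all three edits:", actual_gain_all_three)
```

Each edit looks worth +1 on its own, so the additive prediction is +3. Any combination of edits still yields only +1.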
This has been studied most in agronomy, with the goal of engineering crops with specific complex traits, and once you start trying it, epistatic effects show up.
For example:
https://doi.org/10.1038/s41598-018-20690-w
https://doi.org/10.1007/s00122-010-1517-0
The (much) bigger problem is not editing a bunch of bases in the embryo; it’s knowing which ones to edit (safely).
The fact that the SNP itself is usually not causal is taken into account by our models. It’s why we see such large gains in editing power from increasing data set sizes: we’re better able to find the causal SNPs. Our editing strategy assumes that we’re largely hitting non-causal SNPs.
On epistasis: I’m not aware of any evidence for substantial epistatic effects on quantitative traits such as height. We’re also adding up expected effects, and as long as those estimates are unbiased, the errors should cancel out as you do enough edits.
One thing we’re worried about is cases where haplotypes, rather than individual SNPs, carry the small additive effects. In that case, editing to a rare haplotype could give an unpredictable (potentially deleterious) effect even if all the SNPs involved are common. Are you aware of any evidence suggesting this would be a problem?
Could you expand on the sense in which you have ‘taken this into account’ in your models? What are you expecting to achieve by editing non-causal SNPs?
The first paper I linked is about epistatic effects on the additivity of QTLs for a quantitative trait, specifically heading date in rice, so it is evidence for this sort of effect on such a trait.
The general problem is that without a robust causal understanding of what an edit does, it is very hard to predict what sorts of problems might arise from novel combinations of variants in a haplotype. That’s just the nature of complex systems: a single incorrect base in the wrong place may have no effect or may cause a critical cascading failure. You don’t know until you test it, or until you have characterized the system so well that you can map out exactly what is going to happen. Just testing it in humans and seeing what happens is eventually going to hit something detrimental. When you are trying to do enhancement, you tend to need a positive expectation that it will be safe, not just the absence of a reason to think it won’t be. Many healthy people would be averse to risking good health for their kid, even at a low probability of a bad outcome.
If we have a SNP that we’re 30% sure is causal, we expect to get 30% of its effect conditional on it being causal. That’s modulo any weird interaction effects from rare haplotypes, which are a potential concern with this approach.
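The expected-value argument, and how noisy it is for a finite number of edits, can be sketched under simple assumptions. Each edited SNP is independently causal with probability p, it contributes exactly b if causal and nothing otherwise, and effects add. Effect-size uncertainty, pleiotropy and haplotype interactions are all ignored. The mean grows as n·p·b while the standard deviation grows only as √n, so the relative spread shrinks as the number of edits increases.

```python
import math


def expected_gain(edits):
    """edits: list of (p_causal, effect_if_causal) pairs."""
    return sum(p * b for p, b in edits)


def gain_sd(edits):
    # Each edit contributes b with probability p and 0 otherwise (Bernoulli(p) * b),
    # independently of the others, so the variances add.
    return math.sqrt(sum(p * (1 - p) * b**2 for p, b in edits))


for n in (1, 10, 100, 1000):
    edits = [(0.3, 1.0)] * n  # hypothetical: n SNPs, each 30% likely causal, effect 1.0
    mean, sd = expected_gain(edits), gain_sd(edits)
    print(f"n={n:5d}  expected gain={mean:7.1f}  sd={sd:5.2f}  sd/mean={sd / mean:.2f}")
```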
I didn’t read your first comment carefully enough; I’ll take a look at this.
Can you share your current thoughts on rare haplotypes?
I don’t have much to say on it right now; I really need to do a deep dive into this at some point.
I’m curious about the basis on which you are assigning a probability of causality without a method like Mendelian randomisation. An alternative would be something that assigns a probability of an effect by interpreting the biology, such as mapping the output of a tool like SnpEff to an approximate probability of effect.
The logic of getting 30% of a SNP’s effect from a 30% chance that it’s causal seems likely to be pretty high variance, and to work out only over a pretty large number of edits. It also assumes that edits to SNPs that are non-causal for the targeted trait have no unexpected effects, when they might do something else once edited.
Using fine-mapping. That is, you assume a model where nonzero additive effects are sparsely distributed among SNPs. Given the observed GWAS results, you can then do Bayesian math to infer how probable each SNP is to have a nonzero effect, and its expected effect size. Tools like SnpEff can further help by giving you a better prior.
(For people reading this thread who want an introduction to fine-mapping, this lecture is a great place to start for a high-level overview: https://www.youtube.com/watch?v=pglYf7wocSI)
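As a rough illustration of the Bayesian step, here is a sketch of the simplest form of fine-mapping. It assumes a single causal variant per locus and computes Wakefield's approximate Bayes factors from GWAS effect estimates and standard errors. This is not necessarily the model used above; methods such as SuSiE and FINEMAP relax the single-causal-variant assumption. Several inputs are assumed values for illustration: the prior standard deviation `prior_sd`, the summary statistics, and the optional per-SNP `prior_weights`. The weights are where an annotation-based prior, e.g. one informed by SnpEff, could enter.

```python
import math


def log_abf(beta_hat, se, prior_sd=0.15):
    """Log of Wakefield's approximate Bayes factor (association vs. null) for one SNP.

    Assumes a N(0, prior_sd^2) prior on the true effect; prior_sd is an assumption.
    """
    v = se**2
    w = prior_sd**2
    r = w / (v + w)
    z = beta_hat / se
    return 0.5 * math.log(1 - r) + 0.5 * r * z * z


def finemap_single_causal(beta_hats, ses, prior_weights=None, prior_sd=0.15):
    """Posterior inclusion probabilities (PIPs) and expected edit effects,
    assuming exactly one causal SNP in the locus."""
    if prior_weights is None:
        prior_weights = [1.0] * len(beta_hats)  # uniform prior over SNPs
    scores = [math.log(pw) + log_abf(b, s, prior_sd)
              for pw, b, s in zip(prior_weights, beta_hats, ses)]
    top = max(scores)  # log-sum-exp for numerical stability
    weights = [math.exp(sc - top) for sc in scores]
    total = sum(weights)
    pips = [wt / total for wt in weights]
    # Posterior mean effect if this SNP is the causal one (normal-normal shrinkage).
    shrunk = [b * prior_sd**2 / (s**2 + prior_sd**2) for b, s in zip(beta_hats, ses)]
    # Under this model a non-causal SNP has zero effect, so E[effect] = PIP * shrunk mean.
    expected = [p * m for p, m in zip(pips, shrunk)]
    return pips, expected


# Made-up GWAS summary statistics for three SNPs in one LD block.
beta_hats = [0.10, 0.09, 0.03]
ses = [0.02, 0.02, 0.02]
pips, expected = finemap_single_causal(beta_hats, ses)
for i, (p, e) in enumerate(zip(pips, expected), start=1):
    print(f"snp{i}: PIP={p:.3f}  expected effect of editing={e:.4f}")

# A hypothetical annotation-based prior that upweights snp2.
pips_annot, _ = finemap_single_causal(beta_hats, ses, prior_weights=[1.0, 5.0, 1.0])
print("PIPs with annotation prior:", [round(p, 3) for p in pips_annot])
```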
This is a point of uncertainty that bothered me when I was doing a similar analysis a while ago. GWAS data is possibly good enough to estimate the causal effects of haplotypes, but that’s not enough information to do single-base edits. To have reasonable confidence of getting the predicted effect, it’d be necessary to make all the edits needed to transform the original haplotype into a different haplotype.
And unlike with distant variants, where additive effects dominate, it would make sense for non-additive effects to be strong locally, since the variants are near each other. Whether this is actually true in reality is way beyond my knowledge, though.
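One way to act on this is to check, before editing, whether the edited haplotype has actually been observed. The sketch below is hypothetical throughout. Haplotypes are strings of alleles at a few SNP positions, and the reference-panel counts are toy numbers. `min_count` stands in for "common enough that its effect is reasonably well estimated". The code lists the single-base edits needed to turn one haplotype into another, and flags edit sets that would produce a haplotype absent from the panel.

```python
def edits_needed(source, target):
    """Single-base edits (position, old, new) that turn one haplotype into another."""
    if len(source) != len(target):
        raise ValueError("haplotypes must cover the same SNP positions")
    return [(i, old, new) for i, (old, new) in enumerate(zip(source, target)) if old != new]


def apply_edits(haplotype, edits):
    bases = list(haplotype)
    for position, old, new in edits:
        if bases[position] != old:
            raise ValueError(f"expected {old} at position {position}, found {bases[position]}")
        bases[position] = new
    return "".join(bases)


def is_observed(haplotype, panel_counts, min_count=100):
    """True if the haplotype appears at least min_count times in the reference panel."""
    return panel_counts.get(haplotype, 0) >= min_count


# Toy reference panel: counts of haplotypes over four SNP positions (made-up numbers).
panel = {"ACGT": 5200, "ATGT": 3100, "ATGC": 1650, "GCGT": 50}

original = "ACGT"
single_snp_edit = apply_edits(original, [(3, "T", "C")])
full_transform = apply_edits(original, edits_needed(original, "ATGC"))

print(single_snp_edit, is_observed(single_snp_edit, panel))  # ACGC False
print(full_transform, is_observed(full_transform, panel))    # ATGC True
```

Editing only position 3 creates a haplotype the panel has never seen, so its effect cannot be estimated from the data. Making both edits lands on a common, observed haplotype.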
To dumb it down a bit, here’s my made-up example: you get +1 IQ if your brain has surplus oxygen in the blood flowing through it. There are 1000 ways to get a bit more oxygen in there, but with +1000 oxygen, you still only get +1 IQ.
Is that the idea?
Kind of. There are many things that, changed in isolation, get you a bit more oxygen, but many of them act through the same mechanism. So you change 1000 things that each get you +1 oxygen on their own, but in combination they only get you +500.
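The shared-mechanism version of the oxygen example can be sketched under a few hypothetical assumptions. Each edit gives +1 on its own, edits are grouped into mechanisms, and each mechanism saturates at +1. With 1000 edits paired into 500 shared mechanisms, the naive sum is +1000 but the combined effect is +500.

```python
from collections import Counter


def naive_total(edit_mechanisms, gain_per_edit=1.0):
    # Adds every edit's stand-alone effect, ignoring shared mechanisms.
    return gain_per_edit * len(edit_mechanisms)


def saturating_total(edit_mechanisms, gain_per_edit=1.0, cap_per_mechanism=1.0):
    # Edits acting through the same mechanism pool their effect, which is capped.
    edits_per_mechanism = Counter(edit_mechanisms)
    return sum(min(count * gain_per_edit, cap_per_mechanism)
               for count in edits_per_mechanism.values())


# Hypothetical: 1000 edits, each sharing its mechanism with exactly one other edit.
edit_mechanisms = [i // 2 for i in range(1000)]
print(naive_total(edit_mechanisms))       # 1000.0
print(saturating_total(edit_mechanisms))  # 500.0
```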
To use a software analogy, imagine an object with two methods, either of which sets a property of the object to true. It doesn’t matter if you call both methods, or if you have a bunch of functions that call those methods; you still just get true. Calling either method, or any function that calls them, is going to be slightly correlated with an increased probability that the property will be true, but the effects do not add. There are many ways to make it true, but making it true more times does not make it ‘more true’.
If we change the property from a boolean to an integer, then some methods might only increment it if it is not already greater than some value specific to that method.
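The software analogy translates almost directly into code. In this sketch the class names, method names and per-method limits (2 and 4) are hypothetical. The boolean version shows that repeated calls, direct or via a wrapper function, do not make the property "more true". The integer version shows methods that stop incrementing once the value exceeds their own limit.

```python
class BooleanState:
    """Two methods that both set the same property; calling them more adds nothing."""

    def __init__(self):
        self.active = False

    def method_a(self):
        self.active = True

    def method_b(self):
        self.active = True


class IntegerState:
    """Methods that increment a counter only if it is not already above their own limit."""

    def __init__(self):
        self.value = 0

    def _increment_unless_above(self, limit):
        if self.value <= limit:
            self.value += 1

    def method_a(self):
        self._increment_unless_above(2)  # hypothetical limit

    def method_b(self):
        self._increment_unless_above(4)  # hypothetical limit


def calls_method_a(state):
    # A "function that calls the method": just another route to the same effect.
    state.method_a()


flag = BooleanState()
for _ in range(100):
    flag.method_a()
    flag.method_b()
    calls_method_a(flag)
print(flag.active)  # True

counter = IntegerState()
for _ in range(10):
    counter.method_a()
print(counter.value)  # 3
for _ in range(10):
    counter.method_b()
print(counter.value)  # 5
```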