How many genes out of 20000 commonly vary between humans?
There are about 4-5 million letters in the genome where at least one percent of humans have a different letter at that location. That’s compared to 3 billion letters overall.
Another way to look at genetic differences is to pick a random pair of humans and ask how much they are likely to differ. The answer is by about 3 million base pairs.
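To put those figures on a common scale, here is a small back-of-the-envelope calculation in Python. It uses only the round numbers quoted above (3 billion letters, 4-5 million common variant sites, about 3 million pairwise differences); the constant and function names are purely illustrative.

```python
# Round figures taken from the answer above; nothing here is measured data.
GENOME_LENGTH = 3_000_000_000          # letters (base pairs) in the genome
COMMON_VARIANT_SITES = (4_000_000, 5_000_000)  # sites where >= 1% of people differ
PAIRWISE_DIFFERENCES = 3_000_000       # differences between two random people


def fraction(part, whole=GENOME_LENGTH):
    """Return part / whole as a plain fraction."""
    return part / whole


common_low, common_high = (fraction(n) for n in COMMON_VARIANT_SITES)
pairwise = fraction(PAIRWISE_DIFFERENCES)
one_in = GENOME_LENGTH // PAIRWISE_DIFFERENCES

print(f"common variant sites: {common_low:.2%} to {common_high:.2%} of the genome")
print(f"pairwise differences: {pairwise:.1%} of positions, about 1 in {one_in:,}")
```

With these inputs, two random people differ at roughly 1 position in 1,000, and common variant sites make up somewhat more than a tenth of a percent of the genome.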
Do more complex traits like intelligence have more gene-gene interactions?
That’s not my impression from reading the literature. There was some giant analysis of educational attainment done last year which found literally zero gene-gene interactions. But I’m not a deep expert on this subject.
Of the total variance, do you know what’s the maximum you could explain with genes?
For intelligence? You can probably get to about a third of the variance explained just using SNP arrays like the ones 23andMe collects. With whole genome sequencing and more samples you could probably get up to 45%, maybe higher.
Assuming the polygenic scores are not close to the maximum explainable variance: how do you know that there’s not a “complex web” on top of some additive effects?
This is not a theoretical assertion but an empirical one. We now have studies on educational attainment with around 3 million participants that have shown ZERO gene-gene interactions. Such interactions definitely exist (at least for other traits), but according to the authors you would need an even larger sample size to identify them. Given how little they expect to improve the predictor’s power by increasing the sample size, one can infer that these interactions, if they exist (and they surely do to some extent), just don’t explain very much of the variance. (Ctrl+F for “epistatic interactions” in this paper)
There are about 4-5 million letters in the genome where at least one percent of humans have a different letter at that location. That’s compared to 3 billion letters overall.
Another way to look at genetic differences is to pick a random pair of humans and ask how much they are likely to differ. The answer is by about 3 million base pairs.
Ok. I guess that, for two random humans, you expect almost all 20000 genes to differ at least on a letter, right?
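One rough way to check this guess is a Poisson sketch in Python. It assumes the roughly 3 million pairwise differences are spread uniformly and independently over 3 billion letters, which real genomes do not strictly obey, and the gene lengths in the loop are illustrative placeholders rather than measured values. The function name `prob_gene_differs` is made up for this example.

```python
import math

# Per-letter difference rate implied by the figures quoted above.
DIFFERENCES_PER_BASE = 3_000_000 / 3_000_000_000


def prob_gene_differs(gene_length_bp, rate=DIFFERENCES_PER_BASE):
    """P(at least one differing letter) in a stretch of gene_length_bp letters,
    assuming differences follow a Poisson process with the given rate."""
    return 1.0 - math.exp(-rate * gene_length_bp)


# Hypothetical stretch lengths, chosen only to show how the answer scales.
for length in (1_000, 10_000, 50_000):
    print(f"{length:>6} letters: {prob_gene_differs(length):.4f}")
```

Under this simplified model, a 1,000-letter stretch differs with a probability of about 63%, while stretches of 10,000 letters or more differ almost surely. So the answer depends heavily on whether introns and flanking regions count as part of the gene.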
Given how little they expect to improve the predictor’s power by increasing the sample size, one can infer that these interactions, if they exist (and they surely do to some extent), just don’t explain very much of the variance.
Ok, but this shows that your models do not see the non-additive effects, not that there aren’t any. I don’t know exactly how the analyses are done, but assuming they look at interactions with a model like $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$, then they would not pick up the $\alpha$ term in my example because of the hash (the “hash” stands for any very granular and nonlinear function).
But actually I think that it would be very weird to have only such “steganographic” interactions, without also simpler ones, so I’m satisfied with your answer.
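The worry about hash-like interactions can be made concrete with a toy case in Python. As a stand-in for the “hash”, this sketch uses the parity of three loci; that choice is an assumption made for illustration, not the example from the comment. It also assumes independent loci with allele frequency 1/2, so every genotype is equally likely and the covariances can be computed exactly by enumeration. The function names are hypothetical.

```python
from itertools import combinations, product


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance over equally likely genotypes."""
    return mean(x * y for x, y in zip(a, b)) - mean(a) * mean(b)


def parity_trait_covariances(n_loci=3):
    """Trait = parity of all loci. Returns the trait variance plus its
    covariance with each main effect x_i and each pairwise product x_i * x_j."""
    genotypes = list(product([0, 1], repeat=n_loci))
    trait = [sum(g) % 2 for g in genotypes]
    main = {i: covariance([g[i] for g in genotypes], trait)
            for i in range(n_loci)}
    pair = {(i, j): covariance([g[i] * g[j] for g in genotypes], trait)
            for i, j in combinations(range(n_loci), 2)}
    return covariance(trait, trait), main, pair


variance, main_cov, pair_cov = parity_trait_covariances(3)
print("trait variance:", variance)
print("main-effect covariances:", main_cov)
print("pairwise-product covariances:", pair_cov)
```

Because the trait is uncorrelated with every regressor, a least-squares fit with main effects and pairwise interaction terms would explain none of its variance, even though the trait is fully determined by the genotype. This only shows that such a function can exist in principle; it says nothing about whether real traits behave this way.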
Many of the differences between human genomes are actually in “promoter” regions. For a gene to be expressed as a protein, an enzyme (RNA polymerase) has to come over, bind to a spot next to the gene, and transcribe the sequence into mRNA.
Other differences are in regions that don’t seem to affect traits at all. There’s a lot of leftover DNA in our genomes from endogenous retroviruses, transposons and other events in our evolutionary history. Sometimes the DNA in those regions randomly mutates into something useful and evolution will start acting on it.
Gwern has written quite extensively about how much of the variance in intelligence genes can explain.