This paper found that the heritability of most traits is ~entirely additive, supposedly including IQ according to whatever reference I followed to the paper, though I couldn’t actually find where in the paper it said/implied that.
And then suddenly it’s different for personality? Kinda weird.
I have a simple model with toy examples of where non-additivity in personality and other domains comes from, see §4.3.3 here.
Thanks. I think it’s an important point you make; I do have it in mind that traits can have nonlinearities at different “stages”, but I hadn’t connected that to the personality trait issue. I don’t immediately see a very strong+clear argument for personality traits being super exceptional here. Intuitively it makes sense that they’re more “complicated” or “involve more volatile forces” or something due to being mental traits, but a bit more clarity would help. In particular, I don’t see the argument being able to support yes significant broadsense heritability but very little apparent narrowsense heritability. (Though maybe it can somehow.)
(Also btw I wouldn’t exactly call multiplicativity by itself “nonlinearity”! I would just say that after the genomic fan-in there is a nonlinearity. It’s linearity as long as there’s a latent variable that’s a sum of alleles. Indeed, as E.G. pointed out to me, IQ could very well be like this, i.e. IQ-associated traits might be even better predicted by assuming IQ is lognormal (or some other such distribution). Though of course then you can’t say that such and such downstream outcome is linear in genes; but the point would be that you can push along the trait by independently pushing on lots of alleles.)
(Also in theory I ought to go through your whole article at some point; but as yet I’m not read in to polygenic scores, as there’s already good enough predictors for the MVP, and biotech is the main bottleneck. Though yes, motivating the biotech depends on understanding PGSes.)
You’re not the first to complain about my terminology here, but nobody can tell me what terminology is right. So, my opinion is: “No, it’s the genetics experts who are wrong” :)
If you take some stupid outcome like “a person’s fleep is their grip strength raised to the power of their alcohol tolerance”, and measure fleep across a population, you will obviously find that there’s a strong non-additive genetic contribution to that outcome. A.k.a. epistasis. If you want to say “no, that’s not really non-additive, and it’s not really epistasis, it’s just that ‘fleep’ is a damn stupid outcome to analyze”, then fine, but then the experts really need to settle on a standardized technical term for “damn stupid outcomes to analyze”, and then need to consider the possibility that pretty much every personality trait and mental health diagnosis (among other things) is a “damn stupid outcome” in much the same way.
I do hope to unravel the deep structure of personality variation someday! In particular, what exactly are the linear “traits” that correspond directly to brain algorithm settings and hyperparameters? (See A Theory of Laughter and Neuroscience of human social instincts: a sketch for the very early stages of that. Warning: long.)
I guess a generic answer would be: the path FROM brain algorithm settings and hyperparameters TO decisions and preferences passes through a set of large-scale randomly-initialized learning algorithms churning away for a billion seconds. (And personality traits are basically decisions and preferences—see examples here.) That’s just a massive source of complexity, obfuscating the relationship between inputs and outputs.
A kind of analogy is: if you train a set of RL agents, each with slightly different reward functions, their eventual behavior will not vary smoothly with the reward function. Instead there will be lots of “phase shifts” and such.
So again, we have “a set of large-scale randomly-initialized learning algorithms that run for a billion seconds” on the pathway from the genome to (preferences and decisions). And there’s nothing like that on the pathway from the genome to more “physical” traits like blood pressure. Trained models are much more free to vary across an extremely wide, open-ended, high-dimensional space, compared to biochemical developmental pathways.
See also Heritability, Behaviorism, and Within-Lifetime RL: “As adults in society, people gradually learn patterns of thought & behavior that best tickle their innate internal reward function.”
Different people have different innate reward functions, and so different people reliably settle into different patterns of thought & behavior, by the time they reach adulthood. But given a reward function (and learning rate and height and so on), the eventual patterns of thought & behavior that they’ll settle into are reliably predictable, at least within a given country / culture.
And yet it moves! Somehow it’s heritable! Do you agree there is a tension between heritability and your claims?
No … I think you must not have followed me, so I’ll spell it out in more detail.
Let’s imagine that there’s a species of AlphaZero-chess agents, with a heritable and variable reward function but identical in every other way. One individual might have a reward function of “1 for checkmate”, but another might have “1 for checkmate plus 0.1 for each enemy piece on the right side of the endgame board” or “1 for checkmate minus 0.2 for each enemy pawn that you’ve captured”, or “0 for checkmate but 1 for capturing the enemy queen”, or whatever.
Then every one of these individuals separately grows into “adults” by undergoing the “life experience” of the AlphaZero self-play training regime for 40,000,000 games. And now we look at the behavior of those “adults”.
If you take two identical twin adults in this species, you’ll find that they behave extremely similarly. If one tends to draw out the enemy bishop in thus-and-such situation, then so does the other. Why? Because drawing out the enemy bishop is useful given a certain kind of reward function, and they have the same reward function, and they’re both quite good at maximizing it. So you would find high broad-sense heritability of behavior.
But it’s unlikely that you’ll find a linear map from the space of reward functions to the space of midgame behavioral tendencies. Lots of individuals will be trying to draw out the enemy bishop in such-and-such situation, and lots of individuals won’t be trying to draw out the enemy bishop in that same kind of situation, for lots of different reasons, ultimately related to their different reward functions. The midgame behavior is more-or-less a deterministic function of the reward function, but it’s a highly nonlinear function. So you would measure almost zero narrow-sense heritability of behavior.
Wait a minute. Does your theory predict that heritability estimates about personality traits derived from MZ twins will be much much higher than estimates derived from DZ twins or other methods not involving MZ twins?
Yeah, one of the tell-tale signs of non-additive genetic influences is that MZ twins are still extremely similar, but DZ twins and more distant relatives are more different than you’d otherwise expect. (This connects to PGSs because PGSs are derived from distantly-related people.) See §1.5.5 here, and also §4.4 (including the collapsible box) for some examples.
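For intuition, here’s a hypothetical toy simulation of that tell-tale pattern (haploid 0/1 “genomes”, no environment, all numbers invented): an additive trait gives DZ twins about half the MZ similarity, while a trait built out of pairwise gene–gene products keeps MZ twins essentially identical but pushes DZ similarity well below half of the MZ value.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pairs, n_snps, maf = 20_000, 200, 0.5

def twin_pairs(share_prob):
    """Toy haploid twins: twin 2 shares each locus with twin 1 with prob share_prob."""
    g1 = rng.binomial(1, maf, size=(n_pairs, n_snps))
    keep = rng.random((n_pairs, n_snps)) < share_prob
    g2 = np.where(keep, g1, rng.binomial(1, maf, size=(n_pairs, n_snps)))
    return g1 - maf, g2 - maf   # mean-center so the products below are "pure" interactions

w = rng.normal(size=n_snps)

def additive(g):
    return g @ w

def pairwise(g):
    # trait built from products of adjacent loci: a toy non-additive architecture
    return (g[:, ::2] * g[:, 1::2]) @ w[: n_snps // 2]

for trait_name, trait in [("additive", additive), ("pairwise", pairwise)]:
    for twin_type, share in [("MZ", 1.0), ("DZ", 0.5)]:
        g1, g2 = twin_pairs(share)
        r = np.corrcoef(trait(g1), trait(g2))[0, 1]
        print(f"{trait_name:8s} {twin_type}: r ≈ {r:.2f}")
# Expected shape of the output: MZ r ≈ 1 for both traits; DZ r ≈ 0.5 for the
# additive trait, but only ≈ 0.25 for the pairwise trait.
```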
Mkay.… I’m gonna tap out for now, but this is very helpful, thanks. I’m still pretty skeptical, though indeed
I am failing (and then I think later succeeding) at significant chunks of basic reading comprehension about what you’re saying;
I’m still confused, so my skepticism isn’t a confident No.
As a bookmark/trailhead, I suggest that maybe your theory of “personality has a high complexity but pretty deterministic map from a smallish number of pretty-genetically-linear full-brain-settings to behavior due to convergent instrumentality” and some sort of “personality mysteriously has a bunch of k-th order epistases that all add up” would both predict MZ being more similar than DZ, but your theory would predict this effect more strongly than the k-th order thing.
Another: there’s something weird where I don’t feel your argument about a complex map being deterministic because of convergent instrumentality ought to work for the sorts of things that personality traits are; like they don’t seem analogous to “draws out bishop in xyz position”, and in the chess example idk if I would especially expect there to be “personality traits” of play… or something about this.
Another bookmark: IIUC your theory requires that the relevant underlying brain factors are extremely pinned down by genetics, because the complicated map from underlying brain stuff to personality is chaotic.
Hm. I wrote half a response to this, but then realized that… IDK, we’re thinking of this incorrectly, but I’m not fully sure how. (Ok possibly only I’m thinking of it incorrectly lol. Though I think you are too.) I’ll say some things, but not sure what the overall point should be. (And I still haven’t read your post so maybe you addressed some of this elsewhere.) (And any of the following might be confused, I’m not being careful.)
Thing:
You could have a large number of genetic variants $G_k$ in a genome $G$, and then you have a measured personality trait $p(G)$. Suppose that in fact $p(G) = f\big(\sum_k w_k G_k\big)$. If $f$ is linear, like $f(x) = ax + b$, then $p$ is unproblematically linear. But suppose $f$ is exponentiation, so that
$$p(G) = e^{\sum_k w_k G_k} = \prod_k e^{w_k G_k}.$$
In this case, $p$ is of course not linear in the $G_k$. However:
This does not make p a stupid thing to measure in any way. If it’s an interesting trait then it’s an interesting trait.
You could absolutely still affect $p$ straightforwardly using genomic engineering; just up/downvote $G_k$ with positive/negative $w_k$.
There’s an obvious latent, $\sum_k w_k G_k$, which is linear. This latent should be of interest; and this doesn’t cut against $p$ being real, or vice versa.
In real life, you get data that’s concentrated around one value of $\sum_k w_k G_k$, with gaussian variation. If you look at $\sum_{k \neq i} w_k G_k$, it’s also concentrated. So like, take the derivative of $e^{w_i g + \sum_{k \neq i} w_k G_k}$ with respect to $g$. (Or something like that.) This should tell us something like the “additive” effect of $G_i$! In other words, around the real life mean, it should be locally kinda linear.
This implies you should see the “additive” variance! Your GWAS should pick stuff up!
So if your GWAS is not picking stuff up, then $p(G)$ is NOT like this!
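A minimal simulation sketch of this point (hypothetical throughout: the allele frequency, weights, and sample size are all made up): build $p(G) = e^{\sum_k w_k G_k}$, run GWAS-style per-SNP regressions, and check how much of the variance a purely additive fit captures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 20_000, 500

# Hypothetical genotypes: 0/1/2 allele counts at allele frequency 0.3
G = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
w = rng.normal(0, 0.02, size=n_snps)   # made-up small per-SNP weights

p = np.exp(G @ w)                       # trait = exp of the linear latent

# GWAS-style marginal regression: one slope per SNP
Gc = G - G.mean(axis=0)
beta_hat = (Gc * (p - p.mean())[:, None]).mean(axis=0) / Gc.var(axis=0)
print("corr(estimated slopes, true w):", np.corrcoef(beta_hat, w)[0, 1])  # close to 1

# Variance captured by a joint additive (linear-in-SNPs) fit
X = np.column_stack([np.ones(n_people), G])
resid = p - X @ np.linalg.lstsq(X, p, rcond=None)[0]
print("additive R^2:", 1 - resid.var() / p.var())  # most of the variance
```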
Thing:
A simple alternative definition of epistasis (which I don’t actually like, but is better than “nonlinear”): There are epistatic effects on $p(G)$ from $G$ when $p(G)$ cannot be written in the form
$$p(G) = f\Big(\sum_k w_k G_k\Big)$$
with $f : \mathbb{R} \to \mathbb{R}$ arbitrary.
Thing:
Suppose now that
$$p(G) = \Big(\sum_k w_k G_k\Big) \times \Big(\sum_k v_k G_k\Big)$$
This matches your multiplicative scenario. But AGAIN, you should be seeing “additive” variance, locally!
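Same kind of sketch for this product-of-two-latents case (again, every number is made up): because both latent sums sit well away from zero, the trait is locally close to linear in each SNP, the marginal slopes match the local-linearization prediction, and an additive fit soaks up nearly all the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_snps = 20_000, 200

G = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
w = rng.normal(1.0, 0.3, size=n_snps)   # made-up weights with a positive mean, so both
v = rng.normal(1.0, 0.3, size=n_snps)   # latent sums are concentrated away from zero

p = (G @ w) * (G @ v)                    # trait = product of two linear latents

# Marginal per-SNP slopes vs. the local-linearization prediction
Gc = G - G.mean(axis=0)
beta_hat = (Gc * (p - p.mean())[:, None]).mean(axis=0) / Gc.var(axis=0)
pred = w * (G @ v).mean() + v * (G @ w).mean()   # derivative of the product at the mean
print("corr(marginal slopes, local-linear prediction):", np.corrcoef(beta_hat, pred)[0, 1])

# A joint additive fit still captures nearly all the variance in this regime
X = np.column_stack([np.ones(n_people), G])
resid = p - X @ np.linalg.lstsq(X, p, rcond=None)[0]
print("additive R^2:", 1 - resid.var() / p.var())
```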
Thing:
There’s maybe some nicer understanding of GWASes and PGSes and whatnot in terms of taking something like a derivative at the population mean. So it’s not about linearity, it’s about the size of the derivative. Or something like that.
Thing:
We do care about a totally different sort of nonlinearity, which is that we care whether pushing in some direction increases the trait a lot or a little before it bends back down in a U turn. But this is different from “do I see the local derivative in this SNP”!
Thing:
SINGLE gene—SINGLE gene—SINGLE gene interactions matter for detecting the effects because having this SPECIFIC combination is very rare, so you don’t see the effects often; and the AGGREGATE effect of a single gene that’s involved in these higher order effects can be small because they’re SCRAMBLED ….. or something, I’m confused how this works. This doesn’t work for high-fan-in latents which you then combine, because the aggregates are NOT scrambled and you can detect them. Or something.
Thing:
So I DO NOT BUY a model where additive effects on personality are hard to see in GWASes due to personality being several traits multiplied together or something!
Thing:
I’m confused lol.
Thing:
How do the total expected effects of one variant ever cancel out?? (This is a basic question that should/might have a basic-ish answer.) Even if all the genetic effects come from specific 2-, 3-, or 4-gene sets that only have effects together, there’d have to be a lot of those sets. So if the union of all those sets is a normal number of variants for a polygenic trait (say 10k), then each SNP would be involved in MANY of these sets; so that SNP’s expected effect, given the population, ought to be a great big sum of terms like “probability that the rest of the set is present, times the difference in effect between completing the full set with this SNP vs. not having the set”. But then that sum should be basically a gaussian, and there should be many such SNPs with a significant average effect, which would show up in a GWAS! So I’m confused.
Even in personality and mental health, the PGSs rarely-if-ever account for literally zero percent of the variance. Normally the linear term of a Taylor series is not zero.
I think what you’re missing is: the linear approximation only works well (accounts for much of the variance) to the extent that the variation is small. But human behavioral differences—i.e. the kinds of things measured by personality tests and DSM checklists—are not small. There are people who have 10× more friends than me, talk to them 10× as often as me, play 10× more sports than me, read 10× less than me, etc.
Why? As in my other comment, small differences in what feels rewarding and motivating to a person can cascade into massive differences in behavior. If someone finds it mildly unpleasant to be around other people, then that’s a durable personality trait, and it impacts a million decisions that they’ll make every day, all in the same direction, and thus it impacts how they spend pretty much all of their waking time, including even what they choose to think about each moment, for their entire life. So much flows from that.
It becomes more complex once you take a sum of products of several things. At that point the log-additive effect of one of the terms in the sum disappears if the other term in the sum is high. If you’ve got a lot of terms in the sum and the distribution of the variables is correct, this can basically kill the bulk of common additive variance. Conceptually speaking, this can be thought of as “your system is a mixture of a bunch of qualitatively distinct things”. Like if you imagine divorce or depression can be caused by a bunch of qualitatively unrelated things.
Hm....
Not sure how to parse this. (What do you mean by “the distribution of the variables is correct”?)
Isn’t the derivative of the full variable in one of the multiplicands still noticeable? Maybe it would help if you make some quantitative statement?
I mean, I think depression is heritable, and I think there are polygenic scores that do predict some chunk of this. (From a random google: https://jamanetwork.com/journals/jamapsychiatry/fullarticle/2783096 )
Quite plausibly yes these heritability estimates and PGSes are picking up on heterogeneous things, but they still work, and you can still construct the PGS; you find the additive variants when you look.
(Also I am interested in the difference between traits that are OR / SUM of some heritable things and some non-heritable things. E.g. you can get lung cancer from lung cancer genes, or from smoking 5 packs a day. This matters for asking “just how low exactly can we drive down disease risk?”. But this would not show up as missing heritability!)
Taking the logarithm (to linearize the association) scales the derivative by the reciprocal of the trait’s magnitude. So if one of the terms in the sum is really big, all the derivatives get scaled down by a lot. If each of the terms is a product, then the derivative for the big term gets scaled up to cancel out the downscaling, but the small terms do not.
Under the condition I mentioned, polygenic scores will tend to focus on the traits that cause the most common kind of depression, while neglecting other kinds. The missing heritability will be due to missing those other kinds.
Can you please write down the expressions you’re talking about as math? If you’re trying to invoke standard genetics knowledge, I’m not a geneticist and I’m not picking it up from what you’re saying.
Let’s start with the basics: If the outcome $f$ is a linear function of the genes $x$, that is $f(x) = \beta x$, then the effect of each gene is given by the gradient of $f$, i.e. $\nabla_x f(x) = \beta$. (This is technically a bit sketchy since a genetic variant is discrete while gradients require continuity, but it works well enough as a conceptual approximation for our purposes.) Under this circumstance, we can think of genomic studies as finding $\beta$. (This is also technically a bit sketchy because of linkage disequilibrium and such, but it works well enough as a conceptual approximation for our purposes.)
If $f$ isn’t a linear function, then there is no constant $\beta$ to find. However, the argument for genomic studies still mostly goes through that they can find $\mathbb{E}[\nabla_x f(x)]$; it’s just that this expression now denotes a weird mishmash effect size that’s not very interpretable.
As you observed, if $f$ is almost-linear, for example if $f(x) = e^{\beta x}$, then genomic studies still have good options. The best is probably to measure the genetic influence on $\log f$, as then we get a pretty meaningful coefficient out of it. (If we measured the genetic influence on $f$ without the logarithm, I think under commonly viable assumptions we would get $\beta'_i \propto e^{\beta_i} - 1$, but don’t cite me on that.)
The trouble arises when you have deeply nonlinear forms such as $f(x) = e^{\beta x} + e^{\gamma x}$. If we take the gradient of this, then the chain rule gives us
$$\nabla \log f(x) = \frac{e^{\beta x}\,\beta + e^{\gamma x}\,\gamma}{e^{\beta x} + e^{\gamma x}}.$$
That is, the two different mechanisms “suppress” each other, so if $e^{\beta x}$ is usually high, then the $\gamma$ term would usually be (implicitly!) excluded from the analysis.
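A tiny numeric illustration of that suppression (a toy sketch with made-up numbers; here $\beta$ and $\gamma$ each load on a single hypothetical gene):

```python
import numpy as np

beta = np.array([1.0, 0.0])    # hypothetical: mechanism 1 loads only on gene 1
gamma = np.array([0.0, 1.0])   # hypothetical: mechanism 2 loads only on gene 2

def grad_log_f(x):
    # gradient of log(e^{beta.x} + e^{gamma.x}), i.e. the formula above
    a, b = np.exp(beta @ x), np.exp(gamma @ x)
    return (a * beta + b * gamma) / (a + b)

print(grad_log_f(np.array([0.0, 0.0])))  # neither mechanism dominates: ~[0.5, 0.5]
print(grad_log_f(np.array([5.0, 0.0])))  # mechanism 1 dominates: gene 2's slope is ~0
```

So the second gene’s measured slope shrinks toward zero exactly when the first mechanism is the one that usually determines the trait.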
Ah. Thank you, this makes sense of what you said earlier. (I / someone could have gotten this from what you had written before, by thinking about it more, probably.)
I agree with your analysis as math.
However, I’m skeptical of the application to the genetics stuff, or at least I don’t see it yet. Specifically, you wrote: “If you’ve got a lot of terms in the sum and the distribution of the variables is correct, this can basically kill the bulk of common additive variance.”
And your argument here says that there’s “gradient interference” between the summed products specifically when one of the summed products is really big. But in the case of disease risk, IIUC the sum-of-products $f(x)$ is something like logits. So translating your argument, it’s like:
Suppose one of the causes of X-disease contributes a ton of logits, to the point where it’s already overdetermined that you have X. Then you can’t notice the effects of another one of the causes of X. Even if there are such effects, most people have the disease anyway, so you get very little signal, which only comes from the lucky few who didn’t get X from the first cause.
In this case, yes the analysis is valid, but it’s not very relevant. For the diseases that people tend to talk about, if there are several substantial disjunctive causes (I mean, the risk is a sum of a few different sub-risks), then they all would show substantial signal in the data. None of them drowns out all the others.
Maybe you just meant to say “In theory this could happen”.
Or am I missing what you’re suggesting? E.g. is there a way for there to be a trait that:
has lots of variation (e.g. lots of sick people and lots of non-sick people), and
it’s genetic, and
it’s a fairly simple functional form like we’ve been discussing,
but you can’t optimize it much by changing a bunch of variants found by looking at some millions of genotype/phenotype pairs?
The original discussion was about how personality traits and social outcomes could behave fundamentally differently from biological traits when it comes to genetics. So this isn’t necessarily meant to apply to disease risks.
Well you brought up depression. But anyway, all my questions apply to personality traits as well.
… To rephrase / explain how confused I am about what you’re trying to tell me: it kinda sounds like you’re saying “If some trait is strongly determined by one big chunk of genes, then you won’t be able to see how some other chunk affects the trait.” But this can’t explain missing heritability! In this scenario, none of the heritability is even from the second chunk of genes in the first place! Or am I missing something?
Some of the heritability would be from the second chunk of genes.
To the extent that the heritability is from the second chunk, to that extent the gradient does flow, no?
Why?
Because if some of the heritability is from the second chunk, that means that for some pairs of people, they have roughly the same first chunk but somewhat different second chunks; and they have different traits, due to the difference in second chunks. If some amount of heritability is from the second chunk, then to that extent, there’s a bunch of pairs of people whose trait differences are explained by second chunk differences. If you made a PGS, you’d see these pairs of people and then you’d find out how specifically the second chunk affects the trait.
I could be confused about some really basic math here, but yeah, I don’t see it. Your example for how the gradient doesn’t flow seems to say “the gradient doesn’t flow because the second chunk doesn’t actually affect the trait”.
This only applies if the people are low in the first chunk and differ in the second chunk. Among the people who are high in the first chunk but differ in the second chunk, the logarithm of their trait level will be basically the same regardless of the second chunk (because the logarithm suppresses things by the total), so these people will reduce the PGS coefficients rather than increasing the PGS coefficients. When you create the PGS, you include both groups, so the PGS coefficients will be downwards biased relative to $\gamma$.
Wouldn’t this also decrease the heritability?
It would decrease the narrowsense (or additive) heritability, which you can basically think of as the squared length of your coefficient vector, but it wouldn’t decrease the broadsense heritability, which is basically the phenotypic variance in expected trait levels you’d get by shuffling around the genotypes. The missing heritability problem is that when we measure these two heritabilities, the former heritability is lower than the latter.
Why not? Shuffling around the second chunk, while the first chunk is already high, doesn’t do anything, and therefore does not contribute phenotypic variance to broadsense heritability.
Ok, more specifically, the decrease in the narrowsense heritability gets “double-counted” (after you’ve computed the reduced coefficients, those coefficients also get applied to those who are low in the first chunk and not just those who are high, when you start making predictions), whereas the decrease in the broadsense heritability is only single-counted. Since the single-counting represents a genuine reduction while the double-counting represents a bias, it only really makes sense to think of the double-counting as pathological.
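For what it’s worth, here is a toy simulation sketch of that narrowsense-vs-broadsense gap, in the spirit of the $f(x) = e^{\beta x} + e^{\gamma x}$ model above (all parameters are invented, and there is no environmental noise, so the trait is 100% genetic in the broad sense):

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_snps = 20_000, 200

# Hypothetical genotypes: 0/1/2 allele counts at allele frequency 0.3
G = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
beta = np.zeros(n_snps); beta[:100] = rng.normal(0, 0.12, 100)    # "first chunk"
gamma = np.zeros(n_snps); gamma[100:] = rng.normal(0, 0.12, 100)  # "second chunk"

# Deterministic, purely genetic trait, nonlinear in the two chunk latents
f = np.exp(G @ beta) + np.exp(G @ gamma)

# Broadsense: all of Var(f), since f is a function of the genotype alone
broad = f.var()

# Narrowsense: variance captured by the best additive (linear-in-SNPs) predictor
X = np.column_stack([np.ones(n_people), G])
resid = f - X @ np.linalg.lstsq(X, f, rcond=None)[0]
narrow = broad - resid.var()

print(f"narrowsense / broadsense ≈ {narrow / broad:.2f}")  # noticeably below 1
```

The gap here is pure functional-form mismatch: every bit of trait variation is genetic, but a linear-in-SNPs predictor structurally can’t represent all of it.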
Ah… ok I think I see where that’s going. Thanks! (Presumably there exists some standard text about this that one can just link to lol.)
I’m still curious whether this actually happens.… I guess you can have the “propensity” be near its ceiling.… (I thought that didn’t make sense, but I guess you sometimes have the probability of disease for a near-ceiling propensity be some number like 20% rather than 100%?) I guess intuitively it seems a bit weird for a disease to have disjunctive causes like this, but then be able to max out at the risk at 20% with just one of the disjunctive causes? IDK. Likewise personality...
I don’t think so (that there’s a standard text one can just link to).
For something like divorce, you could imagine the following causes:
Most common cause is you married someone who just sucks
… but maybe you married a closeted gay person
… or maybe your partner was good but then got cancer and you decided to abandon them rather than support them through the treatment
The genetic propensities for these three things are probably pretty different: If you’ve married someone who just sucks, then a counterfactually higher genetic propensity to marry people who suck might counterfactually lead to having married someone who sucks more, but a counterfactually higher genetic propensity to marry a closeted gay person probably wouldn’t lead to counterfactually having married someone who sucks more, nor have much counterfactual effect on them being gay (because it’s probably a nonlinear thing), so only the genetic propensity to marry someone who sucks matters.
In fact, probably the genetic propensity to marry someone who sucks is inversely related to the genetic propensity to divorce someone who encounters hardship, so the final cause of divorce is probably even more distinct from the first one.
How confident are you / why do you think this? (It seems fairly plausible given what I’ve heard about the field of genomics, but still curious.) E.g. “I have a genomics PhD” or “I talk to geneticists and they don’t really know about this stuff” or “I follow some twitter stuff and haven’t heard anyone talk about this”.
Ok I’m too tired to follow this so I’ll tap out of the thread for now.
Thanks again!
I talk to geneticists (mostly on Twitter, or rather now BlueSky) and they don’t really know about this stuff.
Not right now, I’m on my phone. Though also it’s not standard genetics math.
Ok.
I don’t get why you think this. It doesn’t seem to make any sense. You’d still notice the effect of variants that cause depression-rare, exactly like if depression-rare was the only kind of depression. How is your ability to detect depression-rare affected by the fact that there’s some genetic depression-common? Depression-common could just as well have been environmentally caused.
I might be being dumb, I just don’t get what you’re saying and don’t have a firm grounding myself.
It doesn’t matter if depression-common is genetic or environmental. Depression-common leads to the genetic difference between your cases and controls to be small along the latent trait axis that causes depression-rare. So the effect gets estimated to be not-that-high. The exact details of how it fails depends on the mathematical method used to estimate the effect.
Ok I think I get what you’re trying to communicate, and it seems true, but I don’t think it’s very relevant to the missing heritability thing. The situation you’re describing applies to the fully linear case too. You’re just saying that if a trait is more polygenic / has more causes with smaller effects, it’s harder to detect relevant causes. Unless I still don’t get what you’re saying.
It kind-of applies to the Bernoulli-sigmoid-linear case that would usually be applied to binary diagnoses (but only because of sample size issues and because they usually perform the regression one variable at a time to reduce computational difficulty), but it doesn’t apply as strongly as it does to the polynomial case, and it doesn’t apply to the purely linear (or exponential-linear) case at all.
If you have a purely linear case, then the expected slope of a genetic variant onto an outcome of interest is proportional to the effect of the genetic variant.
The issue is in the polynomial case, the effect size of one genetic variant depends on the status of other genetic variants within the same term in the sum. Statistics gives you a sort of average effect size, but that average effect size is only going to be accurate for the people with the most common kind of depression.
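A quick sketch of both claims, under the simplifying assumption that the variants $x_j$ are mutually independent. In the linear case, the marginal regression slope recovers the true effect:
$$\hat\beta_i = \frac{\operatorname{Cov}\!\big(x_i, f(x)\big)}{\operatorname{Var}(x_i)} = \frac{\operatorname{Cov}\!\big(x_i, \sum_j \beta_j x_j\big)}{\operatorname{Var}(x_i)} = \beta_i.$$
But for a single product term, say $f(x) = x_i x_j$, the same computation gives $\operatorname{Cov}(x_i, x_i x_j)/\operatorname{Var}(x_i) = \mathbb{E}[x_j]$: the estimated “effect” of $x_i$ is really a statement about how common its interaction partner is in the population.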
I don’t find that surprising at all. IMO, personality is more of an emergent balancing of multidimensional characteristics than something like height or IQ (though this is mostly vibes-based speculation).
Does it seem likely that a trait that has survival significance (in a highly social animal such as a human) would be emergent? Even if it might have been initially, you’d think selective pressure would have brought forth a set of genes that have significant influence on it.