There have been many papers looking for dominance and epistasis, but little has been found. EA4 tested across the genome for dominance and found nothing.
See §4.4.3 for my response.
Epistasis generally causes MZ to be more than 2DZ, which is not commonly seen.
See the collapsible box labeled “Box: Twin-study evidence of epistasis in adult personality, mental health, and behavior” in §4.4.2 for many apparent examples of precisely this. Do you disagree with that? Is there more evidence I’m missing?
Remember, I’m claiming that non-additive genetics are important in adult personality, mental health, and behavioral things like divorce, but that they’re NOT very important in height or blood pressure or (I think) IQ or EA.
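For what it’s worth, here’s the standard bookkeeping behind that “MZ more than 2DZ” diagnostic, under the classical ADE twin decomposition (textbook quantitative genetics, written out just so we’re on the same page):

    % Classical ADE twin decomposition: a^2 = additive variance share, d^2 = dominance share
    r_{\mathrm{MZ}} = a^2 + d^2, \qquad
    r_{\mathrm{DZ}} = \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2
    \quad\Longrightarrow\quad
    r_{\mathrm{MZ}} - 2\, r_{\mathrm{DZ}} = \tfrac{1}{2}\, d^2 \;\ge\; 0
    % (Additive-by-additive epistasis works the same way: MZ twins share it fully,
    %  DZ twins share only 1/4 of it, so it likewise pushes r_MZ above 2 r_DZ.)

So any non-additive variance, whether dominance or epistasis, tends to push the MZ correlation above twice the DZ correlation.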
You make a mistake in your terminology.
This is very possible!! It wouldn’t be the first time. I can still make changes. I found the use of terminology in the literature confusing … and I find your comment confusing too. :(
My background is physics not genetics, and thus I’m using the word “nonlinearity” in the linear algebra sense. I.e., if we take a SNP array that measures N SNPs, we can put the set of all possible genomes (as measured by this array) into an N-dimensional abstract vector space, I think. Then there’s a map from this N-dimensional space to, let’s say, extroversion. Both what you call dominance, and what you call epistasis, would make this map “nonlinear” (in the linear algebra sense). See what I mean?
If it’s true that people in genetics use the term “nonlinearity” to refer specifically to nonlinearity-at-a-single-locus, then I would want to edit my post somehow! (Is it true? I don’t want to just take your word for it.) I don’t want people to be confused. However, nonlinearity-in-the-linear-algebra-sense is a very useful notion in this context. I will feel handicapped if I’m forbidden from referring to that concept. Maybe I’ll put in a footnote or something? Or switch from “nonlinearity” to “non-additivity”? (Does “non-additivity” subsume both dominance and epistasis?)
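To spell out the linear-algebra sense I have in mind, here’s a toy decomposition (the particular functional forms below are just illustrative, my own choice): the purely additive model is a linear map from the genotype vector to the trait, and both dominance and epistasis break that linearity, just in different ways:

    % x_i in {0,1,2}: allele dosage at SNP i;  y: the trait (say, extroversion)
    y \;=\; \mu
      \;+\; \underbrace{\textstyle\sum_i \beta_i\, x_i}_{\text{additive (a linear map of the genotype vector)}}
      \;+\; \underbrace{\textstyle\sum_i \delta_i\, \mathbb{1}[x_i = 1]}_{\text{dominance (nonlinear within a single locus)}}
      \;+\; \underbrace{\textstyle\sum_{i<j} \gamma_{ij}\, x_i\, x_j}_{\text{epistasis (interaction across loci)}}
      \;+\; \cdots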
Update: I replaced the word “epistasis” with “non-additive genetic effects” in a bunch of places throughout the post. Hopefully that makes things clearer??
As discussed in my post, studies of heritability range between slightly-helpful and extremely-helpful if you’re interested in any of the following topics (among others):
guessing someone’s likely adult traits (disease risk, personality, etc.) based on their family history and childhood environment
assessing whether it’s plausible that some parenting or societal “intervention” (hugs and encouragement, getting divorced, imparting sage advice, parochial school, etc.) will systematically change what kind of adult the kid will grow into.
calculating, using, and understanding polygenic scores
trying to understand some outcome (schizophrenia, extroversion, intelligence, or whatever) by studying the genes that correlate with it
I think that those are enough to make heritability “relevant towards nurture/nature debates”, although I guess I don’t know exactly what you mean by that.
Update: also Heritability: Five Battles. :)
I claim that if you find someone who’s struggling to get out of bed, making groaning noises, and ask them the following question:
Hey, I have a question about your values. The thing you’re doing right now, staying in bed past your alarm, in order to be more comfortable at the expense of probably missing your train and having to walk to work in the cold rain … is this thing you’re doing in accordance with your values?
I bet the person says “no”. Yet, they’re still in fact doing that thing, which implies (tautologically) that they have some desire to do it—I mean, they’re not doing it “by accident”! So it’s conflicting desires, not conflicting values.
I don’t think your wooden log example is relevant. Insofar as different values are conflicting, that conflict has already long ago been resolved, and the resolution is: the action which best accords with the person’s values, in this instance, is to get up. And yet, they’re still horizontal.
Another example: if someone says “I want to act in accordance with my values” or “I don’t always act in accordance with my values”, we recognize these as two substantive claims. The first is not a tautology, and the second is not a self-contradiction.
OK I’m more confused by your model than I thought.
There should be some part of your framework that’s hooked up to actual decision-making—some ingredient for which “I do things iff this ingredient has a high score” is tautological (cf. my “1.5.3 “We do things exactly when they’re positive-valence” should feel almost tautological”). IIUC that’s “value function” in your framework. (Right?)
If your proposal is that some latent variable in the world-model gets a flag meaning “this latent variable is the Value Function”, thus hooking that latent variable up to decision-making in a mechanical, tautological way, then how does that flag wind up attached to that latent variable, rather than to some other latent variable? What if the world-model lacks any latent variable that looks like what the value function is supposed to look like?
~~
My proposal (and I think LeCun’s and certainly AlphaZero’s) is instead: the true “value function” is not part of the world model. Not mathematically, not neuroanatomically—IMO the world model is in the cortex, the value function is in the striatum. (The reward function is not part of the world model either, but I guess you already agree about that.)
…However, the world-model might wind up incorporating a model of the value function and reward function, just like the world-model might wind up incorporating a model of any other salient aspect of my world and myself. It won’t necessarily form such a model—the world-model inside simple fish brains probably doesn’t have a model of the value function, and ditto for sufficiently young human children. But for human adults, sure. If so, the representation of my value function in my world model is not my actual value function, just as the representation of my arm in my world model is not my actual arm.
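In case a cartoon helps, here’s the kind of separation I have in mind, written as toy code. The names and data structures are entirely made up by me (not LeCun’s architecture or anyone’s real implementation); the point is just that the value function is a separate module that scores world-model states, and the world model can optionally contain beliefs about the value function without those beliefs being the value function:

    from dataclasses import dataclass, field

    @dataclass
    class WorldModel:
        """Learned predictive model of the world (cortex, in my picture).
        It may contain beliefs *about* the value function ("I'd probably enjoy
        heroin"), but those beliefs are not the value function itself."""
        beliefs: dict = field(default_factory=dict)

        def predict_next_state(self, state, action):
            # Toy placeholder for the rollout / prediction machinery.
            return self.beliefs.get((state, action), state)

    @dataclass
    class ValueFunction:
        """Learned scorer of states/plans (striatum, in my picture).
        Trained toward the ground-truth reward, but NOT a latent variable
        inside the world model."""
        table: dict = field(default_factory=dict)

        def score(self, state) -> float:
            return self.table.get(state, 0.0)

    def choose_action(world_model: WorldModel, value_fn: ValueFunction, state, actions):
        """Model-based planning: roll out each candidate action with the world
        model, then pick whichever imagined outcome the value function scores highest."""
        return max(actions, key=lambda a: value_fn.score(world_model.predict_next_state(state, a)))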
If you think that my proposal here is inconsistent with the fact that I don’t want to do heroin right now, then I disagree, and I’m happy to explain why.
under a value-aligned sovereign AI, if my nose is itchy then it should get scratched, all else equal
Well, if the AI can make my nose not itch in the first place, I’m OK with that too. Whereas I wouldn’t make an analogous claim about things that I “value”, by my definition of “value”. If I really want to have children, I’m not OK with the AI removing my desire to have children, as a way to “solve” that “problem”. That’s more of a “value” and not just a desire.
That’s the sort of reasoning which should naturally show up in a Value RL style system capable of nontrivial model structure learning.
I’m not sure what point you’re making here. If human brains run on Value RL style systems (which I think I agree with), and humans in fact do that kind of reasoning, then tautologically, that kind of reasoning is a thing that can show up in Value RL style systems.
Still, there’s a problem that it’s possible for some course-of-action to seem appealing when I think about it one way, and unappealing when I think about it a different way. Ego-dystonic desires like addictions are one example of that, but it also comes up in tons of normal situations like deciding what to eat. It’s a problem in the sense that it’s unclear what a “value-aligned” AI is supposed to be doing in that situation.
I know of only one class of cognitive model compatible with all of them at once. Hutter and Everitt called it Value Reinforcement Learning, though the name does not make the model obvious.
My answer (which I’m guessing is equivalent to yours and Hutter & Everitt’s but just wanted to check) is:
“There’s a ground-truth reward function, and there’s a learned value function. There’s a learning algorithm that continually updates the latter in a way that tends to systematically make it a better approximation of the former. And meanwhile, the learned value function is used for model-based planning at inference time.”
In LeCun’s framework those two things are called “Intrinsic Cost” and “Critic” respectively. I think I’ve also heard them called “reward function” and “learned reward model” respectively. In my own writing I went through several iterations but am now calling them “actual valence” and “valence guess” respectively.
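Here’s a minimal toy sketch of that loop, in my own made-up notation rather than Hutter & Everitt’s actual formalism: the learned value function (“valence guess” / “Critic”) gets continually nudged toward the ground-truth reward (“actual valence” / “Intrinsic Cost”), e.g. by a TD-style update, and it’s the learned function that model-based planning consults at inference time:

    def ground_truth_reward(state) -> float:
        """'Actual valence' / 'Intrinsic Cost': fixed, not learned. Toy stand-in."""
        return 1.0 if state == "ate_when_hungry" else 0.0

    class LearnedValueFunction:
        """'Valence guess' / 'Critic': continually updated toward the ground truth."""
        def __init__(self, learning_rate: float = 0.1, discount: float = 0.9):
            self.v = {}          # state -> current estimate of long-run value
            self.lr = learning_rate
            self.gamma = discount

        def estimate(self, state) -> float:
            # This (not the ground-truth reward) is what planning queries at inference time.
            return self.v.get(state, 0.0)

        def td_update(self, state, next_state) -> None:
            # Nudge the guess toward: observed reward + discounted guess at what comes next.
            target = ground_truth_reward(state) + self.gamma * self.estimate(next_state)
            self.v[state] = self.estimate(state) + self.lr * (target - self.estimate(state))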
“Is scratching your nose right now something you desire?” Yes. “Is scratching your nose right now something you value?” Not really, no. But I claim that the Value Reinforcement Learning framework would assign a positive score to the idea of scratching my nose when it’s itchy. Otherwise, nobody would scratch their nose.
I desire peace and justice, but I also value peace and justice, so that’s not a good way to distinguish them.
(I suspect that you took my definition of “values” to be nearly synonymous with rewards or immediately anticipated rewards, which it very much is not; projected-upstream-generators-of-rewards are a quite different beast from rewards themselves, especially as we push further upstream.)
No, that’s not what I think. I think your definition points to whether things are motivating versus demotivating all-things-considered, including both immediate plans and long-term plans. And I want to call that desires. Desires can be long-term—e.g. “being a dad someday is something I very much desire”.
I think “values”, as people use the term in everyday life, tends to be something more specific, where not only is the thing motivating, but it’s also motivating when you think about it in a self-reflective way. A.k.a. “X is motivating” AND “the-idea-of-myself-doing-X is motivating”. If I’m struggling to get out of bed, because I’m going to be late for work, then the feeling of my head remaining on the pillow is motivating, but the self-reflective idea of myself being in bed is demotivating. Consequently, I might describe the soft feeling of the pillow on my head as something I desire, but not something I value.
(I talk about this in §8.4.2–8.5 here but that might be pretty hard to follow out of context.)
I should note here that lots of people claim that, when they talk about human values, they mean <other definition>. But in approximately 100% of cases, one finds that the definition given is not a very good match even to their own usage of the term, even allowing for some looseness and the occasional outright mistake in casual use. More generally, this problem applies basically whenever anyone tries to define any natural language term; that’s why I usually recommend using examples instead of definitions whenever possible.
I think that your definition is a bad match too.
Specifically: Go ask someone on the street to give ten everyday examples where their “desires” come apart from their “values”. I claim that you’ll find that your proposed definition in this post (Value Reinforcement Learning etc.) is reliably pointing to their “desires”, not their “values”, in all those cases where the two diverge.
That’s very helpful, thanks!
OK gotcha. But I can just rephrase slightly, let me try again:
(B’’) “…Gee, I guess this reprocessing must have been a kind of ‘training / practice / exercise’ during which I could forge new better subconscious habits and associations related to ‘type-of-situation X’ (which used to invoke anxiety). And these new subconscious habits and associations are now serving me well when I encounter type-of-situation X (or anything that vaguely rings of it) in adult contexts too.”
After all, you can’t form new subconscious habits and associations related to “type-of-situation X” except by making “type-of-situation X” thoughts active somehow during that process. It seems plausible to me that invoking a childhood memory where type-of-situation X triggered unhealthy anxiety would be a very effective way to do that.
~~
I think what I’m suggesting is not that different from what you’re suggesting. Maybe the difference is when you wrote “…some specific childhood experiences that had to do with spiders, that seemed to be at the root of the phobia…”.
My mental image is, like, there’s some neuron in the amygdala, and one day in childhood it forms Synapse S connecting some input related to the idea of spiders with some output related to fear reactions. Then the goal for the adult therapy session is to delete Synapse S (or form different connections that counteract its effects, or whatever). Basically, my proposal is:
One day in childhood → Synapse S forms
Adult sees spider → Synapse S → fear reactions
I’m contrasting that with:
[What I don’t believe, but it sounds like maybe you do?] Adult sees spider → childhood memory reactivates, at least a little bit → fear reactions
In other words, I want to say that the childhood experience is “at the root of the phobia” as a matter of the historical record of how Synapse S came to be there, but it’s not “at the root of the phobia” in the sense of the episodic memory itself playing a critical causal role in the real-time anxiety reaction.
…And I’m saying that my hypothesis would nevertheless be compatible with childhood-memory-based therapies being effective, because invoking the actual episodic childhood memory itself, in a therapeutic context, is one possible path to delete or inactivate Synapse S.
Well, hmm, on second thought, I guess both stories are possible, maybe they coexist.
Thanks! I basically agree.
I think that, if we assume that there’s a world in which (1) at least some humans own some capital in the post-AGI economy (hence rapidly exponentially growing wealth), (2) nobody is worried about expropriation or violence, and (3) humans have the knowledge and power to pass and enforce effective laws holding up human interests in regards to externalities (e.g. AGIs creating new exotic forms of lethal pollution while following the letter of the existing law, or building a Dyson swarm that blocks out the sun)…
…then that’s already pretty great! That would be far better than my baseline expectation.
I think that, if we assume (1-3), then the non-capital-owning humans have a great chance of doing OK too, via (A) charity from the fabulously-wealthy capital-owning humans, or through (B) political imposition of UBI (assuming democracy), or, like you said, (C) getting employed by the fabulously-wealthy capital-owning humans who specifically want to employ other humans (or selling ownership rights to their posthumous skulls, ofc :) ).
Thanks! Yeah, I think I would have said something pretty similar to that.
Actually, I might have gone a bit further and said:
Maybe people have the experience
(A) “First, I reprocessed the childhood scare experience. Second, I found that my adult anxiety was generally relieved to some extent.”
…and they naturally conclude
(B) “…Therefore, the childhood scare experience must have been (partly) causing the adult anxiety all along.”
…but I wonder if we could also entertain an alternate theory:
(B’) “…Gee, I guess this reprocessing must have been a kind of ‘training / practice / exercise’ during which I could forge new better subconscious habits and associations related to ‘the feeling of anxiety’ in general. And these new subconscious habits and associations are now serving me well in a wide variety of adult contexts.”
After all, you can’t form new subconscious habits and associations related to “the feeling of anxiety” except by invoking “the feeling of anxiety” somehow in the process. It seems plausible to me that childhood memories would be a very effective way to do that. After all, (1) I think emotions are generally very strong in childhood and teenage years, and (2) maybe there’s some sense in which long-ago memories are objectively “safer” since the situation is long over, and thus it’s easier to entertain the idea that the feeling is not serving any real purpose.
Also, AFAICT, people achieve great therapeutic success by methods that involve bringing up childhood memories, but other people also achieve great therapeutic success by methods that don’t. :)
I’m not an expert like you are—indeed I have no personal experience whatsoever—so you can tell me if that doesn’t ring true. :)
There’s no source that I especially like and endorse, that’s why I wrote this post :)
Heritability: Five Battles
Individual humans can make pretty cool mechanical hands — see here. That strongly suggests that dexterous robot hands can make dexterous robot hands, enabling exponential growth even without spinning up new heavy machinery and production lines, I figure.
In the teleoperated robots category (which is what we should be talking about if we’re assuming away algorithm challenges!), Ugo might or might not be vaporware but they mention a price point below $10/day. There’s also the much more hardcore Sarcos Guardian XT (possibly discontinued??). Pricing is not very transparent, but I found a site that said you lease it for $5K/month, which isn’t bad considering how low the volumes are.
I think you’re arguing that Principle (A) has nothing to teach us about AGI, and shouldn’t even be brought up in an AGI context except to be immediately refuted. And I think you’re wrong.
Principle (A) applied to AGIs says: The universe won’t run out of productive things for AGIs to do. In this respect, AGIs are different from, say, hammers. If a trillion hammers magically appeared in my town, then we would just have to dispose of them somehow. That’s way more hammers than anyone wants. There’s nothing to be done with them. Their market value would asymptote to zero.
AGIs will not be like that. It’s a big world. No matter how many AGIs there are, they can keep finding and inventing new opportunities. If they outgrow the planet, they can start in on Dyson spheres. The idea that AGIs will simply run out of things to do after a short time and then stop self-reproducing—the way I would turn off a hammer machine after the first trillion hammers even if its operating costs were zero—is wrong.
So yes, I think this is a valid lesson that we can take from Principle (A) and apply to AGIs, in order to extract an important insight. This is an insight that not everyone gets, not even (actually, especially not) most professional economists, because most professional economists are trained to lump in AGIs with hammers, in the category of “capital”, which implicitly entails “things that the world needs only a certain amount of, with diminishing returns”.
So, kudos to Principle (A). Do you agree?
We can imagine a hypothetical world where a witch cast a magical spell that destroyed 99.9999999% of existing chips, and made it such that it’s only possible to create one new computer chip per day. And the algorithms are completely optimized—as good as they could possibly be. In that case, the price of compute would get bid up to the maximum economic value that it can produce anywhere in the world, which would be quite high.
The company would not have an opportunity cost, because using AI would not be a cheap option.
See what I mean? You’re assuming that the price of AI will wind up low, instead of arguing for it. As it happens, I do think the price of AI will wind up low!! But if you want to convince someone who believes in Principle (A), you need to engage with the idea of this race between the demand curve speeding to the right versus the supply curve speeding to the right. It doesn’t just go without saying.
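To make the “race” explicit with a toy model (constant-elasticity curves of my own choosing, purely illustrative): suppose the demand curve and the supply curve each shift right over time at growth factors g_D and g_S. Then the equilibrium price works out to:

    % Made-up constant-elasticity toy: demand and supply both shift right over time.
    Q^{d}(p, t) = D_0\, g_D^{\,t}\, p^{-\varepsilon}, \qquad
    Q^{s}(p, t) = S_0\, g_S^{\,t}\, p^{\,\eta}
    \quad\Longrightarrow\quad
    p^{*}(t) = \left( \frac{D_0\, g_D^{\,t}}{S_0\, g_S^{\,t}} \right)^{\!1/(\varepsilon + \eta)}
    % p*(t) falls over time iff g_S > g_D. "Cheap AI" is the claim that supply
    % outraces demand; it doesn't follow automatically.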
I’m still pretty sure that you think I believe things that I don’t believe. I’m trying to narrow down what it is and how you got that impression. I just made a number of changes to the wording, but it’s possible that I’m still missing the mark.
My position is that if you accept certain arguments made about really smart AIs in “The Sun is Big”, Principle A, by itself, ceases to make sense in this context.
When I stated Principle (A) at the top of the post, I was stating it as a traditional principle of economics. I wrote: “Traditional economics thinking has two strong principles, each based on abundant historical data”, and put in a link to a Wikipedia article with more details. You see what I mean? I wasn’t endorsing it as always and forever true. Quite the contrary: The punchline of the whole article is: “here are three traditional economic principles, but at least one will need to be discarded post-AGI.”
“AI will [roughly] amount to X”, for any X, including “high-skilled entrepreneurial human labor” is a positive claim, not a default background assumption of discourse, and in my reckoning, that particular one is unjustified.
I did some rewriting of this part, any chance that helps?
I like “different machines that produce different states”. I would bring up an example where we replace the coin by a pseudorandom number generator with seed 93762. If the recipient of the photons happens to know that the seed is 93762, then she can put every photon into state |0> with no losses. If the recipient of the photons does not know that the random seed is 93762, then she has to treat the photons as unpolarized light, which cannot be polarized without 50% loss.
So for this machine, there’s no getting away from saying things like: “There’s a fact of the matter about what the state of each output photon is. And for any particular experiment, that fact-of-the-matter might or might not be known and acted upon. And if it isn’t known and acted upon, then we should start talking about probabilistic ensembles, and we may well want to use density matrices to make those calculations easier.”
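Concretely, in density-matrix language (standard textbook bookkeeping, applied to the seed-93762 machine above):

    % Known seed: photon k is in a definite, known pure state, so a suitable
    % unitary (waveplate) rotates it to |0> with no loss.
    \rho_k^{\text{known}} \;=\; |\psi_k\rangle\langle\psi_k|
    % Unknown seed: the best available description is the 50/50 ensemble,
    % i.e. unpolarized light, which cannot be polarized without 50% loss.
    \rho^{\text{unknown}} \;=\; \tfrac{1}{2}\,|0\rangle\langle 0| \;+\; \tfrac{1}{2}\,|1\rangle\langle 1| \;=\; \tfrac{1}{2}\, I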
I think it’s weird and unhelpful to say that the nature of the machine itself is dependent on who is measuring its output photons much later on, and how, right?