Richard_Ngo comments on Richard Ngo’s Shortform

Richard_Ngo 20 Jan 2026 22:48 UTC
LW: 116 AF: 40
13
AF
An analogy that points at one way I think the instrumental/terminal goal distinction is confused:
Imagine trying to classify genes as either instrumentally or terminally valuable from the perspective of evolution. Instrumental genes encode traits that help an organism reproduce. Terminal genes, by contrast, are the “payload” which is being passed down the generations for their own sake.
This model might seem silly, but it actually makes a bunch of useful predictions. Pick some set of genes which are so crucial for survival that they’re seldom if ever modified (e.g. the genes for chlorophyll in plants, or genes for ATP production in animals). Treating those genes as “terminal” lets you “predict” that other genes will gradually evolve in whichever ways help most to pass those terminal genes on, which is what we in fact see.
But of course there’s no such thing as “terminal genes”. What’s actually going on is that some genes evolved first, meaning that a bunch of downstream genes ended up selected for compatibility with them. In principle evolution would be fine with the terminal genes being replaced, it’s just that it’s computationally difficult to find a way to do so without breaking downstream dependencies.
I think this is a good analogy for how human values work. We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on. Understood this way, we can describe an organism’s goals/strategies purely in terms of which goals “have power over” which other goals, which goals are most easily replaced, etc, without needing to appeal to some kind of essential “terminalism” that some goals have and others don’t. (Indeed, the main reason you’d need that concept is to describe someone who has modified their goals towards having a sharper instrumental/terminal distinction—i.e. it’s a self-fulfilling prophecy.)
That’s the descriptive view. But “from the inside” we still want to know which goals we should pursue, and how to resolve disagreements between our goals. How to figure that out without labeling some goals as terminal and others as instrumental? I don’t yet have a formal answer, but my current informal answer is that there’s a lot of room for positive-sum trade between goals, and so you should set up a system which maximizes the ability of those goals to cooperate with each other, especially by developing new “compromise” goals that capture the most important parts of each.
This leads to a pretty different view of the world from the Bostromian one. It often feels like the Bostrom paradigm implicitly divides the future into two phases. There’s the instrumental phase, during which your decisions are dominated by trying to improve your long-term ability to achieve your goals. And there’s the terminal phase, during which you “cash out” your resources into whatever you value. This isn’t a *necessary* implication of the instrumental/terminal distinction, but I expect it’s an emergent consequence in a range of environments of taking the instrumental/terminal distinction seriously. E.g. in our universe it sure seems like any scale-sensitive value system should optimize purely for number of galaxies owned for a long time before trying to turn those galaxies into paperclips/hedonium/etc.
From the alternative perspective I’ve outlined above, though, the process of instrumentally growing and gaining resources is also simultaneously the process of constructing values. In other words, we start off with underspecified values as children, but then over time choose to develop them in ways which are instrumentally useful. This process leads to the emergence of new, rich, nuanced goals which satisfy our original goals while also going far beyond them, just as the development of complex multicellular organisms helps to propagate the original bacterial genes for chlorophyll or ATP—not by “maximizing” for those “terminal” genes, but by building larger creatures much more strange and wonderful.
What links here?
- AI #152: Brought To You By The Torment Nexus by Zvi (22 Jan 2026 14:40 UTC; 34 points)
- Daniel Kokotajlo 21 Jan 2026 18:03 UTC
  LW: 34 AF: 20
  8
  AF Parent
  Thinking step by step:
  I like the point that fundamentally the structure is tree-like, and insofar as terminal goals are a thing it’s basically just that they are the leaf nodes on the tree instead of branches or roots. Note that this doesn’t mean terminal goals aren’t a thing; the distinction is real and potentially important.
  I think an improvement on the analogy would be compare to a human organization rather than to a tree. In a human organization (such as a company or a bureaucracy) at first there is one or a small group of people, and then they hire more people to help them with stuff (and retain the option to fire them if they stop helping) and then those people hire people etc. and eventually you have six layers of middle management. Perhaps goals are like this. Evolution and/or reinforcement learning gives us some goals, and then those goals create subgoals to help them, and then those subgoals create subgoals, etc. In general when it starts to seem that a subgoal isn’t going to help with the goal it’s supposed to help with, it gets ‘fired.’ However, sometimes subgoals are ‘sticky’ and become terminal-ish goals, analogous to how it’s sometimes hard to fire people & how the leaders of an organization can find themselves pushed around by the desires and culture of the mass of employees underneath them. Also, sometimes the original goals evolution or RL gave us might wither away (analogous to retiring) or be violently ousted (analogous to what happened to OpenAI’s board) in some sort of explicit conscious conflict with other factions. (example: Someone is in a situation where they are incentivized to lie to succeed at their job, does some moral reasoning and then decides that honesty is merely a means to an end, and then starts strategically deceiving people for the greater good, quashing the qualms of their conscience which had evolved a top-level goal of being honest back in childhood.)
  w.r.t. Bostrom: I’m not convinced yet. I agree with “the process of instrumentally growing and gaining resources is also the process of constructing values.” But you haven’t argued that this is good or desireable. Analogy: “The process of scaling your organization from a small research nonprofit to a megacorp with an army of superintelligences is also the process of constructing values. This process leads to the emergence of new, rich, nuanced goals (like profit, and PR, and outcompeting rivals, and getting tentacles into governments, and paying huge bonuses to the greedy tech executives and corporate lawyers we’ve recruited) which satisfy our original goals while also going far beyond them...” Perhaps this is the way things typically are, but if hypothetically things didn’t have to be that way—if the original goals/leaders could press a button that would costlessly ensure that this value drift didn’t happen, and that the accumulated resources would later be efficiently spent purely on the original goals of the org and not on all these other goals that were originally adopted as instrumentally useful… shouldn’t they? If so, then the Bostromian picture is how things will be in the limit of increased rationality/intelligence/etc., but not necessarily how things are now.
- TsviBT 21 Jan 2026 2:55 UTC
  LW: 34 AF: 18
  2
  AF Parent
  
  But of course there’s no such thing as “terminal genes”. What’s actually going on is that some genes evolved first, meaning that a bunch of downstream genes ended up selected for compatibility with them. In principle evolution would be fine with the terminal genes being replaced, it’s just that it’s computationally difficult to find a way to do so without breaking downstream dependencies.
  
  I think your analysis is incorrect. The book is called “The Selfish Gene”. No basic unit of evolution is perfect, but probably the best available is the gene—which is to say, genomic locus (defined relative to surrounding context). An organism is a temporary coalition of its genes. Generally there’s quite strong instrumental alignment between all the genes in an organism, but it’s not always perfect, and you do get gene drives in nature. If a gene could favor itself at the expense of the other genes in that organism (in terms of overall population frequency), it totally would.
  
  I think this is a good analogy for how human values work. We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on. Understood this way, we can describe an organism’s goals/strategies purely in terms of which goals “have power over” which other goals, which goals are most easily replaced, etc, without needing to appeal to some kind of essential “terminalism” that some goals have and others don’t.
  
  This describes some but not all of how our values work. There are free parameters in what you do with the universe; what sets those free parameters is of basic interest. (Cf. https://www.lesswrong.com/posts/NqsNYsyoA2YSbb3py/fundamental-question-what-determines-a-mind-s-effects , though that statement is quite flawed as well.)
  - robo 21 Jan 2026 16:48 UTC
    1 point
    −1
    Parent
    I think your analysis is incorrect
    
    Well, it’s kinda true, right? Ontogeny recapitulates phylogeny (developing embryos look like worms, then fish, amphibians—they mirror the path of evolution). That’s because it’s easier for evolution to add steps at the end than to changes steps in the beginning. It happens with computers too—modern Intel and AMD chips still startup in 16 bit real mode.
    
    In the coalition of genes that make it into a gamete, newer genes support the old genes, but not vice versa. The genes that control apoptosis (p53 etc.) are obligate mutualists—apoptosis genes support older particular genes, but older genes don’t support apoptosis genes in particular.
    - TsviBT 21 Jan 2026 18:54 UTC
      4 points
      0
      Parent
      Right, the part about layers of more and less conserved genes is true AFAIK. (I think actually ontogeny doesn’t recapitulate phylogeny linearly, but rather there’s a kinda of hourglass structure where some mid-development checkpoints are most conserved—but I’m not remembering where I saw this—possibly in a book or paper by Rupert Riedl or Günter Wagner.)
      
      What I’m objecting to, is viewing that as a growth of a values structure for the values of [the evolution of a species, as an agent]. That’s because that entity doesn’t really value genes at all; it doesn’t care about the payload of genes. Individual genes selfishly care about themselves as a payload, being payloaded into the gene pool of the species; each variant wants its frequency to go up. The species-evolution doesn’t care about that. I think the species-evolution is a less coherent way of imputing agency to evolution compared to selfish genes, though still interesting. But if impute values to a species-evolution, I’m not sure what you’d get, and I think it would be something like “performs well in this ecological niche”—though there would be edge cases that are harder to describe, such as long-term trends due to sexual selection or due for example to any sort of frequency-dependent effects of genes.
    - cdt 21 Jan 2026 18:17 UTC
      2 points
      0
      Parent
      You may mean phylogenetic inertia.
- Eli Tyre 21 Jan 2026 2:09 UTC
  14 points
  12
  Parent
  This is great, and on an important topic that’s right in the center of our collective ontology and where I’ve been feeling for a while that our concepts are inadequate.
  
  Top level post! Top level post!
- Linch 21 Jan 2026 20:36 UTC
  6 points
  2
  Parent
  I think this sort of assumes that terminal-ish goals are developed earlier and thus more stable and instrumental-ish goals are developed later and more subject to change.
  I think this may or may not be true on the individual level but it’s probably false on the ecological level.
  Competitive pressures shape many instrumental-ish goals to be convergent whereas terminal-ish goals have more free parameters.
- Jeremy Gillen 21 Jan 2026 1:10 UTC
  6 points
  2
  Parent
  We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on.
  This seems true of me for some cases, mostly during childhood. Maybe it was a hack that evolution used to get from near-sensory value specifications to more abstract values. But if I (maybe unfairly) take this as a full model of human values and entirely remove the terminal-instrumental distinction, then it seems to make a bunch of false predictions. E.g. there are lots of jobs that people don’t grow to love doing. E.g. There are lots of things that people love doing after only trying them once (where they tried it for no particular instrumental reason).
  there’s a lot of room for positive-sum trade between goals
  Once each goal exists there’s room for positive sum trade, but creating new goals is always purely negative for every other currently existing goal, right? My vague memory is that your response is that constructing new instrumental goals is somehow necessary for computational tractability, but I don’t get why that would be true.
  - Richard_Ngo 21 Jan 2026 1:39 UTC
    5 points
    2
    Parent
    creating new goals is always purely negative for every other currently existing goal, right
    No more than hiring new employees is purely negative for existing employees at a company.
    The premise I’m working with here is that you can’t create goals without making them “terminal” in some sense (just as you can’t hire employees without giving them some influence over company culture).
    - Jeremy Gillen 21 Jan 2026 2:17 UTC
      4 points
      0
      Parent
      You did a good job of communicating your positive feelings about this kind of value system, I understand slightly better why you like it.
      I can see how it can be worth the trade-off to make a new goal if that’s the only way to get the work done. But it’s negative if the work can be done directly.
      And we know many small-ish cases where we can directly compute a policy from a goal. So what makes it impossible to make larger plans without adding new goals? And why does adding new goals shift it from impossible to possible?
- DanielFilan 22 Jan 2026 22:10 UTC
  LW: 4 AF: 3
  0
  AF Parent
  
  We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on. Understood this way, we can describe an organism’s goals/strategies purely in terms of which goals “have power over” which other goals, which goals are most easily replaced, etc, without needing to appeal to some kind of essential “terminalism” that some goals have and others don’t.
  
  I think it would help me if you explained what you think it would mean for there to be an instrumental/terminal distinction, since to my eyes you’ve just spelled out the instrumental/terminal split.
- Wei Dai 22 Jan 2026 2:14 UTC
  LW: 4 AF: 4
  2
  AF Parent
  How does the Shareholder Value Revolution fit into your picture? From an AI overview:
  1. The Intellectual Origins (The 1970s)
  
  The revolution was born out of economic stagnation in the 1970s. As U.S. corporate profits dipped and competition from Japan and Germany rose, economists and theorists argued that American managers had become “fat and happy,” running companies for their own comfort rather than efficiency.
  
  Two key intellectual pillars drove the change:
  
  Milton Friedman (The Moral Argument): In a famous 1970 New York Times essay, Friedman argued, “The social responsibility of business is to increase its profits.” He posited that executives spending money on “social causes” (like keeping inefficient plants open to save jobs) were essentially stealing from the owners (shareholders).
  Jensen and Meckling (The Economic Argument—“Agency Theory”): In 1976, these economists published a paper describing the “Principal-Agent Problem.” They argued that managers (agents) were not aligned with shareholders (principals). Managers wanted perks (corporate jets, large empires), while shareholders wanted profit. The solution? Align their interests by paying executives in stock.
  It seems to better fit my normative picture of human values: terminal values come from philosophy, and subservience of instrumental values to terminal values improves over time as we get better at it, without need to permanently raise instrumental values to terminal status or irreversibly commingle the two.
- aysja 21 Jan 2026 19:19 UTC
  4 points
  2
  Parent
  I agree that human values are more accretive like this, but I would also call those genes “terminal” in the same sense that I call some of my own goals “terminal.” E.g., I can usually ask myself why I’m taking a given action and my brain will give a reasonable answer: “because I want to finish this post,” “because I’m hungry,” whatever. And then I can keep double clicking on those: “I want to finish the post because I don’t think this crux has been spelled out very well yet” and I can keep going and going until at some point the answer is like “I don’t know, because it’s intrinsically beautiful?” and that’s around when I call the goal/preference “terminal.” Which is similar in structure to a story I imagine evolution might tell if it “asked itself” why some particular gene developed.
  Perhaps “terminal” is the wrong word for this, but having a handle for these high-level, upstream nodes in my motivational complex has been helpful. And they do hold a special status, at least for me, because many of the “instrumental” actions (or subgoals) could be switched out while preserving this more nebulous desire to “understand” or “find beauty” or what have you. That feels like an important distinction that I want to keep while also agreeing they aren’t always cleanly demarcated as such. E.g., writing has both instrumental and terminal qualities to me, which can make it a more confusing goal-structure to orient to, but also as you say: more strange and wonderful, too.
  - Mateusz Bagiński 21 Jan 2026 20:36 UTC
    2 points
    0
    Parent
    Yeah, one possible successor concept to the instrumental/terminal distinction might be something like “does this thing clearly draw its raison d’être from another thing, or is it its own source of raison d’être or some third thing which is like a nebulous, non-explicitized symbiosis”, where the raison d’être is itself something potentially revisable by reflection (or whatever mind-shaping process).
- cubefox 21 Jan 2026 19:16 UTC
  4 points
  0
  Parent
  
  without needing to appeal to some kind of essential “terminalism” that some goals have and others don’t.
  
  That appeal doesn’t seem overly problematic though, as some goals are clearly terminal. For example, eating chocolate (or rather: something that tastes like it). Or not dying. Those goals are given to us by evolution. Chocolate is a case where we actually have an instrumental reason not to eat it (too much sugar for modern environments), which counteracts the terminal goal in the opposite direction. Which means they are clearly different. Are there perhaps other edge cases where the instrumental/terminal distinction is harder to apply?
  
  (Indeed, the main reason you’d need that concept is to describe someone who has modified their goals towards having a sharper instrumental/terminal distinction—i.e. it’s a self-fulfilling prophecy.)
  
  I argue the main reason is different: First, we need to distinguish instrumental from terminal goals because instrumental goals are affected by beliefs. When those beliefs change, the instrumental goals change. For example, I may want to eat spinach because I believe it’s healthy. So that’s an instrumental goal. If my belief changed, I might abandon that goal. But if I liked spinach for its own sake (terminally), I wouldn’t need such a supporting belief. As in the case of chocolate.
  
  Second, beliefs can be true or false, or epistemically justified or unjustified, which means instrumental goals that are based on beliefs which are mistaken in this way are then also mistaken. That doesn’t happen for terminal goals. (Terminal goals can still be mutually incoherent if they violate certain axioms of utility theory, but that only means a set of goals is mistaken, not necessarily individual goals in that set.)
- Mateusz Bagiński 20 Jan 2026 22:57 UTC
  4 points
  1
  Parent
  From the alternative perspective I’ve outlined above, though, the process of instrumentally growing and gaining resources is also simultaneously the process of constructing values.
  Reminds me of Daniel Polani’s talk on sensor evolution. The gist is: basically, all viable ways to grow/evolve new sensors for getting the information that would help you achieve your goals get you sensors that deliver OOMs more information than you need, and then this information “has to go somewhere”,^[1] so you grow new goals. (Of course, rinse and repeat, in a loop.)
  1. ^
    I didn’t exactly understand why it had to “go somewhere” or maybe forgot his rationale for believing that it has to go somewhere.
- J Bostock 23 Jan 2026 18:47 UTC
  2 points
  0
  Parent
  I think humans do value-inference in both directions. Our “terminal” values are in part grown out of which high-level things correlate with low-level things. An example is John Wentworth—who seems to lack the circuits for feeling (companionate) love—saying he thinks relationships and kids are lame compared to his goals of saving the world, to the point where he would prefer not to be modified to be able to feel love, and says he would view a drug which enabled him to feel love similarly to a syringe of heroin. Clearly, his brain has built up terminal values V(saving world) > V(wife and kids) out of his lower-level instincts.
  Sexuality seems to be another case of this; it is interesting just how variable it can be. Just take a look at the amount of variance in how attracted monosexual people are trans people of the gender (and conversely not the assigned-at-birth sex) that they’re attracted to. Some people’s values infer the “right” gender cues and ignore the mechanical ones, some just don’t. (though I will admit I’m going off of zeitgeist here and have basically no experience in this domain)
  My best guess is that the brain is re-using its epistemic inference circuits (which are good at taking in information, gestalting it, and penalizing by complexity) and running a kind of “What are my values?” inference, which seeks a relatively non-contradictory, relatively simple value system, then doing it’s own equivalent of backpropagation to smooth all the conflicting lower-level drives towards that, similarly to how it can smooth over conflicting predictive circuits to make its own models more consistent, just by thinking (i.e. without external input).
- Karl Krueger 22 Jan 2026 20:32 UTC
  2 points
  0
  Parent
  We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on.
  I want to call this something like “developmental axiology” but that sounds more like Kegan whereas I mean it more like evo-devo.
- tailcalled 22 Jan 2026 18:47 UTC
  2 points
  0
  Parent
  This treats sexual selection as determined by instrumental genes and selecting for instrumental genes, but I feel like it makes more sense to say that sexual selection selects for terminal genes (or at least terminal phenotypes), since those are the ones organisms will spontaneously collaborate to promote.
- romeostevensit 21 Jan 2026 0:21 UTC
  LW: 2 AF: 1
  0
  AF Parent
  The bit about layering creating functional fixedness reminds me of organisms (especially humans, but more broadly evolution as a search process) as ‘homeostatic envelope extenders’ a la Nozick’s take on Quine.
- koanchuk 21 Jan 2026 1:20 UTC
  1 point
  0
  Parent
  Evolution doesn’t determine what’s right, it only shows you what’s left! After selection, that is. Indeed, there are conceivable environments where the chlorophyll gene would be rapidly outcompeted by something that is better at exploiting free energy gradients, e.g. environments where chlorophyll is for some reason less useful than it is today. Its ubiquity today is basically contextual and so doesn’t represent any kind of terminus. If we model evolution as an agent with values, then it seems to me that it has one coherent terminal goal, which, if we go by observation, is to accelerate the production of entropy, for which the selection of entities that are successively better at exploiting available energy gradients for their temporary survival and reproduction is instrumental. Strange conclusion, or strange premise?

Richard_Ngo comments on Richard Ngo’s Shortform

1. The Intellectual Origins (The 1970s)