We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies crystallize, and then give rise to further instrumental strategies for achieving them, and so on.
This seems true of me in some cases, mostly during childhood. Maybe it was a hack evolution used to get from near-sensory value specifications to more abstract values. But if I (maybe unfairly) take this as a full model of human values and entirely remove the terminal-instrumental distinction, it seems to make a bunch of false predictions. For example, there are lots of jobs that people never grow to love doing, and there are lots of things that people love doing after trying them only once (where they tried them for no particular instrumental reason).
there’s a lot of room for positive-sum trade between goals
Once each goal exists, there's room for positive-sum trade, but creating new goals is always purely negative for every other currently existing goal, right? My vague memory is that your response was that constructing new instrumental goals is somehow necessary for computational tractability, but I don't see why that would be true.
creating new goals is always purely negative for every other currently existing goal, right
No more than hiring new employees is purely negative for existing employees at a company.
The premise I’m working with here is that you can’t create goals without making them “terminal” in some sense (just as you can’t hire employees without giving them some influence over company culture).
You did a good job of communicating your positive feelings about this kind of value system; I understand slightly better now why you like it.
I can see how creating a new goal can be worth the trade-off if that's the only way to get the work done. But it's a net negative if the work could be done directly.
And we know of many small-ish cases where we can directly compute a policy from a goal. So what makes it impossible to make larger plans without adding new goals? And why does adding new goals shift this from impossible to possible?
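For concreteness, here's a minimal sketch of the kind of small case I mean: value iteration on a made-up five-state chain world (everything here, including the reward and the parameters, is hypothetical and chosen only to make the point runnable). The goal is nothing more than a reward function, and the optimal policy is computed from it directly, with no intermediate goals ever being created along the way.

```python
# Minimal sketch (hypothetical toy example): value iteration computes an
# optimal policy directly from a goal, encoded here as a reward function.
# No intermediate "instrumental goals" are ever represented.

import numpy as np

n_states, n_actions = 5, 2  # tiny chain world: move left (0) or right (1)
gamma = 0.9                 # discount factor

# Transitions: action 0 moves left, action 1 moves right (clipped at the ends).
def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# The "goal": reward 1 for being in the rightmost state, 0 elsewhere.
reward = np.zeros(n_states)
reward[-1] = 1.0

# Value iteration: repeatedly back up values until they converge.
V = np.zeros(n_states)
for _ in range(1000):
    Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                   for a in range(n_actions)]
                  for s in range(n_states)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

# The policy falls straight out of the computation.
policy = Q.argmax(axis=1)
print(policy)  # [1 1 1 1 1] -- always move right, toward the rewarded state
```

The standard caveat is that this kind of exhaustive backup blows up on large state spaces, which I take to be where the tractability argument comes in; my question is why adding new goals, specifically, is what bridges that gap.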