I like the point that fundamentally the structure is tree-like, and that insofar as terminal goals are a thing, it’s basically just that they are the leaf nodes of the tree rather than branches or roots. Note that this doesn’t mean terminal goals aren’t a thing; the distinction is real and potentially important.
I think an improvement on the analogy would be to compare to a human organization rather than to a tree. In a human organization (such as a company or a bureaucracy), at first there is one person or a small group, and then they hire more people to help them with stuff (retaining the option to fire them if they stop helping), and then those people hire people, etc., until eventually you have six layers of middle management. Perhaps goals are like this. Evolution and/or reinforcement learning gives us some goals, then those goals create subgoals to help them, then those subgoals create subgoals, etc. In general, when it starts to seem that a subgoal isn’t going to help with the goal it’s supposed to serve, it gets ‘fired.’ However, sometimes subgoals are ‘sticky’ and become terminal-ish goals, analogous to how it’s sometimes hard to fire people & how the leaders of an organization can find themselves pushed around by the desires and culture of the mass of employees underneath them. Also, sometimes the original goals evolution or RL gave us might wither away (analogous to retiring) or be violently ousted (analogous to what happened to OpenAI’s board) in some sort of explicit conscious conflict with other factions. (Example: someone is in a situation where they are incentivized to lie to succeed at their job, does some moral reasoning, decides that honesty is merely a means to an end, and then starts strategically deceiving people for the greater good, quashing the qualms of their conscience, which back in childhood had acquired a top-level goal of being honest.)
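To make the hiring/firing dynamic concrete, here’s a minimal toy sketch in Python (all names here are hypothetical illustrations of the analogy, not from any real codebase): goals spawn subgoals, unhelpful subgoals get fired, and sticky subgoals survive pruning and so behave terminal-ish.

```python
# Toy sketch of the goals-as-organization picture above.
# Everything here is illustrative; no real library is assumed.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Goal:
    name: str
    sticky: bool = False                      # sticky subgoals resist being fired
    subgoals: list[Goal] = field(default_factory=list)

    def spawn(self, name: str, sticky: bool = False) -> Goal:
        """'Hire' a subgoal to help with this goal."""
        child = Goal(name, sticky=sticky)
        self.subgoals.append(child)
        return child

    def prune(self, still_helpful: Callable[[Goal], bool]) -> None:
        """'Fire' subgoals that no longer help -- unless they've become sticky."""
        self.subgoals = [g for g in self.subgoals if g.sticky or still_helpful(g)]
        for g in self.subgoals:
            g.prune(still_helpful)

# Evolution/RL installs a root goal; subgoals accumulate beneath it.
root = Goal("succeed at job")
root.spawn("be honest", sticky=True)          # adopted instrumentally, now entrenched
root.spawn("flatter the boss")

# Circumstances change: flattery stops helping, but honesty is sticky and survives.
root.prune(still_helpful=lambda g: g.name != "flatter the boss")
print([g.name for g in root.subgoals])        # -> ['be honest']
```

The `sticky` flag is the crux of the analogy: once a subgoal is entrenched, it survives pruning and functions like a terminal goal even though it was originally ‘hired’ for instrumental reasons.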
W.r.t. Bostrom: I’m not convinced yet. I agree with “the process of instrumentally growing and gaining resources is also the process of constructing values.” But you haven’t argued that this is good or desirable. Analogy: “The process of scaling your organization from a small research nonprofit to a megacorp with an army of superintelligences is also the process of constructing values. This process leads to the emergence of new, rich, nuanced goals (like profit, and PR, and outcompeting rivals, and getting tentacles into governments, and paying huge bonuses to the greedy tech executives and corporate lawyers we’ve recruited) which satisfy our original goals while also going far beyond them...” Perhaps this is the way things typically are. But suppose hypothetically that things didn’t have to be that way: if the original goals/leaders could press a button that would costlessly ensure this value drift didn’t happen, and that the accumulated resources would later be spent efficiently and purely on the original goals of the org rather than on all these other goals that were originally adopted as instrumentally useful… shouldn’t they press it? If so, then the Bostromian picture is how things will be in the limit of increased rationality/intelligence/etc., but not necessarily how things are now.