The argument in the tweet also goes through if the AI has 1000 goals, each as alien as maximizing granite spheres, which I would guess Rob thinks is more realistic.
As an aside: If one thinks 1000 goals is more realistic, then I think it’s better to start communicating using examples like that, instead of “single goal” examples. (I myself lazily default to “paperclips” to communicate AGI risk quickly to laypeople, so I am critiquing myself to some extent as well.)
Anyways, on your read, how is “maximize X-quantity” different from “max EU where utility is linearly increasing in granite spheres”?
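For concreteness, the “simple picture” I mean by the latter (my own formalization, not wording from the tweet) is an agent that always takes the action with the highest expected utility, where utility is just an increasing linear function of the sphere count:

$$U(w) = \alpha \cdot \#\{\text{granite spheres in } w\} + \beta,\ \ \alpha > 0, \qquad a^* = \arg\max_a \, \mathbb{E}[U(w) \mid a].$$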
There’s a trivial sense in which the agent is optimizing the world, and you can always rationalize a utility function from that. But an agent that, from our perspective, basically just maximizes granite spheres can look quite different from the simple picture of an agent that always picks the top action according to some (not necessarily explicit) granite-sphere valuation of the actions, in ways such that the argument still goes through. For example (see the toy sketch after this list):
- The agent can have all the biases humans do.
- The agent can violate the VNM axioms in any other way that doesn’t ruin it: basically any violation that is low-frequency or low-stakes.
- The agent only tries to maximize granite spheres 1 second out of every 5; the other 4⁄5 of its time is spent just trying not to be turned off.
- The agent has arbitrary deontological restrictions, say against sending any command to its actuators whose hash starts with 123.
- The agent has 5 goals it is jointly pursuing, but only one of them is consequentialist.
- The agent will change its goal depending on which cosmic rays it sees, but is totally incorrigible to us.
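For concreteness, here is a toy sketch of what such a messy-but-still-dangerous agent could look like. It is entirely hypothetical (my own illustration, not anything from the tweet), and the action names, tick counts, and thresholds are all made up:

```python
import hashlib
import random

# Hypothetical toy agent combining several of the deviations listed above.
# An outside observer would still describe it as "basically maximizes
# granite spheres".

ACTIONS = ["make_sphere", "fortify_against_shutdown", "idle"]

def sphere_value(action: str) -> int:
    """Implicit granite-sphere valuation: spheres added by the action."""
    return 1 if action == "make_sphere" else 0

def violates_deontology(action: str) -> bool:
    """Arbitrary restriction: no actuator command whose hash starts with '123'."""
    return hashlib.sha256(action.encode()).hexdigest().startswith("123")

def choose_action(tick: int) -> str:
    # Time-sharing: 1 tick in 5 goes to sphere maximization; the other
    # 4 go to just trying not to be turned off.
    if tick % 5 == 0:
        candidates = sorted(ACTIONS, key=sphere_value, reverse=True)
    else:
        candidates = ["fortify_against_shutdown"]
    # Deontological filter, applied on top of whichever goal is active.
    candidates = [a for a in candidates if not violates_deontology(a)]
    if not candidates:
        return "idle"  # do nothing if every candidate is forbidden
    # Human-like bias: occasionally pick a random permitted action instead.
    if random.random() < 0.01:
        return random.choice(candidates)
    return candidates[0]

random.seed(0)
spheres = 0
for tick in range(100_000):
    spheres += sphere_value(choose_action(tick))

# Despite the time-sharing, the filter, and the noise, the long-run behavior
# is still "sphere count goes up while the agent resists shutdown", which is
# all the argument needs.
print(f"spheres after 100k ticks: {spheres}")
```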
The original wording of the tweet was “Suppose that the AI’s sole goal is to maximize the number of granite spheres in its future light cone.” This is a bit closer to my picture of EU maximization, but some of the degrees of freedom listed above still apply.