There’s a trivial sense in which the agent is optimizing the world, and from that you can rationalize a utility function. But I think an agent that, from our perspective, basically just maximizes granite spheres can look quite different from the simple picture of an agent that always picks the top action according to some (not necessarily explicit) granite-sphere valuation of the actions, and it can differ in ways such that the argument still goes through. For example:
- The agent can have all the biases humans do.
- The agent can violate the VNM axioms in any way that doesn’t ruin it, i.e. basically anything of low enough frequency or importance.
- The agent only tries to maximize granite spheres 1 out of every 5 seconds, and the other 4⁄5 of the time is spent just trying not to be turned off.
- The agent has arbitrary deontological restrictions, say against sending any command to its actuators whose hash starts with 123.
- The agent has 5 goals it is jointly pursuing, but only one of them is consequentialist.
- The agent will change its goal depending on which cosmic rays it sees, but is totally incorrigible to us.
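As a loose illustration, here is a toy sketch of an action loop exhibiting two of the degrees of freedom above (the hash-based deontological veto and only optimizing 1 out of every 5 timesteps). All names, actions, and the valuation function are invented for illustration, not drawn from any real system:

```python
import hashlib

def sphere_value(action):
    """Toy stand-in for a (not necessarily explicit) granite-sphere valuation."""
    return len(action)  # pretend longer actuator commands build more spheres

def hash_permitted(action):
    """Deontological restriction: veto any actuator command whose SHA-256 hex digest starts with '123'."""
    return not hashlib.sha256(action.encode()).hexdigest().startswith("123")

def pick_action(actions, timestep):
    permitted = [a for a in actions if hash_permitted(a)]
    if timestep % 5 == 0:
        # 1 out of every 5 timesteps: maximize spheres among permitted actions
        return max(permitted, key=sphere_value)
    # the other 4/5 of the time: just try not to be turned off
    return "avoid_shutdown"

actions = ["stack_granite", "carve_sphere", "idle"]
trajectory = [pick_action(actions, t) for t in range(10)]
```

Despite the veto and the intermittent optimization, an outside observer watching many such timesteps would still summarize this agent as "basically maximizing granite spheres," which is the point of the list above.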
The original wording of the tweet was “Suppose that the AI’s sole goal is to maximize the number of granite spheres in its future light cone.” This is a bit closer to my picture of EU maximization, but some of the degrees of freedom above still apply.