For AI risk in particular:
The Value Learning sequence answer is:
1. Replace “utility function” with “goal”, and “expected utility maximizer” with “goal-directed system”.
2. Figure out an argument for why the AI systems we build will be goal-directed.
3. Make peace with the fact that you don’t get to have formal answers that apply with the certainty of theorems, and you have to rely on intuitions.
The Human Compatible answer is:
1. We use expected utility maximization because that’s what the standard model says: all of the AI systems we build today optimize a definite specification that is effectively assumed to be handed down from God, and we are simply mimicking that in our expected utility maximization model (a minimal sketch of this follows the list).
2. What do you mean, “learned models”? Deep learning is going to hit a brick wall; we’re going to build AI systems out of legible algorithms that really are optimizing their specifications, like planning algorithms.
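To make the “definite specification” point concrete, here is a minimal sketch of the standard model as I read it (my own illustration, not from Human Compatible): a fixed, hand-written objective plays the role of the specification, and the optimizer’s only job is to drive it down; nothing in the loop ever questions whether the specification is the right one. The particular objective, learning rate, and target value are made up for the example.

```python
# A minimal sketch of the "standard model": the objective below is a fixed,
# hand-written specification, and the system's only job is to optimize it.
# (Illustrative values only; real systems optimize losses/rewards over data.)

def specification(theta):
    # Hypothetical objective: squared distance from an arbitrary target of 3.0.
    return (theta - 3.0) ** 2

def optimize(spec, theta=0.0, lr=0.1, steps=100):
    # Plain gradient descent with a finite-difference gradient estimate.
    eps = 1e-5
    for _ in range(steps):
        grad = (spec(theta + eps) - spec(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

print(optimize(specification))  # ~3.0: the spec gets optimized, never questioned
```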
My current answer is basically the Value Learning sequence answer with a more fleshed-out version of point 2 (though I would probably prefer to state it differently now).
----
Btw, the original point of utility functions is to compactly describe a given set of preferences. The formalism was originally primarily descriptive; the problem arises when you treat it as prescriptive. But for the descriptive purpose, utility functions are still great: the value-add is that the sentence “my utility function is X” is expected to be much shorter than the sentence “my preferences are X”.
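As a toy illustration of the descriptive use (my own example, with made-up outcomes and utility values): the three-entry utility table below, combined with expected-utility comparison, pins down a preference between every pair of lotteries over those outcomes, so stating the utility function is much more compact than enumerating all the pairwise preferences it implies.

```python
import itertools

# A short utility table over outcomes (hypothetical values)...
utility = {"mango": 2.0, "apple": 1.0, "nothing": 0.0}

def expected_utility(lottery):
    # lottery: dict mapping outcome -> probability
    return sum(p * utility[o] for o, p in lottery.items())

lotteries = {
    "A": {"mango": 1.0},
    "B": {"apple": 1.0},
    "C": {"mango": 0.4, "nothing": 0.6},
}

# ...induces a preference between every pair of lotteries, a list that grows
# much faster than the utility table itself.
for x, y in itertools.combinations(lotteries, 2):
    ux, uy = expected_utility(lotteries[x]), expected_utility(lotteries[y])
    relation = ">" if ux > uy else "<" if ux < uy else "~"
    print(f"{x} {relation} {y}")  # A > B, A > C, B > C
```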