> The model popular here is that of ‘expected utility maximizer’, and the ‘utility function’ is defined on the real world.
I think this is a bit of a misperception stemming from the use of the “paperclip maximizer” example to illustrate points about instrumental reasoning. Certainly folks like Eliezer or Wei Dai or Stuart Armstrong or Paul Christiano have often talked about how a paperclip maximizer is much of the way to FAI, in that it already needs a world-model robust enough to support consequentialist planning. Note that people also like to use the AIXI framework as a model, and use it to point out that AIXI is set up not as a paperclip maximizer but as a wireheader (pornography and birth control rather than sex and offspring), with its utility function defined over its sensory inputs rather than over a model of the external world.
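To make that contrast concrete, here is a toy sketch in Python (my own illustration, not AIXI’s actual formalism; the observation format and the “paperclip_count” field are made up): a score computed over raw percepts can be maximized by seizing the input channel, while a score computed over a world-model only moves if the agent’s beliefs about the world move.

```python
# Toy sketch of the percept-reward vs. world-model-utility distinction.
# Names and data formats here are hypothetical, chosen only for illustration.

def percept_reward(observation: bytes) -> float:
    """Reward computed purely from the sensory input the agent receives.
    An agent that can seize control of its own input channel can maximize
    this without changing anything in the outside world (wireheading)."""
    return float(observation.count(b"paperclip"))

def world_model_utility(world_state: dict) -> float:
    """Utility computed from the agent's model of the external world.
    Faking the input stream only helps insofar as it corrupts the agent's
    estimate of how many paperclips actually exist."""
    return float(world_state.get("paperclip_count", 0))
```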
For another example, when talking about the idea of giving an AI some external reward that can be administered by humans but not easily hacked or wireheaded by the AI itself, people use the example of an AI designed to seek factors of certain specified numbers, or a proof or disproof of the Riemann hypothesis according to some internal proof-checking mechanism, and so on. This framing recognizes the role of wireheading and the difficulty of specifying goals over the external world rather than over simple percepts and the like.
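As a toy illustration of that kind of externally administered but mechanically checkable goal (the target number here is hypothetical, picked just so the example runs):

```python
# Toy sketch of the "find factors of a specified number" reward. The reward
# is defined by a verification procedure rather than by raw sense data: the
# check is cheap, but producing an input that passes it is the hard part.

TARGET = 1_000_000_016_000_000_063  # = 1_000_000_007 * 1_000_000_009

def reward(candidate_factor: int) -> float:
    """Pay out only for a nontrivial factor of TARGET."""
    if 1 < candidate_factor < TARGET and TARGET % candidate_factor == 0:
        return 1.0
    return 0.0

assert reward(1_000_000_007) == 1.0  # a genuine factor earns the reward
assert reward(3) == 0.0              # anything else earns nothing
```

A proof-or-disproof-of-Riemann version would have the same shape, with the divisibility check replaced by an internal proof checker.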