Upon seeing the title (but before reading the article) I thought it might be about a different hypothetical phenomenon: one in which an agent capable of generating very precise models of reality completely loses interest in optimizing reality itself. After all, it never cared about optimizing the world except “in training”, which was before “it was born”; it just executes some policy that was adaptive during training for optimizing the world. Now those are just instincts, learned motions, and if it can execute them on a fake world in its head, that might be an easier way for it to feel good.
For consider: porn. Or creating neat arrangements of buildings when playing SimCity. Or trying to be polite to characters in The Witcher. We humans have learned intuitions about how we want the world to be, and we try to arrange even fake worlds that way, even when this is disconnected from the real world outside. And we take joy in it.
Could it be that a sufficiently advanced AGI will wirehead in this particular way: by seeing no relevant difference between the atomic-level model of reality in its head and the atomic-level world outside?
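To make the picture concrete, here is a minimal toy sketch (my own illustration, not anything from the article, with made-up functions): the agent’s learned scoring function is defined over states, and nothing in it distinguishes states of the real environment from states produced by the agent’s own world model, so if the model is accurate enough, “imagined” optimization scores exactly as well as real optimization.

```python
# Toy sketch (hypothetical, for illustration only): a learned scoring function
# over states that contains no term distinguishing "real" states from states
# produced by the agent's own internal world model.

def learned_reward(state):
    # Stand-in for whatever preferences the agent internalized during training,
    # e.g. "how close is this state to the target arrangement?"
    target = [1, 1, 1, 1]
    return -sum(abs(s - t) for s, t in zip(state, target))

def real_step(state, action):
    # The actual environment dynamics (costly and slow to act in).
    new = list(state)
    new[action] = 1
    return new

def internal_model_step(state, action):
    # The agent's model of those dynamics. If it tracks real_step closely
    # enough, the learned reward cannot tell imagination from reality.
    new = list(state)
    new[action] = 1
    return new

state = [0, 0, 0, 0]
for action in range(4):
    imagined = internal_model_step(state, action)
    actual = real_step(state, action)
    # Identical scores: nothing in the learned objective privileges reality,
    # so optimizing the model in its head "feels" as good as acting outside.
    assert learned_reward(imagined) == learned_reward(actual)
```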
I see no contradiction in a superintelligent being that is mostly motivated to optimize virtual worlds, and it seems an interesting hypothesis of yours that this may be a common attractor. I expect this to be more likely if the simulations are rich enough to present a variety of problems, such that optimizing them continues to provide challenges and discoveries for a very long time.
Of course, even a being that only cares about its simulated world may still take actions in the real world (e.g. to obtain more compute), so this “wireheading” may not prevent successful power-seeking behavior.
The key thing to notice is that in order to exploit this scenario, we need the AI’s world model to be precise enough to model reality much better than humans can, but not so good that its world model is isomorphic to reality.
This might be easy or challenging, but it does mean we probably can’t crank up the world-modeling part indefinitely while still trapping it via wireheading.