I generally evaluate my overall experience through the perception of my own feelings. I therefore assume this simulated world would be evaluated in a similar way: I perceive the various occurrences within it and rate them according to my preferences, and the AI receives those ratings and updates the simulated world accordingly. The main difference, then, appears to be that the AI would not have access to my nervous system. If only my avatar is represented in this world, and the avatar is all the AI has access to, it could not wire-head by simply manipulating my brain however it wants. Likewise, it would not have access to its own internal hardware or be able to model it (since that would require knowledge of actual physics). In theory, it could still interact with buttons and knobs in the simulated world that were connected to its hardware in the real world.
I think this is basically the correct approach, and it is actually being considered by AI researchers (take Paul’s recent paper, for example: human yes-or-no feedback on actions in a simulated environment). The main difficulty then becomes domain transfer: when the AI is “released” into the physical world, it has access to both its own hardware and human “hardware”, and I don’t see how to predict its actions once it learns these additional facts. I don’t think we have much theory for what happens then, but the approach is probably very suitable for narrow AI and for training robots that will eventually take actions in the real world.
It does have access to your nervous system, since your nervous system can be rewired via backdriving inputs from your perceptions.