If the simulation approach is to be effective, it probably has to have pretty high fidelity, in which case sim behaviours are likely to be pretty representative of the real-world behaviour.
Yes. I expect that, before a smart AI takes competent harmful actions (as opposed to flailing randomly, which can also do some damage), there will exist, somewhere within the AI, a pretty detailed simulation of what is going to happen.
Reasons humans might not read the simulation and shut the AI down:

- A previous competent harmful action intended to prevent this.
- The sheer number of possible actions the AI considers.
- The default difficulty of a human understanding the simulation.
Let's consider an optimistic case. You have found a magic computer and have programmed in the laws of quantum field theory. You have added various features so you can place a virtual camera and microphone at any point in the simulation. Let's say you have a full VR setup. There is still a lot of room for all sorts of subtle indirect bad effects to slip under the radar, because the world is a big place and you can't watch all of it.
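To put a rough number on that last point (the figures below are made-up placeholders for illustration, not estimates from anywhere):

```python
# Back-of-the-envelope: even a generous monitoring setup over a perfect
# simulator observes only a vanishing fraction of the simulated world.
WORLD_CELLS = 10**12     # made-up stand-in for "the world is a big place"
NUM_PROBES = 10**4       # virtual cameras/microphones a team could watch
CELLS_PER_PROBE = 10**3  # coverage of each probe

observed = NUM_PROBES * CELLS_PER_PROBE
print(f"fraction of world observed: {observed / WORLD_CELLS:.6%}")
# -> 0.001000%; a subtle bad effect only needs to avoid that sliver
```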
Also, you risk any prediction of a future infohazard becoming a current-day infohazard.
At the other extreme, it's a total black box: some utterly inscrutable computation, perhaps learned from training data. In the worst case, the whole AI, from data in to action out, is one big homomorphically encrypted black box.
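To make that worst case concrete: under homomorphic encryption, an evaluator can run a computation end to end without ever seeing a single intermediate value in the clear. A minimal sketch using the textbook Paillier scheme (additively homomorphic only, toy parameters; this is an illustration of the encryption concept, not a claim about how such an AI would be built):

```python
# Toy Paillier cryptosystem (additively homomorphic).
# Tiny primes for illustration only; real deployments use ~2048-bit keys.
import math
import random

p, q = 293, 433                    # toy primes
n, n2 = p * q, (p * q) ** 2
g = n + 1                          # standard generator choice
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# The evaluator multiplies ciphertexts it cannot read; only the key
# holder learns that the result is the *sum* of the plaintexts.
a, b = encrypt(20), encrypt(22)
c = (a * b) % n2
assert decrypt(c) == 42
```

Fully homomorphic schemes extend this from addition to arbitrary circuits, which is what makes the "data in to action out" black box conceivable.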