The first question is whether you have enough information to locate human values, as opposed to merely locating an optimizer. The concept of optimization is fairly simple, and a learner could get a rough estimate of our intelligence just by watching humans try to solve a few puzzles. In other words, the amount of data needed to specify an optimizer is small, while the amount of data needed to pin down every detail of human values is large. This means that a random hypothesis consistent with a small amount of data will usually be an optimizer with non-human goals.
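A toy way to see this asymmetry: if pinning down a goal takes V bits and the training data only constrains N of them, then a random data-consistent hypothesis matches the intended goal with probability 2^-(V-N). A minimal sketch, where the bit counts are made-up illustrative numbers rather than real estimates:

```python
# Toy model: goals are bit-strings; training data fixes some bits.
# Both numbers below are hypothetical, chosen only for illustration.
bits_to_specify_goal = 10_000      # assumed complexity of human values
bits_constrained_by_data = 1_000   # assumed information in the training data

# Bits of the goal left unconstrained after training:
free_bits = bits_to_specify_goal - bits_constrained_by_data

# Probability that a random data-consistent hypothesis agrees on every
# remaining bit, i.e. exactly shares the intended values:
p_match = 2.0 ** -free_bits
print(free_bits, p_match)  # 9000 free bits; p_match underflows to 0.0
```

Even if the numbers are off by orders of magnitude, the conclusion is robust: any sizable gap between V and N leaves an astronomically small chance of hitting the intended values by luck.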
For example, maybe the human trainers value having real, authentic experiences, but never had cause to express that preference during training. The imitation then fills the universe with people in VR pods who don't know their lives are fake. The imitations might also have a preference for (random alien preference), because the trainers never showed that they didn't prefer that.
Let's suppose you gave it vast amounts of data, and that the hypothesis space is all possible Turing machines, weighted by size. One fairly simple Turing machine that would predict the data is a quantum simulation of a world similar to our own.
Less than a kilobyte encodes the laws of quantum mechanics, and the rest of the description goes towards pointing at a branch of the quantum multiverse containing humans similar to us. The simulation would also need something pointing at the input cable of the simulated AI. This gives us a virtual copy of the universe, as a program that predicts the flow of electricity in a particular cable. This code will be optimized to be short, not to be human-comprehensible; I would not expect to be able to easily extract a human mind from the model.
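The "weighted by size" prior above is essentially the universal (Solomonoff-style) prior: a hypothesis of length L gets weight proportional to 2^-L, so shorter consistent programs get more posterior mass. A minimal sketch over toy "programs" (here just bit-strings, with a stand-in consistency check instead of actually running Turing machines):

```python
# Sketch of a size-weighted (2^-length) prior over toy hypotheses.
# "Programs" are bit-strings; is_consistent is a hypothetical stand-in
# for "this program reproduces the observed data".
from itertools import product

def all_programs(max_len):
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def is_consistent(program, data):
    # Stand-in check: "program predicts data" just means the observed
    # data is a prefix of the program string.
    return program.startswith(data)

def posterior(data, max_len=12):
    weights = {}
    for p in all_programs(max_len):
        if is_consistent(p, data):
            weights[p] = 2.0 ** -len(p)  # prior weight 2^-length
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

post = posterior("01")
# The shortest consistent program gets the largest single share of mass.
best = max(post, key=post.get)
print(best)
```

The real hypothesis space is all Turing machines, not prefix-matching strings, but the structure of the argument is the same: the posterior is dominated by whatever short programs happen to fit the data, such as the compact quantum-simulation program above.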
If you put an upper bound on runtime that is still easily large enough to accurately simulate a human mind, then I would expect a program that attempts to reason abstractly about the surrounding world. In a large pile of data there will be many seemingly unrelated surface facts that actually have deep connections. A superhuman mind that reasons abstractly about the outside world could use evolutionary psychology to predict human behavior, or use the laws of physics and a rough idea of humanity's tech level to predict facts about our technology. Intelligent abstract reasoning about our surroundings is likely to win out over simple heuristics by having more predictive power per bit. If you give it enough compute to predict humans, it also has enough compute for this.
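The "predictive power per bit" claim can be phrased in minimum-description-length terms: a hypothesis wins when its own length plus the bits needed to encode its prediction errors is smaller. A toy comparison with made-up numbers (a large abstract-reasoning model with few errors versus a small bundle of surface heuristics with many):

```python
import math

def description_length(model_bits, n_predictions, error_rate):
    # Total cost = bits for the model itself, plus bits to patch each
    # binary prediction, at the binary entropy of the error rate.
    if error_rate in (0.0, 1.0):
        h = 0.0
    else:
        h = -(error_rate * math.log2(error_rate)
              + (1 - error_rate) * math.log2(1 - error_rate))
    return model_bits + n_predictions * h

# Hypothetical numbers, for illustration only: over a million binary
# predictions, the reasoner is 10x bigger but 10x more accurate.
reasoner = description_length(model_bits=50_000,
                              n_predictions=1_000_000, error_rate=0.01)
heuristic = description_length(model_bits=5_000,
                               n_predictions=1_000_000, error_rate=0.10)
print(reasoner < heuristic)  # the reasoner pays more for its model,
                             # but far less per prediction
```

With enough data, the one-off cost of a bigger model is swamped by the per-prediction savings, which is why the abstract reasoner comes out ahead in a size-weighted prior.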
All the usual problems of mesa-optimization apply and can't be ruled out. Alternately, the program could be reasoning abstractly about its input wire, giving us a fast approximation of the virtual-universe program above.
Finally, the virtual humans might realize that they are virtual and panic about it.