Is it true that an ever more sophisticated world model is enough to “hit the core of intelligence”?
A sufficiently sophisticated world-model, such as one that includes human values, would include humans. Humans are general problem-solvers, so the AI would have an algorithm for general problem-solving represented inside it somewhere. And once it’s there, I expect sharp gradients towards co-opting that algorithm for the AI’s own tasks.
I agree that figuring out a way to stop this, to keep the AI from growing agency even as its world-model becomes superhumanly advanced, would be fruitful.
Humans have sophisticated world models that contain simulacra of other humans, specifically those we are most familiar with. I think you could perhaps make an interesting analogy to multiple personality disorder as an example of simulacra breaking out of the matrix and taking over the simulation, but it’s a bit of a stretch.
What are sharp gradients? Is that a well-established phenomenon in ML you could point me at?
The simulacra in a simulation do not automatically break out and take over; that’s just a scary fantasy. As with so much of AI risk discourse, just because something is possible in principle does not make it plausible or likely in reality.
Uh, apologies, I meant steepest gradients. SGD is a locally greedy optimization process that updates an ML model’s parameters in the direction of the highest local increase in performance, i.e., along the steepest gradients. I’m saying that once there’s a general-purpose problem-solving algorithm represented somewhere in the ML model, SGD (or evolution, or whatever greedy selection algorithm we’re using) would by default attempt to loop it into the model’s own problem-solving, because that would increase its performance the most. Even if it were assembled accidentally, or as part of the world-model rather than the ML model’s own policy.
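To make the “steepest gradients” point concrete, here’s a minimal sketch of gradient descent on a toy loss. The loss function and parameter values are purely illustrative assumptions, not anything from an actual model; the point is just that each update moves the parameters along the locally steepest direction of improvement.

```python
import numpy as np

# Toy quadratic loss with its minimum at theta = [1.0, -2.0] (illustrative only).
def loss(theta):
    return np.sum((theta - np.array([1.0, -2.0])) ** 2)

# Analytic gradient of the toy loss above.
def grad_loss(theta):
    return 2.0 * (theta - np.array([1.0, -2.0]))

theta = np.array([5.0, 5.0])  # current parameters
lr = 0.1                      # learning rate (step size)

for step in range(50):
    # Greedy local update: step against the gradient, i.e. along the
    # direction of steepest decrease in loss (steepest increase in performance).
    theta = theta - lr * grad_loss(theta)

print(theta, loss(theta))  # parameters end up near the local optimum
```

The claim in the comment is just that this same greedy dynamic, applied to a model that already contains a general problem-solving circuit somewhere in its weights, would tend to wire that circuit into the model’s own task performance, since doing so is a steep local improvement.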
I’m not talking about a simulacrum breaking out.