Be a little careful with this. It's possible to make the AI do all sorts of strange things via unusual world models. E.g., a paperclip-maximizing AI can believe "everything you see is a simulation, but the simulators will make paperclips in the real world if you do X."
If you're confident that the world model is true, I think this isn't a problem.
I hesitate to say "confident". But I think you're not gonna have world models emerging in LLMs that are wrapped in a "this is a simulation" layer... probably?
Also, maybe even if they did, the procedure I'm describing, if it worked at all, would naively make them care about some simulated thing for its own sake, not care about the simulated thing instrumentally as a way to get some other thing in the real world.