The virtual AI within its virtual world

A putative new idea for AI control; index here.

In a previous post, I talked about an AI operating only on a virtual world (ideas like this used to be popular, until it was realised the AI might still want to take control of the real world to affect the virtual world; however, with methods like indifference, we can guard against this much better).

I mentioned that the more of the AI's algorithm that existed in the virtual world, the better it was. But why not go the whole way? Some people at MIRI and other places are working on agents modelling themselves within the real world. Why not have the AI model itself as an agent inside the virtual world? We can use quining to do this, for example.
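To make the quining idea a bit more concrete, here is a minimal sketch in Python (my own illustration, with hypothetical names, not anything specific to MIRI's work): the agent's decision procedure appears as an object inside its model of the virtual world, reconstructed by the standard quine trick of a template formatted with a quoted copy of itself.

```python
def self_model():
    # Standard quine trick: a string template, formatted with a quoted copy
    # of itself, reproduces this function's own source (minus these comments).
    src = "def self_model():\n    src = {0!r}\n    return src.format(src)\n"
    return src.format(src)

def world_model():
    # A toy virtual world whose contents include the agent's own code,
    # alongside whatever other objects the world holds (here, a gridworld).
    return {
        "agent_code": self_model(),
        "gridworld": [[0] * 4 for _ in range(4)],
    }

if __name__ == "__main__":
    # The model of the world contains the very code that builds the model.
    print(world_model()["agent_code"])
```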

Then all the restrictions on the AI—memory capacity, speed, available options—can be specified precisely, within the algorithm itself. It will only have the resources of the virtual world to achieve its goals, and this will be specified within it. We could define a "break" in the virtual world (i.e. any outside interference that the AI could cause, were it to hack us to affect its virtual world) as something that would penalise the AI's achievements, or simply as something impossible according to its model or beliefs. It would really be a case of "given these clear restrictions, find the best approach you can to achieve these goals in this specific world".
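As a hedged sketch of what "specified within the algorithm itself" could look like (every name below is my own invention, not from the previous post), the restrictions and the penalty for a "break" might simply be fields of the agent's own specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualWorldSpec:
    # The only resources and options the agent has; these live inside
    # the agent's own specification, not in some external wrapper.
    memory_limit_bytes: int
    step_budget: int
    allowed_actions: tuple

SPEC = VirtualWorldSpec(
    memory_limit_bytes=2**20,
    step_budget=10_000,
    allowed_actions=("left", "right", "wait"),
)

def utility(goal_achieved: bool, break_detected: bool) -> float:
    # First option from the text: any "break" (outside interference the agent
    # managed to cause) wipes out its achievements. The second option --
    # treating breaks as impossible in the agent's world model -- would be
    # implemented in the model itself rather than in this scoring function.
    if break_detected:
        return float("-inf")
    return 1.0 if goal_achieved else 0.0
```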

It would be ideal if the AI's motives were not given in terms of achieving anything in the virtual world, but in terms of making the decisions that, subject to the given restrictions, would be most likely to achieve something if the virtual world were run in its entirety. That way the AI wouldn't care if the virtual world were shut down or anything similar. It should only seek to self-modify in ways that make sense within the world, and understand itself as existing completely within these limitations.
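A rough sketch of that kind of evaluation, under an interface I am assuming for illustration (`simulate_full_run` and `utility` are hypothetical, not real library calls): the agent scores each policy by what would happen if the virtual world were run in its entirety under its model, so actually shutting the world down changes nothing it cares about.

```python
def expected_score(policy, world_model, utility, n_samples=100):
    # Expected utility of `policy` if the virtual world were run to completion
    # under the agent's model -- independent of whether the real run finishes.
    total = 0.0
    for seed in range(n_samples):
        trajectory = world_model.simulate_full_run(policy, seed=seed)
        total += utility(trajectory)
    return total / n_samples

def choose_policy(candidate_policies, world_model, utility):
    # The choice depends only on the counterfactual evaluation above, so
    # pausing or shutting down the actual virtual world is a matter of
    # indifference to the agent.
    return max(candidate_policies,
               key=lambda p: expected_score(p, world_model, utility))
```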

Of course, this would ideally require flawless implementation of the code; we don't want bugs developing in the virtual world that point to real-world effects (unless we're really confident we have properly coded the "care only about what would happen in the virtual world, not what actually does happen" requirement).

Any thoughts on this idea?