Sandboxing by Physical Simulation?

This is a simple idea. I do not remember if I've seen it anywhere. It is probably not original, but I am mildly surprised that I haven't come across it, even if only to see it refuted. If this is an old/dumb idea, please let me know and I'll delete it.

People have built universal Turing machines in Minecraft. It is straightforward to build a virtualized Turing machine by simulating a physical system that carries out the mechanical actions needed to instantiate one. You could obviously build a computer in a simulated physics simpler than Minecraft, but the Minecraft example is a more vivid one.
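To make the innermost layer concrete, here is a minimal sketch of the kind of machine you would instantiate inside the simulated physics: a tiny Turing machine interpreter. The rule table and names are purely illustrative (this one runs the 2-state, 2-symbol busy beaver); in the actual proposal this interpreter would itself be built out of simulated-physical parts.

```python
from collections import defaultdict

# Transition table: (state, symbol) -> (symbol to write, head move, new state).
# This is the 2-state busy beaver, chosen only as a small concrete example.
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "HALT"),
}

def run(rules, state="A", max_steps=100):
    tape = defaultdict(int)  # unbounded tape; blank cells read as 0
    head = 0
    for step in range(max_steps):
        if state == "HALT":
            return tape, step
        symbol, move, state = rules[(state, tape[head])]
        tape[head] = symbol
        head += move
    return tape, max_steps

tape, steps = run(RULES)
print(f"halted after {steps} steps, wrote {sum(tape.values())} ones")
```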

I don't even want to guess how much more computationally expensive it would be to run an AI, and the AI's simulated environment, on a Turing machine being run on a simulated physical computer, itself being simulated on mundane hardware. But it does strike me that an AI should have significantly more trouble hacking its way out of this kind of sandboxing.
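I won't put numbers on it, but the shape of the cost is clear: the slowdown factors of the layers multiply. A toy calculation with invented factors, purely to illustrate the stacking and not to estimate anything:

```python
# Each factor below is a made-up placeholder, not an estimate.
layers = {
    "physics simulation on real hardware":     1e4,
    "computer built within that physics":      1e5,
    "Turing machine run on that computer":     1e3,
}

total = 1.0
for name, factor in layers.items():
    total *= factor  # overheads compound multiplicatively across layers
    print(f"{name}: x{factor:.0e} (cumulative x{total:.0e})")
# Under these placeholder factors, one unit of native computation
# would cost ~1e12 units at the bottom of the stack.
```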

It would likely have more trouble noticing that it's in a simulation in the first place, and if it did notice, it would likely have a much harder time doing anything about it. Our hardware relies on physical laws that we don't fully understand. It makes errors that we sometimes can't really notice or stop, because fundamentally it's made of atoms and fields. One can imagine ways in which a clever being might intentionally strain physical hardware to see how it reacts. We already know that an attacker can induce a bit flip in neighboring memory through physics trickery (this is the Rowhammer attack). In contrast, the underlying simulated physics of the AI world/brain could be extremely simple and designed to be as free of surprises as possible. Within the doubly-simulated world, and within its own simulated brain, there would be no stray EM fields, no Internet ports accidentally left open. The AI could bang on the strata of its physics all it wanted, and all "we" would see would be the flickering light of the Redstone machine.
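To emphasize how surprise-free the bottom layer could be, here is a sketch in which the entire "laws of physics" are one pure function over a bit array, with no I/O, no clocks, and no analog behavior. Rule 110 is known to be Turing-complete, so even a physics this simple suffices in principle (the wrap-around boundary and world size are arbitrary choices for the sketch):

```python
def step(cells: list[int]) -> list[int]:
    """One tick of elementary cellular automaton Rule 110.

    This function is the complete physics: deterministic, closed,
    and with nothing outside the cell array to probe or exploit.
    """
    n = len(cells)
    rule = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}
    return [rule[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

world = [0] * 63 + [1]  # a single live cell
for _ in range(5):
    world = step(world)
print(sum(world))  # all an inside observer ever gets is more of the same array
```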

I'm not one to underestimate a superintelligence, but the baseline security of this kind of double sandboxing feels qualitatively different from that of physical hardware.