Thought experiment: coarse-grained VR utopia

I think I’ve come up with a fun thought experiment about friendly AI. It’s pretty obvious in retrospect, but I haven’t seen it posted before.

When thinking about what friendly AI should do, one big source of difficulty is that the inputs are supposed to be human intuitions, based on our coarse-grained and confused world models, while the AI’s actions are supposed to be fine-grained actions based on the true nature of the universe, which can turn out very weird. That leads to a messy problem of translating preferences from one domain to another, which crops up everywhere in FAI thinking; Wei’s comment and Eliezer’s writeup are good places to start.

What I just realized is that you can handwave the problem away by imagining a universe whose true nature agrees with human intuitions by fiat. Think of it as a coarse-grained virtual reality where everything is built from polygons and textures instead of atoms, and all interactions between objects are explicitly coded. It would contain player avatars, controlled by ordinary human brains sitting outside the simulation (so the simulation doesn’t even need to support thought).
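The key property, that every interaction is explicitly coded at human scale with no lower level of physics for preferences to leak into, can be sketched as a toy data model. Nothing in the post specifies an implementation; all the names below (`CoarseWorld`, `Obj`, `rule`, `act`) are purely illustrative:

```python
# Toy sketch of a coarse-grained world: objects are whole, human-scale
# entities, and every interaction is an explicitly coded rule. There is
# no atom-level substrate underneath, so the world's "true nature" just
# is the intuitive ontology.

class Obj:
    def __init__(self, name, properties):
        self.name = name
        self.properties = dict(properties)

class CoarseWorld:
    def __init__(self):
        self.objects = {}
        self.rules = {}  # (verb, kind) -> handler function

    def add(self, obj):
        self.objects[obj.name] = obj

    def rule(self, verb, kind, handler):
        # Interactions exist only if someone explicitly coded them.
        self.rules[(verb, kind)] = handler

    def act(self, verb, name):
        obj = self.objects[name]
        handler = self.rules.get((verb, obj.properties["kind"]))
        if handler is None:
            return "nothing happens"  # undefined interactions are no-ops
        return handler(obj)

def drink(obj):
    obj.properties["full"] = False
    return f"{obj.name} is now empty"

world = CoarseWorld()
world.add(Obj("cup", {"kind": "container", "full": True}))
world.rule("drink", "container", drink)

print(world.act("drink", "cup"))  # the only effect is the coded one
print(world.act("kick", "cup"))   # no rule defined -> nothing happens
```

The point of the sketch is the closed rule table: preferences stated over cups and drinking cannot be undermined by some finer-grained physics, because there isn’t any.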

The FAI-relevant question is: how hard is it to describe a coarse-grained VR utopia that you would agree to live in?

If describing such a utopia is feasible at all, it involves thinking only about human-scale experiences, not physics or tech. So in theory we could hand the task off to human philosophers or some other human-based procedure, thus dealing with “complexity of value” without much risk. Then we could launch a powerful AI aimed at rebuilding reality to match the description (more concretely, making the world’s conscious experiences match a specific coarse-grained VR utopia, without any extra hidden suffering). That’s still a very hard task, because it requires solving decision theory and the problem of consciousness, but it seems more manageable than solving friendliness completely. The resulting world would be suboptimal in many ways, e.g. it wouldn’t have much room for science or self-modification, but it might be enough to avert AI disaster (!)

I’m not proposing this as a plan for FAI, because we can probably come up with something better. But what do you think of it as a thought experiment? Is it a useful way to split up the problem, separating the complexity of human values from the complexity of non-human nature?