I agree with you about not knowing any foolproof wording. As for what Eliezer had in mind, though, here’s what the LessWrong wiki has to say on CEV:
In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”.
http://wiki.lesswrong.com/wiki/CEV
So it’s not just, “be good to humans,” but rather, “do what (idealized) humans would want you to.” I think it’s an open question whether those would be the same thing.