Correct me if I’m wrong, but it sounds to me like you’re operating from a definition of Friendliness that is something like “be good to humans,” whereas my understanding is that Friendliness is more along the lines of “do what we would want you to do if we were smarter/better.” So, if we would want an AI to be a good galactic citizen if we thought about it more, that’s what it would do.
Does your critique still apply to this CEV-type definition of Friendliness?
I thought it wasn’t so much “do what we would want you to do if we were better” as “be good to humans, using the definitions of ‘good’ and ‘humans’ that we’d supply if we were better at anticipating both what will actually benefit us and the consequences of particular ways of wording constraints”.
Because couldn’t it decide that a better human would be purely altruistic and want to turn over all the resources in the universe to a species able to make more efficient use of them?
I have more questions than answers, and I’d be suspicious of anyone who, at this stage, was 100% certain that they knew a foolproof way to word things.
I agree with you about not knowing any foolproof wording. As for what Eliezer had in mind, though, here’s what the LessWrong wiki has to say on CEV:
In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”.
http://wiki.lesswrong.com/wiki/CEV
So it’s not just, “be good to humans,” but rather, “do what (idealized) humans would want you to.” I think it’s an open question whether those would be the same thing.