More or less, yes, because I care about not killing ‘unthinkable’ numbers of people due to a failure of imagination.
That’s the answer I wanted, but you forgot to answer my other question.
A human-CEV AI would extrapolate the desires of humans as (it believes) they existed right before it got the ability to alter their brains, afaict, and use this to predict what they’d tell it to do if they thought faster, better, stronger, etc.
I would see a human-CEV AI as programmed with the belief “The human CEV is correct”. Since I believe that the human CEV is very close to correct, I believe that this would produce an AI that gives very good answers.
A Pebblesorter-CEV AI would be programmed with the belief “The pebblesorter CEV is correct”, which I believe is false but pebblesorters believe is true or close to true.
Since I believe that the human CEV is very close to correct, I believe that this would produce an AI that gives very good answers.
This presumes that the problem of specifying a CEV is well-posed. I haven’t seen any arguments around SI or LW about this very fundamental idea. I’m probably wrong and this has already been addressed, and I’d be happy to read more, but it seems quite reasonable to me to assume that a tiny, tiny error in specifying the CEV could lead to results that the CEV itself would judge disastrously horrible.