To clarify my original comment, the hard part is designing an AGI that is provably friendly, not one that “sorta kinda oughta be friendly,” and you did not suggest anything like a proof.
Disclaimer: I have my doubts that a provably friendly AGI is possible, but that is, nevertheless, the task EY set out to do.
What is it you think may not be possible? I imagine that proofs for being self-reflectively coherent, actually optimizing its goal, staying stable through rewrites and ontological revolutions, and so on are totally doable. The only fuzziness is whether CEV is the best goal, or even a friendly one. I think the “provably friendly” thing is about having proved all those nice strong properties plus giving it the best goal system we have (which is currently CEV).
In the Gödel sense. For example, the friendliness property might be independent of whatever mathematical framework EY would find it reasonable to start from.