Coherent Extrapolated Volition (CEV) is Eliezer’s proposal of a potentially good thing to target with an aligned superintelligence.
When I look at it, CEV factors into answers to three questions:
Whose values count? [CEV answer: every human alive today counts equally]
How should values be extrapolated? [CEV answer: Normative Extrapolated Volition]
How should values be combined? [CEV answer, as I understand it: something like Nick Bostrom’s parliamentary model, along with an “anti-unilateral” protocol]
(Of course, the why of CEV is an answer to a more complicated set of questions.)
An obvious thought is that the parliamentary-model part seems mostly solved by Critch’s futarchy theorem. The scary thing about this is the prospect of people losing almost all of their voting power by making poor bets. But I think this can be solved by giving each person an equally powerful “guardian angel” AGI aligned with them specifically, and having those delegates do the betting. That feels intuitively acceptable to me, at least.
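To make the failure mode and the proposed fix concrete, here is a minimal sketch (in Python, with purely illustrative numbers) of the weight dynamics as I understand them from Critch’s theorem: a Pareto-optimal policy acts as if each stakeholder’s voting weight gets multiplied by the probability their beliefs assigned to the actual observation, then renormalized. Nothing below is from Critch’s paper or the CEV write-up; it’s just my toy model of the mechanism.

```python
import numpy as np

# Toy model of the Bayes-like weight dynamics suggested by Critch's
# futarchy theorem: after each observation, multiply each stakeholder's
# voting weight by the probability their beliefs assigned to that
# observation, then renormalize. Poor predictors bleed voting power.

def update_weights(weights, probs_assigned):
    """w_i <- w_i * P_i(observation), renormalized."""
    new = weights * probs_assigned
    return new / new.sum()

rng = np.random.default_rng(0)

n_people = 3
weights = np.full(n_people, 1.0 / n_people)  # everyone starts equal

true_p = 0.7  # true chance of the event each round (unknown to bettors)

# Person 0 bets personally and is badly calibrated; persons 1 and 2
# delegate to equally well-calibrated "guardian angel" bettors.
belief_p = np.array([0.2, 0.7, 0.7])

for _ in range(50):
    event_happened = rng.random() < true_p
    # Probability each bettor assigned to what actually happened.
    probs = belief_p if event_happened else 1.0 - belief_p
    weights = update_weights(weights, probs)

print(weights.round(3))  # ~[0., 0.5, 0.5]: the poor bettor's power
                         # collapses; equally calibrated delegates
                         # keep exactly equal power.
```

The point of the delegates is visible in the output: because the two guardian angels assign identical probabilities every round, their weight ratio never moves, so equal calibration preserves equal voting power no matter how many bets are made.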
The next thought concerns the “anti-unilateral” protocol (i.e., the protocol at the end of the “Selfish Bastards” section). It seems like it would be good if we could formalize the “anti-unilateral-selfishness” part and bake it into something like Critch’s futarchy theorem, rather than running a complicated protocol.