Sorry, I deleted my comment because I wanted to think it over a bit more, whether the “value is fragile” criticism applies to your idea.
That was unfortunate, because the Value is Fragile issue is important in this discussion regardless of whether it is more of an issue for CEV or my suggestion.
The merged utility function would be, assuming they have equal bargaining power: U(unhappy boring future)=0, U(happy boring future)=50, U(unhappy non-boring future)=50, U(happy non-boring future)=100.
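Those four merged values can be reproduced as an equal-weight average of two individual utility functions. This is only a sketch under assumptions I'm adding (Alice cares only about happiness, Bob only about non-boringness, each scoring 100 or 0; equal bargaining power modeled as an unweighted average):

```python
# Assumed individual utility functions (not stated in the original comments):
# Alice scores happiness, Bob scores non-boringness, each on a 0/100 scale.

def alice_u(happy: bool, boring: bool) -> float:
    return 100.0 if happy else 0.0

def bob_u(happy: bool, boring: bool) -> float:
    return 100.0 if not boring else 0.0

def merged_u(happy: bool, boring: bool) -> float:
    # Equal bargaining power modeled as a 50/50 weighted average.
    return 0.5 * alice_u(happy, boring) + 0.5 * bob_u(happy, boring)

for happy in (False, True):
    for boring in (True, False):
        print(f"happy={happy}, boring={boring}: {merged_u(happy, boring)}")
```

This reproduces U(unhappy boring)=0, U(happy boring)=50, U(unhappy non-boring)=50, U(happy non-boring)=100.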
Do you agree that’s a problem?
Well, that merged utility function is certainly less than ideal. Presumably we would prefer that (unhappy non-boring) and (happy boring) had been assigned utilities of zero, like (unhappy boring). However, I will point out that if the difference between an acceptable future and a horrible one is only 100 utils, then a 50-util penalty ought also to be enough to prevent those half-horrible futures. Furthermore, a Nash bargain is characterized by both a composite utility function and a fairness constraint. (That is, a collective behaving in conformance with a Nash bargain is not precisely rational; it might split its charitable giving between two charities, for example.) That fairness constraint provides a second incentive driving the collective away from those mixed futures.
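The fairness pull can be made concrete: the Nash bargaining solution maximizes the *product* of each party's gain over their disagreement payoff, which zeroes out any future that leaves one party with nothing. A minimal sketch, using hypothetical payoffs and an assumed (0, 0) disagreement point that are not from the original comments:

```python
# Hypothetical (Alice, Bob) payoffs for each future; these numbers are
# illustrative assumptions, chosen to match the example above.
outcomes = {
    ("unhappy", "boring"): (0, 0),
    ("happy", "boring"): (100, 0),
    ("unhappy", "non-boring"): (0, 100),
    ("happy", "non-boring"): (100, 100),
}
disagreement = (0, 0)  # assumed status-quo payoffs

def nash_pick(outcomes, d):
    # The Nash bargaining solution maximizes the product of gains over
    # the disagreement point, not the sum of a merged utility function.
    return max(outcomes,
               key=lambda o: (outcomes[o][0] - d[0]) * (outcomes[o][1] - d[1]))

print(nash_pick(outcomes, disagreement))
```

Note that both mixed futures score a Nash product of zero here, the same as (unhappy boring): the product criterion itself steers the collective away from outcomes where one party gains nothing, independently of the 50-util penalty in the merged function.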
However, when presenting an example intended to point out the flaws in one proposal, it is usually a good idea to see how the other proposals do on that example. In this case, the CEV version of this example might be a seed AI created by either Alice or Bob, with a coin flip deciding which. The resulting future is then either boring or unhappy, but not both.