There are a bunch of technical nits I could pick with the article (the most salient one now being “Force them to seek a partner (co-parent).”)
I’m not insisting on that suggestion, but I’m curious why you want to single that one out as something to object to.
I think Eliezer’s response would be that value is fragile: it doesn’t matter if we preserve most of our values if we lose a single critical one. … What do you think?
I probably agree with Eliezer on the importance of preserving our human values near-exactly (though I might disagree with him about what those values are). But I don’t see how preserving all of our values would be easier in the case of a FOOMing singleton than in the scenario I promote here—in which we have a collectively slow-FOOMing society of roughly-balanced-in-power AIs and (in the early stages) unenhanced humans. In fact, I think that preserving human values will be easier if we have a large number of independent advocates of those values—each of them possibly characterizing those values in slightly different ways.
Sorry, I deleted my comment because I wanted to think over a bit more whether the “value is fragile” criticism applies to your idea. I think it does after all.
Suppose there are two critical values, happiness and boredom. We want to be happy, but not by doing the same thing over and over again. So something like: U(unhappy boring future)=0, U(happy boring future)=10, U(unhappy non-boring future)=0, U(happy non-boring future)=100.
Now suppose Alice and Bob each create an AI. Alice manages to preserve happiness, and Bob manages to preserve boredom. What happens when their AIs merge? The merged utility function would be, assuming they have equal bargaining power: U(unhappy boring future)=0, U(happy boring future)=50, U(unhappy non-boring future)=50, U(happy non-boring future)=100.
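To make the arithmetic concrete, here is a minimal Python sketch. It is purely illustrative and assumes something the example leaves implicit: that “merging” with equal bargaining power amounts to a plain equal-weight average of the two utility functions.

```python
# Toy check of the merge arithmetic. Assumption (not stated in the
# example itself): merging = equal-weight average of the two utilities.
futures = [("happy", "boring"), ("happy", "non-boring"),
           ("unhappy", "boring"), ("unhappy", "non-boring")]

# Alice's AI preserved only happiness; Bob's AI only non-boredom.
u_alice = {f: 100 if f[0] == "happy" else 0 for f in futures}
u_bob   = {f: 100 if f[1] == "non-boring" else 0 for f in futures}

u_merged = {f: (u_alice[f] + u_bob[f]) / 2 for f in futures}
for f in futures:
    print(f, u_merged[f])
# ('happy', 'boring') 50.0
# ('happy', 'non-boring') 100.0
# ('unhappy', 'boring') 0.0
# ('unhappy', 'non-boring') 50.0
```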
Let me offer another example showing the virtues of my ‘bargaining’ approach to value robustness.
Suppose the ‘true’ human values require all four of (a) happiness, (b) excitement, (c) social interaction, and (d) ‘meaningfulness’ (whatever that is). But of five seed AIs, only one got this aspect of human values exactly right: (abcd) = 100, all other possibilities = 0.
The other four each leave out one of the requirements. For example: (acd) = 100, any of a, c, or d missing = 0, b irrelevant.
If these five AIs strike a bargain, they will assign: (abcd) = 100, any three out of four = 20, anything else = 0. A future realizing only three of the four essential values scores just 20 because four of the five AIs consider it unacceptable.
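A minimal sketch of that calculation, assuming (as above) that the bargain is an equal-weight average of the five utility functions; the requirement sets are the hypothetical ones from the example:

```python
from itertools import combinations

values = ("a", "b", "c", "d")  # happiness, excitement, social interaction, meaning

def seed_ai(required):
    """Utility function paying 100 iff every required value is realized."""
    req = set(required)
    return lambda realized: 100 if req <= set(realized) else 0

# One seed AI demands all four values; the other four each omit one.
ais = [seed_ai(values)] + [seed_ai(set(values) - {v}) for v in values]

# Equal-weight bargain, modeled here as a plain average of the five.
for k in (4, 3, 2):
    for realized in combinations(values, k):
        u = sum(ai(realized) for ai in ais) / len(ais)
        print("".join(realized), "->", u)
# abcd -> 100.0; every three-of-four set -> 20.0; every pair -> 0.0
```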
So I am inclined to think that the bargaining dynamic tends to “robustly preserve the fragility of values”, if that is not too much of an oxymoron.
Sorry, I deleted my comment because I wanted to think over a bit more whether the “value is fragile” criticism applies to your idea.
That was unfortunate, because the Value is Fragile issue is important in this discussion regardless of whether it is more of an issue for CEV or my suggestion.
The merged utility function would be, assuming they have equal bargaining power: U(unhappy boring future)=0, U(happy boring future)=50, U(unhappy non-boring future)=50, U(happy non-boring future)=100.
Do you agree that’s a problem?
Well, that merged utility function is certainly less than ideal. Presumably we would prefer that (unhappy non-boring) and (happy boring) had been assigned utilities of zero, like (unhappy boring). However, I will point out that if the difference between an acceptable future and a horrible one is only 100 utils, then a 50-util penalty also ought to be enough to prevent those half-horrible futures. Furthermore, a Nash bargain is characterized by both a composite utility function and a fairness constraint. (That is, a collective behaving in conformance with a Nash bargain is not precisely rational. It might split its charitable giving between two charities, for example.) That fairness constraint provides a second incentive driving the collective away from those mixed futures.
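A toy check of that last point, for what it is worth: if we model the agreement not as the averaged utility function but as a Nash bargain over lotteries (with the disagreement point assumed to be zero utility for both parties, and the numbers taken from the happiness/boredom example), the solution puts all of its probability mass on the happy non-boring future rather than on a mix of the half-horrible ones:

```python
from itertools import product

futures = [("happy", "boring"), ("happy", "non-boring"),
           ("unhappy", "boring"), ("unhappy", "non-boring")]
u_alice = [100 if h == "happy" else 0 for h, b in futures]
u_bob   = [100 if b == "non-boring" else 0 for h, b in futures]

# Nash bargain: choose the lottery over futures that maximizes the
# product of each party's gain over the (assumed zero) disagreement
# point. Brute-force search on a probability grid in steps of 1/20.
best_product, best_lottery = -1.0, None
for ints in product(range(21), repeat=len(futures)):
    if sum(ints) != 20:
        continue
    p = [i / 20 for i in ints]
    ua = sum(pi * u for pi, u in zip(p, u_alice))
    ub = sum(pi * u for pi, u in zip(p, u_bob))
    if ua * ub > best_product:
        best_product, best_lottery = ua * ub, p
print(best_lottery, best_product)
# [0.0, 1.0, 0.0, 0.0] 10000.0 : all mass on (happy, non-boring)
```

The grid is crude, but the maximum of the product really does sit at the pure happy non-boring future, which is one way of seeing how the bargaining structure itself, and not just the averaged utilities, pushes the collective away from the mixed outcomes.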
However, when presenting an example intended to point out the flaws in one proposal, it is usually a good idea to see how the other proposals do on that example. In this case, the CEV version of this example might be a seed AI created by Alice OR Bob. The resulting future is either boring or unhappy, but not both, with a coin flip deciding which.
I think that preserving human values will be easier if we have a large number of independent advocates of those values—each of them possibly characterizing those values in slightly different ways.
It seems probable. Multiple humans seem pretty likely to be preserved, if for no other reason than their historical value.