Let me offer another example showing the virtues of my ‘bargaining’ approach to value robustness.
Suppose the ‘true’ human values require all four of (a) happiness, (b) excitement, (c) social interaction, and (d) ‘meaningfulness’ (whatever that is). But of five seed AIs, only one got this aspect of human values exactly right: it assigns (abcd) = 100 and 0 to all other possibilities.
The other four each leave out one of the requirements. For example, one assigns (acd) = 100 and 0 whenever any of a, c, or d is missing, with b irrelevant.
If these five AIs strike a bargain, they will assign: (abcd) = 100, any three of the four = 20, anything else = 0. A state with only three of the four essential values scores just 20 because four of the five AIs consider that situation unacceptable.
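The arithmetic can be sketched in a few lines of Python, under the assumption (mine, not stated in the original) that the bargain amounts to an equal-weight average of the five AIs' utility functions:

```python
# Sketch of the bargaining example, assuming the bargain is modeled
# as an equal-weight average of the five utility functions.
VALUES = ("a", "b", "c", "d")  # happiness, excitement, social interaction, meaningfulness

def correct_ai(outcome):
    # The one AI that got this aspect right: all four values required.
    return 100 if set(VALUES) <= set(outcome) else 0

def make_lenient_ai(ignored):
    # An AI that leaves out one requirement: the other three are
    # required, and `ignored` is irrelevant to it.
    required = set(VALUES) - {ignored}
    def utility(outcome):
        return 100 if required <= set(outcome) else 0
    return utility

ais = [correct_ai] + [make_lenient_ai(v) for v in VALUES]

def bargained_score(outcome):
    # Equal-weight average across the five AIs.
    return sum(ai(outcome) for ai in ais) / len(ais)

print(bargained_score("abcd"))  # 100.0: all five AIs approve
print(bargained_score("acd"))   # 20.0: only the AI that ignores b approves
print(bargained_score("ab"))    # 0.0: no AI approves
```

The averaged scores reproduce the figures in the text: (abcd) = 100, any three of the four = 20, everything else = 0.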
So I am inclined to think that the bargaining dynamic tends to “robustly preserve fragility of values”, if that is not too much of an oxymoron.