There is a type of value I thought of that does not seem easy to aggregate.
Value X: values having all of the aggregated value system become Value X within some finite time T. It achieves more value the shorter T is, and less value the longer T is. It achieves no value at all if the aggregated value system never becomes entirely Value X in finite time. It is risk-neutral, so a 50% chance of the whole aggregated value system becoming Value X by time T is half as good as a certainty of it.
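One way to write this down (my own notation; f is just some strictly decreasing function of T, for example f(T) = C/T for a huge constant C): U_X = p · f(T), where p is the probability that the whole aggregated value system becomes Value X by finite time T, and U_X = 0 if that never happens. The linearity in p is the risk-neutrality above.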
As an example of Value X, imagine a Blue Cult member who believes that once everyone is a Blue Cult member, all Blue Cult members go to heaven, which is awesome and grants absurdly high amounts of utilons. Nothing else matters to them.
I mean, you could say something like “Alright, Blue Cult members: in the aggregation, we will give the AI an epsilon chance of making everyone become a Blue Cult member after each eon.” This might give the Blue Cult members some value, but from the perspective of the other value systems, it would probably look a lot like adding more existential risk to the system.
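(To spell out why it looks like existential risk, assuming the epsilon chance is independent each eon: the probability that nobody has been forcibly converted after n eons is (1 − ε)^n, which goes to 0 as n grows, so every other value system eventually loses everything with probability approaching 1.)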
What might a resolution look like for aggregating across values that resemble this one?
Including Value X in the aggregation is easy: just include a term in the aggregated utility function that depends on the aggregation used in the future. The hard part is maximizing such an aggregated utility function. If Value X already takes up enough of the utility function, an AI maximizing the aggregation might just replace its utility function with Value X and start maximizing that. Otherwise, the AI would probably ignore Value X’s preference to be the only value represented in the aggregation, since complying would cost it more utility elsewhere than it gains. There’s no point to the lottery you suggest, since a lottery between two outcomes cannot have higher expected utility than the better of the two outcomes. If Value X is easily satisfied by silly technicalities, the AI could build a different AI with the aggregated utility function, make sure that the other AI becomes more powerful than it is, and then replace its own utility function with Value X.
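(To make the lottery point concrete: for any outcomes A and B and any probability p, the expected utility p · u(A) + (1 − p) · u(B) is a weighted average of u(A) and u(B), so it can never exceed max(u(A), u(B)). Randomizing never beats just committing to whichever outcome is better under the aggregation.)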
I don’t think your Blue Cult example works very well, because for the cult members, the preference for everyone to join the Blue Cult is an instrumental rather than a terminal value: what they terminally want is to go to heaven, and universal membership is just the means.
Thank you very much for helping me break that down!