The difficulty with preference fulfillment is that we need to target the AI's altruistic utility function exactly at humans, not, for example, at all sentient/agentic beings. A superintelligent entity could use some acausal decision theory, discover some Tegmark-like theory of the multiverse, decide that there are far more paperclip maximizers than human-adjacent entities, and fulfill their preferences instead of ours.
I think that the kind of simulation that preference fulfillment is based on naturally only targets actually existing humans (or animals). The kind of reasoning that makes you think about doing acausal trades with entities in other branches of a Tegmarkian multiverse doesn’t seem strongly motivational in humans, I think because humans are rewarded by the predicted sensory consequences of their actions, so they are mostly motivated by things that affect the actual world they live in. If an AI were to actually end up doing acausal trades of that type, the motivation would need to come from a module running something other than a preference fulfillment drive.
I think that a recursively self-improved, superintelligent, reflectively consistent human would do acausal trades to ensure human/human-compatible flourishing across the multiverse as well? Maybe they would do it only after turning local reality into a Utopia, because our local preferences are stronger than our general ones, but I don’t see a reason not to do it once local altruistic utility hits some upper attainable level, unless we find unexpected evidence that there is no multiverse in any variation.
By the way, that was a hyperbolized example of “the AI learns some core concept of altruism but doesn’t prefer human preferences explicitly”. The scenario “after reflection, the AI decides that insects’ preferences matter as much as humans’” is also undesirable. My point is that learning “core altruism”, and preserving it through reflection, is easier than doing the same for “altruism towards existing humans”, because “existing humans” are harder to specify than the abstract concept of agents. We know of examples where very altruistic people neglected the needs of actually existing people in favor of the needs of someone else: animals, future generations, future utopias, non-existent people (see antinatalism), long-dead ancestors, imaginary entities (like gods or nations), etc.
I also think that for humans, imaginary people (not directly perceived, but modeled) are often more important, and they are especially important in the moments that really matter: when you resist conformity, for example.