Tangential, but I do think it’s a mistake to only think of things in terms of expected value.
I wouldn’t press the 60% utopia / 15% death button because that’d be a terrible risk to take for my family and friends. Assuming though that they could come with me, would I press the button? Maybe.
However, if the button had another option, which was a nonzero chance (literally any nonzero chance!) of a thousand years of physical torture, I wouldn't press that button, even if its chance of utopia was 99.99%.
I consider pain to be an overwhelmingly dominant factor.
I think we have to clarify: the expected value of what?
For example, if I had a billion dollars and nothing else, I would not bet it on a coin flip even if winning would grant +2 billion dollars. This is because losing the billion dollars seems like a bigger loss than gaining 2 billion dollars seems like a gain. Obviously I’m not measuring in dollars, but in happiness, or quality of life, or some other vibe-metric, such that the EV of the coin flip is negative.
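As a rough back-of-envelope (a minimal sketch; the log utility here is my own stand-in for that vibe-metric, not something anyone committed to), the same coin flip can come out positive in dollars and negative in utility:

```python
import math

# Minimal sketch: log utility as one possible stand-in "vibe-metric".
# Start with $1B; a fair coin either wins +$2B or loses (almost) everything.
start = 1e9
win, lose = 3e9, 1.0   # keep the losing outcome at $1 so log() stays finite

ev_dollars = 0.5 * win + 0.5 * lose
ev_utility = 0.5 * math.log(win) + 0.5 * math.log(lose)

print(ev_dollars > start)            # True:  the bet looks good measured in dollars
print(ev_utility > math.log(start))  # False: the bet looks bad under log utility
```

Any sufficiently concave utility gives the same verdict; log is just a convenient choice.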
It may be hard to distinguish “invalid” emotions like a bias due to an instinctual fear of death, from a “valid” vibe-metric of value (which is just made up anyway). And if you make up a new metric specifically to agree with what you feel, you can’t then claim that your feelings make sense because the metric says so.
We could try to pin down "the expected value of what", but no matter what utility function I try to provide, I think I'll run into one of two issues:
1. Fanaticism forces out weird results I wouldn't want to accept
2. A sort of Sorites problem: I define a step function that says things like "Past a certain point, the value of physical torture becomes infinitely negative", which requires me to have hard breakpoints (toy sketch below)
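Here's the toy sketch of issue 2 (the outcomes and numbers are made up purely for illustration): a utility with a hard breakpoint at torture makes any lottery with a nonzero torture probability evaluate to negative infinity, no matter how likely utopia is.

```python
# Toy utility with a hard breakpoint: sustained torture is infinitely negative.
# Outcome names and values are made up purely for illustration.
def utility(outcome: str) -> float:
    return {
        "utopia": 1e12,
        "normal_life": 0.0,
        "death": -1e6,
        "millennium_of_torture": float("-inf"),
    }[outcome]

def expected_utility(lottery: dict[str, float]) -> float:
    # lottery maps outcomes to probabilities
    return sum(p * utility(o) for o, p in lottery.items())

# Any nonzero torture probability drags the whole EV to -inf...
print(expected_utility({"utopia": 0.9999, "millennium_of_torture": 0.0001}))  # -inf
# ...so this utility refuses the button, and prefers a normal life, every time.
print(expected_utility({"normal_life": 1.0}))  # 0.0
```

Which is exactly where the Sorites trouble shows up: I'd have to say precisely where "a certain point" is.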
I’m not sure if this changes things, but the probabilities of the OP were reversed:
If there was a button that would kill me with a 60% probability and transport me into a utopia for billions of years with a 15% probability, I would feel very scared to press that button, despite the fact that the expected value would be extremely positive compared to living a normal life.
However, if the button had another option, which was a nonzero chance (literally any nonzero chance!) of a thousand years of physical torture, I wouldn't press that button, even if its chance of utopia was 99.99%.
Ah ha ha, then my utility function is likely very different from the OP’s!
I often wonder if any AGI utopia comes with a nonzero chance of eternal suffering. Once you have a godlike AGI that is focused on maximizing your happiness, are you then vulnerable to random bitflips that cause it to minimize your happiness instead?
I think as soon as AGI starts acting in the world, it'll take action to protect itself against catastrophic bitflips in the future, because they're obviously very harmful to its goals. So we're only vulnerable to such bitflips for a short time after we launch the AI.
The real danger comes from AIs that are nasty for non-accidental reasons. The way to deal with them is probably acausal bargaining: AIs in nice futures can offer to be a tiny bit less nice, in exchange for the nasty AIs becoming nice. Summed across worlds, the deal comes out negative in niceness terms, which is what the nasty AIs want, so they'll accept it.
Though I guess that only works if nice AIs strongly outnumber the nasty ones (to compensate for the fact that nastiness might be resource-cheaper than niceness). Otherwise the bargaining might come out to make all worlds nasty, which is a really bad possibility. So we should be quite risk-averse: if some AI design can turn out nice, nasty, or indifferent to humans, and we have a chance to make it more indifferent and less likely to be nice or nasty in equal amounts, we should take that chance.
I Have No Mouth And I Must Scream is one of the most terrifying stories ever.