I think as soon as an AGI starts acting in the world, it'll take steps to protect itself against catastrophic bitflips, because they're obviously very harmful to its goals. So we're only vulnerable to such bitflips for a short window after we launch the AI.
The real danger comes from AIs that are nasty for non-accidental reasons. The way to deal with them is probably acausal bargaining: AIs in nice futures can offer to be a tiny bit less nice, in exchange for the nasty AIs becoming nice. For a nasty AI, refusing comes out negative relative to accepting: summed across the many nice worlds, the concession is worth more to it than its own nastiness, so the nasty AIs will take the deal.
Though I guess that only works if nice AIs strongly outnumber the nasty ones (to compensate for the fact that nastiness might be resource-cheaper than niceness). Otherwise the bargaining might come out the other way and make all worlds nasty, which is a really bad possibility. So we should be quite risk-averse: if some AI design can turn out nice, nasty, or indifferent to humans, and we have a chance to make it more likely to be indifferent, reducing the chances of nice and nasty by equal amounts, we should take that chance.
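A toy version of that headcount condition (my own sketch; the symbols $N$, $\epsilon$, $V$, $k$ are made up for illustration, not from any worked-out theory): suppose each of $N$ nice worlds concedes a sliver of niceness that a nasty AI values at $\epsilon$, while the nasty AI values its own nastiness at $V$. If nastiness is resource-cheaper than niceness by a factor of $k$, then for the same resource budget $V \approx k\epsilon$, and the nasty AI only accepts when

$$N\epsilon \;\ge\; V \approx k\epsilon \quad\Longrightarrow\quad N \gtrsim k.$$

So the nice worlds need to outnumber each nasty one by roughly the cheapness factor; if they don't, the same bargaining logic runs in reverse and the nasty side buys out the nice worlds instead.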