Quick thought: If you have an aligned AI (FAI) in a multipolar scenario, other AIs might threaten to cause astronomical suffering (S-risk) as blackmail, to extort that FAI into doing what they want. Therefore, we should make the FAI treat X-risk and S-risk as equally bad (even though S-risk is in reality terrifyingly worse), because then other powerful AIs will simply use oblivion as a threat instead of astronomical suffering, since oblivion is much easier to bring about.
It is possible that an FAI could pull off some acausal decision-theory trick to commit to acting as if it does not care about anything done in efforts to blackmail it. But this proposal is just insurance in case that fails.
This kind of idea has been discussed under the names “surrogate goals” and “safe Pareto improvements”; see here.
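As a concrete illustration, here is a minimal sketch of what the proposal could look like as a utility function. All of the outcome names and disutility numbers below are made up purely for illustration; the only point is that the FAI optimizes a version of its utility in which S-risk is capped at the disutility of X-risk.

```python
# Hypothetical, illustrative disutilities -- the specific values are not the point.
TRUE_UTILITY = {
    "status_quo": 0.0,
    "oblivion": -1.0,                    # X-risk: everyone dies
    "astronomical_suffering": -1_000.0,  # S-risk: terrifyingly worse (illustrative number)
}

def proposed_utility(outcome: str) -> float:
    """The proposal: cap S-risk at the disutility of X-risk, so that threatening
    suffering gives a blackmailer no more leverage than threatening oblivion."""
    return max(TRUE_UTILITY[outcome], TRUE_UTILITY["oblivion"])
```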
This has the obvious problem that the AI will then be indifferent between astronomical suffering and oblivion in every situation where it has to choose between the two, not just in blackmail situations: it will not care, on the merits, which of the two occurs.
You don’t want your AI to prefer a 99.999% chance of astronomical suffering to a 99.9999% chance of oblivion. Astronomical suffering is much worse.
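Here is that comparison worked through in the same toy model as the sketch above (again, the disutility numbers are made up). Under the capped utility, the AI prefers the near-certain-suffering gamble simply because its probability of a bad outcome is slightly lower, even though by the true utility it is astronomically worse.

```python
# Made-up disutilities, matching the sketch above.
TRUE = {"ok": 0.0, "oblivion": -1.0, "astronomical_suffering": -1_000.0}
# The proposal: treat S-risk as no worse than oblivion.
CAPPED = {k: max(v, TRUE["oblivion"]) for k, v in TRUE.items()}

def expected_utility(utility, bad_outcome, p_bad):
    # Lottery: bad_outcome with probability p_bad, otherwise "ok".
    return p_bad * utility[bad_outcome] + (1 - p_bad) * utility["ok"]

for name, u in (("true", TRUE), ("capped", CAPPED)):
    eu_suffering = expected_utility(u, "astronomical_suffering", 0.99999)
    eu_oblivion = expected_utility(u, "oblivion", 0.999999)
    choice = "suffering gamble" if eu_suffering > eu_oblivion else "oblivion gamble"
    print(f"{name:>6}: EU(suffering)={eu_suffering:.4f}  EU(oblivion)={eu_oblivion:.6f}  -> prefers {choice}")
```

Under the capped utility the tiny difference in probabilities is all that matters, which is exactly the indifference-on-the-merits problem described above.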