Yudkowsky put a lot of focus on the inadequacy of threats, and that was one part I never understood. For instance, he said dath ilan would destroy the universe before giving in to aliens who say "give us $5 or we'll destroy the universe". But other humans make far worse threats all the time, all over the place, especially from positions of power, and if we all went MAD every time, there'd be no humanity left.
On a recent re-read I think I understand a bit better.
It's true that individual humans can't realistically avoid giving in to threats, or even avoid accidentally threatening others. But institutions can commit to refusal as a legible policy, e.g. "we will not negotiate with terrorists".
If an irrational entity has the ability to unilaterally destroy the universe, the universe is probably going to get destroyed eventually anyway. So it makes more sense to follow through on precommitments, both in the real world and in counterfactuals, in order to coordinate with genuinely rational agents.
I think the key is that if we all went MAD legibly and at the same time, things would work out a lot better. And refusing to give in to threats doesn't have to mean destruction: it can be as simple as collectively refusing to pay ransomware attackers, even though paying is cheaper in the short run, in the expectation that refusing becomes cheaper in the long run once the attacks stop paying off.
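To make that last trade-off concrete, here's a toy model, not anything from Yudkowsky, with made-up numbers and a made-up function name. The one assumption doing the work is that attackers scale their effort to expected payout: a community that always pays sustains the attack rate, while a community that never pays makes it shrink each round.

```python
def total_cost(always_pay: bool, rounds: int = 20,
               ransom: float = 1.0, loss_if_unpaid: float = 3.0) -> float:
    """Cumulative cost to the defending community over many rounds.

    Each round, each active attack costs either the ransom (if paid)
    or the larger unrecovered loss (if refused). Assumption: unpaid
    attacks halve the attack rate each round, because the attackers'
    business model dries up; paid attacks sustain it.
    """
    attack_rate, cost = 1.0, 0.0
    for _ in range(rounds):
        cost += attack_rate * (ransom if always_pay else loss_if_unpaid)
        attack_rate *= 1.0 if always_pay else 0.5
    return cost
```

With these numbers, refusing costs three times as much per attack, yet the always-pay community ends up paying roughly 20 in total while the never-pay community pays roughly 6, because the attacks taper off. The conclusion obviously depends on the assumed numbers; the point is only that "more expensive now, cheaper later" is a coherent shape for the policy to have.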