I count 5 strategies in this post & the previous one, rather than 3:
Blinding. Block information input from the adversary to you.
Privacy. Block information output from you to the adversary.
Disempowerment. Don’t let the adversary have control over parts of the environment that you care about.
Vindictiveness. Do things that are opposed to the adversary’s interests.
Randomness. Do things that are hard for the adversary to predict.
#3 Disempowerment was least explicitly stated in your writing but was present in how you talked about purging / removal from your environment. Examples: Don’t have a joint bank account with them, don’t appoint them to be in charge of a department in your organization, don’t make agreements with them where they have the official legal rights but there’s a handshake deal that they’ll share things with you.
Truman’s response to the Red Scare included all (or at least most) of the first 4 strategies. It was primarily #2 Privacy: the Soviet spies were mainly doing espionage (acquiring confidential information from the US government), and purging them blocked them from getting that information. But Truman was worried about them doing subversion (getting the US government to make bad decisions), which would make purging them #3 Disempowerment. And executing them (rather than just firing them) makes it #4 Vindictiveness too.
The Madman Theory example in the other post is mainly about vindictiveness (it’s a threat to retaliate), even though it’s done in a way that involves some randomness.
#5 Randomness feels least like a single coherent thing out of these 5. I’d break it into:
5a Maximin. Do things that work out best in the worst case scenario. This often involves a mixed strategy where you randomize across multiple possible actions (assuming you have a hidden source of randomness).
5b Erraticness. Thwart their expectations. Don’t do the thing that they’re expecting you to do, or do something that they wouldn’t have expected.
Though #5b Erraticness seems like an actively bad idea if you have been fully diagonalized, since in that case you won’t actually succeed at thwarting their expectations and your erratic action will instead be just what they wanted you to do. It is instead a strategy for cat-and-mouse games where they can partially model you but you can still hope to outsmart them.
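To make 5a concrete, here is a minimal Python sketch (the game, function names, and grid-search approach are all my illustrative assumptions, not anything from the post): it finds the row player’s maximin mixed strategy in a 2x2 zero-sum game, then samples the realized action from an unpredictable randomness source, matching the “hidden source of randomness” caveat above.

```python
import secrets

def maximin_mixed(payoffs, steps=1000):
    """Find the row player's maximin mixed strategy for a 2x2
    zero-sum game by grid search over the mixing probability.

    payoffs[i][j] = row player's payoff when row plays i, column plays j.
    Returns (p, value): probability of playing row 0, and the
    guaranteed worst-case expected payoff.
    """
    best_p, best_value = 0.0, float("-inf")
    for k in range(steps + 1):
        p = k / steps
        # Worst case over the opponent's pure strategies (a mixed
        # response can never be worse for us than the worst pure one).
        worst = min(
            p * payoffs[0][j] + (1 - p) * payoffs[1][j] for j in (0, 1)
        )
        if worst > best_value:
            best_p, best_value = p, worst
    return best_p, best_value

def play(p):
    """Sample an action using an unpredictable randomness source.

    secrets gives cryptographic-quality randomness; a seeded PRNG
    could in principle be modeled by a strong enough adversary.
    """
    return 0 if secrets.randbelow(10**6) < p * 10**6 else 1

# Matching pennies: every pure strategy is fully exploitable,
# but the 50/50 mix guarantees an expected payoff of 0.
matching_pennies = [[1, -1], [-1, 1]]
p, value = maximin_mixed(matching_pennies)
```

For matching pennies this recovers the familiar answer: mix 50/50 and guarantee an expected payoff of 0 no matter what the opponent does.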
If you have been diagonalized, it’s better to limit your repertoire of actions. Choose inaction where possible, stick to protocol, don’t do things that are out of distribution. The smaller the set of actions that you ever do, the fewer options the diagonalizer has for what to get you to do. A hacker gets a computer system into a weird edge case, a social engineer gets someone to break protocol, a jailbreaker gets an LLM into an out-of-distribution state. An aspiring diagonalizer also wants to influence the process that you use to make decisions, and falling back on a pre-existing protocol can block that influence. I would include this on my list of strategies, maybe #6 Act Conservatively.
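The “limit your repertoire” idea can be sketched as a whitelist (all names and situations here are hypothetical, just to illustrate the shape of the strategy): the agent only ever emits actions from a fixed protocol, and anything out of distribution falls back to a safe default, so shaping the inputs can only elicit actions already on the list.

```python
def make_conservative_agent(protocol):
    """Restrict an agent to a fixed, finite action repertoire.

    protocol: dict mapping recognized situations to pre-approved
    actions. Anything outside it triggers inaction, so an adversary
    who controls the inputs can only ever elicit actions that are
    already on the list.
    """
    def act(situation):
        # Out-of-distribution inputs get the safe default rather than
        # improvised behavior the adversary may have engineered.
        return protocol.get(situation, "do_nothing")
    return act

# A hypothetical three-situation protocol.
agent = make_conservative_agent({
    "routine_request": "follow_checklist",
    "payment_due": "pay_standard_invoice",
    "unknown_sender": "ignore",
})
```

The design choice mirrors the paragraph above: the attack surface is the size of the protocol, and a weird edge-case input buys the diagonalizer nothing beyond “do_nothing”.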
Looking back through these, most of them aren’t that specific to diagonalization scenarios. Strategies 4 (Vindictiveness) & 5a (Maximin) are standard game theory and come up in lots of contexts. I think that strategies 1-3 fall out of a fairly broad sense of what it means for someone to be an adversary—they are acting contrary to your interests, in a way that’s entangled with you; they’re not just off somewhere else doing things you don’t like, they are in some way using you to get more of the thing that’s bad for you. In what ways might they be using you to get more of the thing? Maybe they’re getting information from you which they can then use for their purposes, maybe they’re trying to influence what you do so you do what they want, maybe you’ve let them have control over something which you could have disallowed. Strategies 1 (Blinding), 2 (Privacy), and 3 (Disempowerment) just involve undoing/blocking one of those.
#5 Randomness feels least like a single coherent thing out of these 5. I’d break it into:
5a Maximin. Do things that work out best in the worst case scenario. This often involves a mixed strategy where you randomize across multiple possible actions (assuming you have a hidden source of randomness).
5b Erraticness. Thwart their expectations. Don’t do the thing that they’re expecting you to do, or do something that they wouldn’t have expected.
Though #5b Erraticness seems like an actively bad idea if you have been fully diagonalized, since in that case you won’t actually succeed at thwarting their expectations and your erratic action will instead be just what they wanted you to do. It is instead a strategy for cat-and-mouse games where they can partially model you but you can still hope to outsmart them.
I found this section quite helpful and think splitting that into these two parts is probably the right call (including the caveat that this backfires if your opponent has actually diagonalized you).
I am working on a post trying to find a set of more common-language abstractions for reasoning about this stuff, which I think erraticness fits into a bit better.