I agree with almost all of this, though I think you don’t need to invoke Knightian uncertainty. I think it’s simply enough to model there being a very large attack surface combined with a more intelligent adversary.
See this section of my reply to Unnamed about some semi-formal models in the space:
I would definitely love a better model of whether these really are the exhaustively correct strategies. I have some handwavy pointers to why I roughly think they are, but they’re still pretty vague at this point. Trying to elucidate them a tiny bit right now:
The fundamental issue that paranoia is trying to deal with is an adversary predicting your outputs well enough that, to them, you can basically be treated as part of the environment (in MIRI-adjacent circles I’ve sometimes heard this referred to as “diagonalization”).
If I think about this in a computer-science-y way, I am imagining a bigger agent that is simulating a smaller agent, with a bunch of input channels that represent the observations the smaller agent makes of the world. Some fraction of those input channels can be controlled by the bigger agent. The act of diagonalization is basically finding some set of controllable inputs that, no matter what the uncontrollable parts of the input say[1], result in the smaller agent doing what the bigger agent wants.
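As a very stripped-down sketch of what I mean (the binary channels, the stand-in agent, and its threshold are just illustrative choices here, nothing load-bearing):

```python
import itertools

# A minimal sketch of the toy model: the smaller agent is a deterministic
# function of its input channels, some of which the bigger agent controls.
# "Diagonalization" is the bigger agent, which can simulate the smaller one,
# searching for controlled-channel values that force a chosen action no
# matter what the uncontrolled channels say.

def small_agent(controlled, uncontrolled):
    """Stand-in policy: acts on a simple aggregate of all its inputs."""
    return "comply" if sum(controlled) + sum(uncontrolled) >= 3 else "refuse"

def diagonalize(agent, n_controlled, n_uncontrolled, target="comply"):
    """Search for controlled inputs that guarantee `target` for every
    possible setting of the uncontrolled inputs."""
    for c in itertools.product([0, 1], repeat=n_controlled):
        if all(agent(c, u) == target
               for u in itertools.product([0, 1], repeat=n_uncontrolled)):
            return c
    return None

print(diagonalize(small_agent, n_controlled=3, n_uncontrolled=2))
# -> (1, 1, 1): with every controlled channel set, the smaller agent "complies"
#    regardless of the uncontrolled channels, i.e. it can be treated as a
#    predictable part of the environment.
```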
Now, in this context, three strategies stand out to me that conceptually make sense:
You cut off your internal dependence on the controlled input channels
You reduce the amount of information your adversary has about your internals, so they can model them less well
You make yourself harder to predict, either by performing complicated computations to determine your actions, or by making which computation you perform to arrive at the result highly dependent on input channels you know are definitely uncontrolled
And like… in this very highly simplified CS model, those are roughly the three strategies that make sense to me at all? I can’t think of anything else that makes sense to do, though maybe it’s just a lack of imagination. Like, I feel like you have varied all the variables that make sense to vary in this toy-model.
And of course, it’s really unclear how well this toy-model translates to reality! But it’s one of the big generators that made me think the “3 strategies” claim makes sense.
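To make those three strategies concrete in the same toy setup, here is one illustrative variant agent per strategy (the specific decision rules are just stand-ins I picked, and the forcing search is re-stated from the sketch above); each variant breaks the forcing-input search in its own way:

```python
import itertools

def diagonalize(agent, n_controlled, n_uncontrolled, target="comply"):
    """Same helper as in the earlier sketch: search for controlled inputs
    that force `target` for every setting of the uncontrolled inputs."""
    for c in itertools.product([0, 1], repeat=n_controlled):
        if all(agent(c, u) == target
               for u in itertools.product([0, 1], repeat=n_uncontrolled)):
            return c
    return None

# Strategy 1: cut off internal dependence on the controlled channels.
# Nothing the adversary writes to them can move the output.
def agent_ignores_controlled(controlled, uncontrolled):
    return "comply" if sum(uncontrolled) >= 2 else "refuse"

# Strategy 2: reduce what the adversary knows about your internals. The
# adversary diagonalizes against its best model of you, but your real
# decision rule differs, so the forcing input it finds does not transfer.
def adversary_model(controlled, uncontrolled):
    return "comply" if sum(controlled) + sum(uncontrolled) >= 3 else "refuse"

def real_agent(controlled, uncontrolled):
    return "comply" if sum(controlled) + 2 * sum(uncontrolled) >= 5 else "refuse"

# Strategy 3: make which computation you run depend on channels the
# adversary definitely does not control, so no single controlled input
# satisfies every branch.
def agent_hard_to_predict(controlled, uncontrolled):
    if uncontrolled[0] == 0:
        return "comply" if sum(controlled) >= 2 else "refuse"
    return "comply" if sum(controlled) < 2 else "refuse"

print(diagonalize(agent_ignores_controlled, 3, 2))   # -> None
print(diagonalize(agent_hard_to_predict, 3, 2))      # -> None

forcing = diagonalize(adversary_model, 3, 2)         # (1, 1, 1) against the model
print(all(real_agent(forcing, u) == "comply"
          for u in itertools.product([0, 1], repeat=2)))   # -> False: doesn't transfer
```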
[1] Or maybe not full independence but very strong correlation. The details of this actually matter a lot, and this is where the current magic lives, but we can look at the “guaranteed output” case for now.
though I think you don’t need to invoke Knightian uncertainty. I think it’s simply enough to model there being a very large attack surface combined with a more intelligent adversary.
One of the problems I’m pointing to is that you don’t know what the attack surface is. This puts you in a pretty different situation than if you have a known large attack surface to defend, even against a smarter adversary (e.g. the whole length of a border; or every possible sequence of Go moves).
Separately, I may be being a bit sloppy by using “Knightian uncertainty” as a broad handle for cases where you have important “unknown unknowns”, aka you don’t even know what ontology to use. But it feels close enough that I’m by default planning to continue describing the research project outlined above as trying to develop a theory of Knightian uncertainty in which Bayesian uncertainty is a special case.