We can prove by contradiction that if one agent is capable of predicting another agent, the other agent cannot in turn do the same.
Only if one of them is diagonalizing the other (acting contrary to what the other would’ve predicted about its actions). If this isn’t happening, maybe there is no problem.
For example, the halting problem is unsolvable because you are asking for a predictor that simultaneously predicts the behavior of every program, and among all programs there is at least one (that’s easy to construct) that is diagonalizing the predictor’s prediction of its behavior (acting contrary to what the predictor would’ve predicted about its behavior), by predicting the predictor and doing the opposite. But proving that a specific program halts (or not) is often possible; that’s not the halting problem.
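A minimal sketch of that diagonal construction, assuming a hypothetical total predictor `halts(source, input) -> bool` (no such predictor actually exists, and all names here are illustrative):

```python
# Sketch: given a claimed total halting predictor, build the program that
# diagonalizes it. The point is only that D is easy to construct *given*
# `halts`, which is why no total `halts` can exist.

def make_diagonalizer(halts):
    def D(source):
        # Ask the predictor what the program described by `source` does
        # when run on its own source, then do the opposite.
        if halts(source, source):
            while True:   # predicted to halt, so loop forever instead
                pass
        return "halted"   # predicted to loop, so halt immediately
    return D
```

Running D on its own source contradicts whatever `halts` predicts about it, which gives the contradiction; deciding whether some specific, fixed program halts is a separate question that this argument says nothing about.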
If the smaller agent was also perfectly predicting the bigger agent, then the bigger agent couldn’t be perfectly predicting the smaller agent, as doing so would trigger an infinite regress.
There is no infinite regress, and probably no useful ordering of agents/programs by how big they are in this way. It’s perfectly possible for agents to reason about each other, including about their predictions about themselves or each other. And where there is diagonalization, it doesn’t exactly say which agent was bigger (an agent can even diagonalize itself, to make its own actions unpredictable to itself).
See for example the ASP problem where in Newcomb’s problem the predictor is “smaller” and by-stipulation predictable (rather than an all-powerful Omega), and so the “bigger” box-choosing agent needs to avoid any sudden movements in its thoughts to keep itself predictable and get the big box filled by the predictor.
Maybe quines can illustrate how there is no by-default infinite regress. You can write a program in Python that prints a program in Java that in turn prints the original program in Python. Neither of the programs is “bigger” than the other.
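As an illustration (a sketch using two Python programs rather than a Python/Java pair, which works the same way), here is a pair of programs that print each other with no regress; the three uncommented lines are the first program:

```python
# Program A: the usual quine trick applied to a shared template.
# Its output is program B, identical except that flip is True, and
# program B's output is program A again.
t = 't = {!r}\nflip = {}\nprint(t.format(t, not flip))'
flip = False
print(t.format(t, not flip))
```

Neither program “contains” the other in any asymmetric sense; each just describes the other via the shared template.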
When a larger agent contains a smaller agent this way, the smaller agent can simply be treated like any other part of the environment. If you want to achieve a goal, you simply figure out what action of yours produces the best outcome, including the reaction from the smaller agent.
Other than blinding itself to the bigger agent’s actions, alternative safer ways of observing the bigger agent might be available: reasoning about it rather than directly observing what it actually does. Even a “big” agent doesn’t contain or control all the reasoning about it; a theory of an agent is bigger than the agent itself, and others can pick and choose what to reason about. Also, self-contained reasoning that produces some conclusion can itself make use of observations of the “big” agent, provided the observations are not used for anything else. So it’s not even necessarily about blinding, but rather about compartmentalized reasoning, where the observations (tainted data) don’t get indiscriminate influence but can still be carefully used to learn things.
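A toy sketch of the quoted suggestion above (pick the action with the best outcome, treating the smaller agent’s reaction as just another part of the world model); the payoff table and response rule are made up for illustration:

```python
# Toy model: the bigger agent optimizes over its own actions while
# simulating the smaller agent's (stipulated-predictable) reaction
# inside its outcome model, like any other piece of the environment.

def smaller_agent_reaction(action):
    # Stipulated to be predictable: it mirrors the bigger agent's action.
    return action

def outcome_value(action, reaction):
    payoffs = {
        ("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
        ("defect", "cooperate"): 5, ("defect", "defect"): 1,
    }
    return payoffs[(action, reaction)]

def bigger_agent(actions=("cooperate", "defect")):
    return max(actions, key=lambda a: outcome_value(a, smaller_agent_reaction(a)))

print(bigger_agent())  # -> "cooperate": the predicted reaction changes which action is best
```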
Ok, but please, does anyone have a suggestion for a better term than “diagonalization”?
It’s from Cantor’s diagonal argument. See also the more general diagonal argument and Lawvere’s fixpoint theorem. It’s just this: you construct an endomap without fixpoints, and that breaks stuff. This works just as well for maps that are defined/enacted by agents in their behavior, mapping beliefs/observations to actions; you just need to close the loop so that the beliefs/observations start talking about the same things as the actions.
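A toy way to see the “close the loop” condition, with all names made up; the agent enacts an endomap on its two possible actions that has no fixed point:

```python
# The agent's policy maps the predictor's belief about its action to an
# action over the same two values, and that map (negation) has no fixed
# point -- so no prediction handed to the agent can be correct.

def agent(predicted_action):
    # `predicted_action` is the predictor's belief about what `agent` does.
    return not predicted_action

# Whatever the prediction is, the enacted action differs from it:
assert agent(True) is False and agent(False) is True
```

Without closing the loop (if the belief were about something other than this very action), negation would just be another map and there would be no obstruction to prediction.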
Ok, having more time today and thinking more about it, I have updated the description of the proof in the infobox! Curious whether it seems better/more accurate to you now.
Only if one of them is diagonalizing the other (acting contrary to what the other would’ve predicted about its actions). If this isn’t happening, maybe there is no problem.
Ah, yes, of course. I’ll update the description.
Agree on all the rest, I think. I didn’t intend to establish a strict ordering of agents (though my usage of bigger and smaller in the strict case of adversarial diagonalizing agents sure suggested it). In those cases I find it a useful visualization to think in terms of bigger and smaller.
It’s from Cantor’s diagonal argument.
I agree that “diagonalization” is a fine term for the specific narrow thing where you choose actions contrary to what the other agent would have predicted you would do, in the way described here, but I am more talking about the broader phenomenon of “simulating other agents adversarially in order to circumvent their predictions”. “Leveling” is apparently a term from poker that means something kind of similar and more general:
Levelling
Leveling in poker is the process of anticipating what your opponent thinks you are thinking, often leading to deeper layers of strategic decision-making.
Its purpose is to outthink opponents by operating on a higher mental “level” than they are.
Like, I would like a term for this kind of thing that is less opinionated about the exact setup and technical limitations. I am pretty sure there is a more general phenomenon here.
I am more talking about the broader phenomenon of “simulating other agents adversarially in order to circumvent their predictions”
The idea of “simulating adversarially” might be a bit confusing in the context of diagonalization, since it’s the diagonalization that is adversarial, not the simulation. In particular, you’d want mutual simulation (or rather more abstract reasoning) for coordination. If you merely succeed in acting contrary to a prediction, making the prediction wrong, that’s not diagonalization. What diagonalization does is prevent the prediction from happening in the first place (or, in the case of putting a credence on something, keep the credence at some weaker prior). So diagonalization is something done against a predictor whose prediction is targeted, rather than something done by the predictor. A diagonalizer might itself want to be a predictor, but that is not necessary if the prediction is just given to it.
We can prove by contradiction that if one agent is capable of predicting another agent, the other agent cannot in turn do the same.
I’m glad you responded to this as this stood out to me too.
Maybe quines can illustrate how there is no by-default infinite regress.
Quines only illustrate that there is no by-default infinite regress within the assumed system (here, a formal, deterministic string-rewriting game), which is built on assumptions themselves subject to the Münchhausen trilemma.
I’m not trying to be pedantic here; I think it’s pretty important to consider the implications of this.