I think this may be a forest vs. trees thing where we failed to say the thing explicitly because we’ve been intentionally living in the “Alice and Bob have similar heuristics and computational power” regime for years: the A set in the paper models the set of heuristics, but I’m surprised the term “heuristic” didn’t end up in the final text (we just talk a lot about relative computational resources).
Fundamentally, if you let B attend either to A’s thoughts or to A’s context, and also ensure A and B are similar, we think A shouldn’t be able to hide things from B. If A is a very different model, or a stronger one, this won’t hold, and we should expect arguments that A can navigate in ways B can’t defend against. So to get anything like this to go through, you need some kind of “B knows what A knows” invariant.
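To spell that invariant out with some ad-hoc notation (mine, not the paper’s): if \(H_A\) and \(H_B\) are the sets of heuristics available to A and B respectively, the condition is

$$H_A \subseteq H_B,$$

i.e. any heuristic A could use to construct or navigate an argument is also available to B when checking it. On this reading, “similar computational power” is just the quantitative counterpart of that containment.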