Vladimir_Nesov comments on Towards_Keeperhood’s Shortform

Vladimir_Nesov 13 Oct 2025 21:57 UTC
12 points
Just surpassing the limits of human capability at something is not any update at all at this point, because AlphaZero already did that (and frontier LLMs use much more compute). Programming seems less of an update than the natural language proof IMO results, because for programming you can get away with straightforward verifiable rewards, which can’t be manually formulated for many crucial real world tasks. The natural language proof IMO, by contrast, requires valid informal proofs rather than merely correct or formally winning answers, which more directly demonstrates that even with a fuzzier kind of correctness feedback LLMs can still be trained to operate at the limits of human capability.

So in my view it’s specifically this year’s natural language proof IMO results (from OpenAI and GDM) that lend a lot of credence to the following recent claim by Sholto Douglas of Anthropic:

“So far the evidence indicates that our current methods haven’t yet found a problem domain that isn’t tractable with sufficient effort.”
- Towards_Keeperhood 14 Oct 2025 16:31 UTC
  3 points
  I definitely have to update here; that’s just the law of probability. Maybe you don’t have to update much if you already expected to have superhuman competitive programming around now.
  But also this isn’t the only update that informs my new timelines. I was saying something more like “look, I wrote down advance predictions and it was actually useful to me”, rather than intending to give an epistemically legible account of my timeline models.