Thomas Kwa comments on Thomas Kwa’s Shortform

Thomas Kwa 27 Jan 2026 19:30 UTC
I think there are a variety of explanations consistent with there being 2x uplift already:
- The METR benchmark just isn’t precise enough
- One-time gains from RLVR that caused a steeper slope in 2024-2025 have petered out, but real-world uplift has taken their place
- Models have reached some time horizon threshold where they’re increasingly useful
- In the past, problems like reward hacking or poor generalization limited real-world uplift, but these are now solved well enough to allow 2x uplift
My median guess would be something lower than 2x, but we just don’t have enough data.