We could already be in takeoff:
In Tom Davidson’s semi-endogenous growth model, whether we get a software-only singularity boils down to whether r > 1, where r is a parameter in the model [1]. How far we are from takeoff is mostly determined by the AI R&D speedup current AIs provide. Because both parameters are rather difficult to estimate, I believe we can’t rule out that
2x uplift is already happening at the most advanced AI lab
Anecdotes of people being sped up by over 2x make this seem plausible (e.g., one of my colleagues has estimated he’s sped up by over 30x on some days; we did this exploratory uplift estimate by using GPT-5 to estimate the time an unassisted human would need to do the tasks he completed each day). Even if other activities like large-scale experiments aren’t sped up much by AI, you don’t need that much substitutability for a 30x SWE speedup to reduce the need for those activities enough to speed up overall AI R&D progress by 2x.
r = 1.6 (meaning each doubling of AI capabilities, as measured by equivalent software engineering labor, is 1.6 times faster than the previous doubling)
Epoch’s estimates of r are highly uncertain, and 1.6 is well within the error bars of 3 of their 4 methods.
If these parameters are true, then we’re already in takeoff; according to the Forethought web app, AI companies will nearly 1,000x their amount of equivalent software engineering labor in the next 12 months, and we’ll get to 1,000,000x in 19 months (by August 2027). A 1,000,000x increase is “only” equivalent to 6 years of progress at the 2020-2024 pace, but AIs got so much smarter from 2020-2024 that these AIs will probably have crazy, transformative effects on the world like in AI 2027. Again, nothing very unexpected has to be true for us to be in this world, just two parameters being at the high end of what we think is likely.
[Figure: trajectory from the Forethought web app with f=2.0, r=1.6, and default parameters otherwise. 1 unit on the y-axis is equivalent to a 10x increase in AI R&D labor.]
[1]: More accurately, r:=λα/β where λ represents inefficiency of parallel work, α is the elasticity of research output to cognitive labor at a fixed compute budget, and β represents “ideas getting harder to find” in AI R&D.
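To make the r > 1 condition concrete, here is a toy sketch (my own illustration, not the Forethought model itself, which also tracks compute growth, parallelization penalties, etc., and therefore gives different numbers): if each doubling of equivalent labor takes 1/r times as long as the previous one, the doubling times form a geometric series, and for r > 1 that series converges, so any number of doublings fits in finite time. The starting doubling time below is hypothetical.

```python
import math

# Toy illustration of the r > 1 condition only; NOT the Forethought model,
# which has more moving parts and so produces different numbers.

def months_to_reach(multiplier: float, r: float, first_doubling_months: float) -> float:
    """Months until equivalent SWE labor has grown by `multiplier`,
    assuming each doubling is r times faster than the previous one."""
    total, dt = 0.0, first_doubling_months
    for _ in range(math.ceil(math.log2(multiplier))):
        total += dt
        dt /= r  # each successive doubling is r times faster
    return total

r = 1.6
t0 = 4.5  # hypothetical length of the current doubling, in months
print(f"1,000x in ~{months_to_reach(1e3, r, t0):.1f} months")      # 10 doublings
print(f"1,000,000x in ~{months_to_reach(1e6, r, t0):.1f} months")  # 20 doublings
# With r > 1, even infinitely many doublings take at most
# t0 * r / (r - 1) months: a finite-time "software-only singularity".
print(f"upper bound: {t0 * r / (r - 1):.1f} months")
```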
“We did this exploratory uplift estimate by using GPT-5 to estimate the time an unassisted human would need to do the tasks he completed each day”

This doesn’t seem like the right metric. An alternative metric might be “given your pre-LLM workload, how much faster can you get through it?” That’s also not quite what you care about—what you actually care about is “how many copies of non-LLM-assisted you is a single LLM-assisted you worth”, but that’s a much harder question to get an objective measure of.
Concrete example: I recently started using a property-based testing framework. Before I started using that tooling, I spent probably about 2% of my time writing the sorts of tests that this framework can generate, probably about 50 such tests per month. I can now trivially create a million tests per month using this framework. It would, in theory, have taken me 400 months of development work to write the same number of tests as the PBT framework allows me to write in a single month[1]. And yet, I claim, using that framework increased my productivity by ~2%, not ~400x.
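To make this concrete, here is a minimal sketch using Hypothesis, one common Python property-based testing framework (an illustrative choice, not necessarily the framework referred to above): a single property stands in for many hand-written example tests, because the framework generates the inputs.

```python
# Minimal property-based test using Hypothesis (illustrative pick of framework).
from hypothesis import given, strategies as st

def dedupe(xs: list[int]) -> list[int]:
    """Remove duplicates while preserving the order of first occurrences."""
    seen: set[int] = set()
    out: list[int] = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# One property replaces many hand-written cases: Hypothesis generates
# hundreds of input lists per run (and shrinks any failing example).
@given(st.lists(st.integers()))
def test_dedupe(xs: list[int]) -> None:
    out = dedupe(xs)
    assert len(out) == len(set(out))  # no duplicates remain
    assert set(out) == set(xs)        # no elements lost

if __name__ == "__main__":
    test_dedupe()  # @given-wrapped tests are directly callable
```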
By the second metric I think I’m personally observing a ~25% speedup, though it’s hard to tell since I’ve pretty much entirely stopped doing a lot of fairly valuable work which is hard to automate with AI in favor of doing other work which made no sense to do in the pre-LLM days but is now close enough to free that it makes sense to do[2].
In practice, if for some reason I actually needed to write a million repetitive tests, I would have noticed that I was spending an awful lot of time writing the same shape of code over and over and built a half-baked replacement for a property-based testing framework myself. But I bet the same is true of your friend who experienced “30x” productivity gains—if he had actually personally built all of the stuff he used LLMs to build over that month, he would have built tools and workflows for himself over the course of doing that which would have made him much faster at it.
[2]: And which is time-sensitive, because its value depends on the strength of the company’s offerings relative to competitors, so picking up the LLM-enabled gains early is much more valuable than waiting.
Thanks. We’re aware of the difference between these metrics, which, as Tom Cunningham has explained to me, correspond to Laspeyres and Paasche price indices in economics vs. the true utility or production gain over a period. There are various ways to get better data, but the point here is just that we can’t put an upper bound on current frontier lab uplift, especially not one under 2x, because estimates span such a wide range.
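As a toy numerical illustration of how far apart these metrics can sit (all numbers below are made up): uplift measured over the tasks actually completed can look enormous when AI makes previously-uneconomical work nearly free, even while uplift on the old workload stays modest.

```python
# Entirely hypothetical numbers illustrating the two uplift metrics discussed
# above. Each task is (hours an unassisted human would need, hours with AI).
old_basket = [(10, 8), (10, 9)]   # the pre-LLM workload: modest speedup
new_basket = [(10, 8), (200, 2)]  # tasks actually completed now, including
                                  # work that was too expensive to do pre-LLM

def uplift(basket: list[tuple[float, float]]) -> float:
    human_hours = sum(h for h, _ in basket)
    assisted_hours = sum(a for _, a in basket)
    return human_hours / assisted_hours

# "How much faster through the old workload?" (Laspeyres-flavored): ~1.2x
print(f"old-workload uplift: {uplift(old_basket):.1f}x")
# "Unassisted-human time for tasks actually done" (Paasche-flavored): ~21x
print(f"completed-tasks uplift: {uplift(new_basket):.1f}x")
```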
Update on whether uplift is 2x already:
After some analysis by a colleague I now think our most-uplifted employee gets closer to 10x than 30x uplift on the best days. There are other employees we think are uplifted 2x or more, and maybe some who are uplifted less than 2x.
Anthropic employees estimated they had a median of 2x (100%) uplift in the Claude Opus 4.6 system card. I couldn’t find any GPT-5.3-Codex uplift estimates from OpenAI.
“Productivity uplift estimates from the use of Claude Opus 4.6 ranged from 30% to 700%, with a mean of 152% and median of 100%—more modest than previous surveys that focused on superusers.”
So basically I still think 2x uplift is plausible.
This seems plausible to me, but it would be good to have a new METR uplift study to give more confidence in this.
Shouldn’t we start to see the METR trend bending upwards if this were the case? Let T be time horizon, A algorithmic efficiency, C training compute, E experimental compute, L labor, and S speedup.
Suppose,
$$T = (AC)^{\delta}$$

$$\dot{A} = A^{1-\beta}\left[(SL)^{\alpha}E^{1-\alpha}\right]^{\lambda}$$

Then, deriving the balanced growth path:

$$g_T = \delta r\,g_S + \delta\left[r\,g_L + \frac{\lambda(1-\alpha)}{\beta}\,g_E + g_C\right]$$
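Filling in the intermediate step (my reconstruction from the two equations above; it assumes every growth rate is constant along the balanced path):

```latex
% Balanced-growth-path step (reconstruction; assumes every growth
% rate g_X is constant along the path).
\begin{align*}
  \frac{\dot{A}}{A} &= A^{-\beta}\left[(SL)^{\alpha}E^{1-\alpha}\right]^{\lambda}
    % divide the law of motion by A
  \\
  0 &= -\beta g_A + \lambda\left[\alpha(g_S + g_L) + (1-\alpha)\,g_E\right]
    % log-differentiate: constant g_A forces the RHS to be constant
  \\
  g_A &= \frac{\lambda}{\beta}\left[\alpha(g_S + g_L) + (1-\alpha)\,g_E\right]
  \\
  g_T &= \delta\,(g_A + g_C)
      = \delta r\,g_S + \delta\left[r\,g_L + \frac{\lambda(1-\alpha)}{\beta}\,g_E + g_C\right]
    % from T=(AC)^{\delta} and r := \lambda\alpha/\beta
\end{align*}
```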
So if $g_S$ starts growing rapidly, we should see the time-horizon trend steepen, potentially by a lot. Now maybe Opus 4.5 is evidence of this, but I’m skeptical so far. A better argument is that there is some delay from getting better research to better models because of training time, so we haven’t seen it yet—though this rules out substantial speedup before maybe 4 months ago.
I think there are a variety of explanations consistent with there being 2x uplift already:
The METR benchmark just isn’t precise enough
One-time gains from RLVR that caused a steeper slope in 2024-2025 have petered out, but they’ve been replaced by uplift
Models have reached some time horizon threshold where they’re increasingly useful
In the past, problems like reward hacking or poor generalization have limited real-world uplift, but these are now solved well enough to allow 2x uplift.
My median guess would be something lower than 2x, but we just don’t have enough data.
Sorry, doesn’t this web app assume full automation of AI R&D as the starting point? I don’t buy that you can just translate this model to the pre-full-automation regime.
The web app just implements this model, which I think is general enough to apply both pre- and post-full-automation.
Yeah, I don’t buy that this model can or should be applied with r=1.6 right now, though I agree it could be general enough in principle.