The uplift equation:
What is required for AI to provide net speedup to a software engineering project, when humans write higher quality code than AIs? It depends how it’s used.
Cursor regime
In this regime, similar to how I use Cursor agent mode, the human has to read every line of code the AI generates, so we can write:
$$\text{Speedup factor} = \frac{t_H}{t_{AI+H}} = \frac{t_H}{t_{prompt} + t_{AIgen} + t_{check} + p_{fail} \cdot t_H}$$
Where
tH is the time for the human to write the code, either from scratch or after rejecting an AI suggestion
tprompt is the time for the human to write the prompt specifying the task to the AI
tAIgen is the time for the AI to generate the code, determined by output length and generation speed in tokens per second
tcheck is the time for the human to check the code, accept or reject it, and make any minor revisions, in order to bring the code quality and probability of bugs level with human-written code (Δpbug = 0)
pfail is the fraction of AI suggestions that are rejected entirely
Note this neglects other factors like code review time, code quality, bugs that aren’t caught by the human, or enabling things the human can’t do.
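As a rough numerical sketch of the Cursor-regime formula (the times below are illustrative assumptions in minutes, not measurements):

```python
def cursor_speedup(t_h, t_prompt, t_aigen, t_check, p_fail):
    """Speedup = t_H / (t_prompt + t_AIgen + t_check + p_fail * t_H)."""
    return t_h / (t_prompt + t_aigen + t_check + p_fail * t_h)

# Hypothetical task: 30 min to write by hand; 2 min to prompt, 1 min for the
# AI to generate, 5 min to check and revise, 20% of suggestions rejected.
print(cursor_speedup(t_h=30, t_prompt=2, t_aigen=1, t_check=5, p_fail=0.2))
# → 30 / (2 + 1 + 5 + 0.2*30) = 30/14 ≈ 2.14x
```

Note that with these numbers, checking time and the rejected-suggestion tax dominate the denominator, not generation time.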
Autonomous regime
In this regime the AI is reliable enough that the human doesn’t check all the code for bugs, and instead eats the chance of costly bugs entering the codebase.
$$\text{Speedup factor} = \frac{t_H}{t_{AI}} = \frac{t_H}{t_{prompt} + t_{AIgen} + \Delta p_{bug} \cdot t_{bugcost} + p_{fail} \cdot t_H}$$
Δpbug is the added probability of a bug from the AI agent compared to a human
tbugcost is the expected cost of a bug, including revenue loss, compromised other projects, and the time required to fix it. This can be higher than tH.
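A sketch of the autonomous-regime formula, with the checking term replaced by the expected bug cost (again, all numbers are illustrative assumptions):

```python
def autonomous_speedup(t_h, t_prompt, t_aigen, dp_bug, t_bugcost, p_fail):
    """Speedup = t_H / (t_prompt + t_AIgen + dp_bug * t_bugcost + p_fail * t_H)."""
    return t_h / (t_prompt + t_aigen + dp_bug * t_bugcost + p_fail * t_h)

# Hypothetical task: 30 min by hand; 2 min prompt, 1 min generation,
# 5% added bug probability, 60 min expected bug cost, 10% outright failures.
print(autonomous_speedup(t_h=30, t_prompt=2, t_aigen=1,
                         dp_bug=0.05, t_bugcost=60, p_fail=0.1))
# → 30 / (2 + 1 + 3 + 3) = 30/9 ≈ 3.33x
```

Because tbugcost can exceed tH, even a small Δpbug can erase the time saved by skipping the check.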
Verifiable regime
If task success is cheaply verifiable, e.g. if comprehensive tests are already written or other AIs can verify the code, then bugs are impossible except through reward hacking.
$$\text{Speedup factor} = \frac{t_H}{t_{AI}} = \frac{t_H}{t_{prompt} + t_{AIgen} + p_{rewardhack} \cdot t_{bugcost} + p_{fail} \cdot t_H}$$
Here tprompt is lower than in the other regimes because the verifier can help the generator AI understand the task, and tAIgen can be faster too because you can parallelize.
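One way to sketch the parallelization benefit: generate N candidates at once and let the verifier pick a passing one. This assumes (hypothetically) that failures are independent across samples and that wall-clock generation time is unchanged by parallelism; neither assumption is from the original model.

```python
def verifiable_speedup(t_h, t_prompt, t_aigen, p_rewardhack, t_bugcost,
                       p_fail, n_parallel=1):
    """Verifiable regime with N parallel samples.

    Assumed model: the human falls back to writing the code only if *all*
    N candidates fail verification, so the failure term is p_fail ** N.
    """
    p_all_fail = p_fail ** n_parallel
    return t_h / (t_prompt + t_aigen + p_rewardhack * t_bugcost
                  + p_all_fail * t_h)

# Illustrative numbers: one sample vs. four parallel samples.
print(verifiable_speedup(30, 1, 1, 0.01, 60, 0.3, n_parallel=1))  # ≈ 2.59x
print(verifiable_speedup(30, 1, 1, 0.01, 60, 0.3, n_parallel=4))  # ≈ 10.55x
```

Under these assumptions, driving the failure term toward zero is what lets the verifiable regime escape the pfail·tH tax that limits the other two.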
Observations
Although the AI is always faster at writing the code than the human, overall speedup is subject to Amdahl's law, i.e. limited by the slowest component: if AI generation is only 3x as fast as the human per line of code, overall speedup can never exceed 3x. Even when the AI mistake rate is low enough that we move to the autonomous regime, we still have tprompt and tAIgen, both of which can currently be significant fractions of tH, especially for expert humans who write code quickly and for projects that are only a few lines of code. Therefore I expect >5x speedups to overall projects only when (a) AIs write higher-quality code than humans, or (b) tasks are cheaply verifiable and reward hacking is rare, or (c) both tprompt and tAIgen are much faster than in my current work.
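The Amdahl's-law cap can be seen numerically by plugging the Cursor-regime formula in with overheads shrinking to zero (illustrative numbers, with generation 3x faster than the human):

```python
def cursor_speedup(t_h, t_prompt, t_aigen, t_check, p_fail):
    """Cursor-regime formula: t_H / (t_prompt + t_AIgen + t_check + p_fail * t_H)."""
    return t_h / (t_prompt + t_aigen + t_check + p_fail * t_h)

# t_aigen = t_h / 3, so the speedup is capped at 3x no matter how small
# checking, prompting, and failure overheads become.
for t_check in [10, 1, 0.1, 0]:
    print(t_check, cursor_speedup(t_h=30, t_prompt=0, t_aigen=10,
                                  t_check=t_check, p_fail=0))
# The printed speedups climb toward, but never pass, 3.0.
```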
I made a simple Desmos model here.
Unless the work can be parallelized.
This calculus changes when you can work on many things at once (similar to @faul_sname’s comment but this might be that you can work on many projects at once, even if they can’t each be parallelized well).