Some gestures which didn’t make the cut as they’re too woolly or not quite the right shape:
- ‘iteration’ or ‘refinement’ of proposals is the serial counterpart to best-of-k
- if the per-cycle marginal improvement to proposals is proportional to 1/t (diminishing), then the overall return is logarithmic in the number of cycles (a toy comparison follows the list)
- we have to demand constant marginal improvement per cycle to get overall linear returns, though there are intermediate regimes
- why and when are the marginal improvements from reasoning iterations constant, vs diminishing like 1/t, vs something else?
- adversarial exponentials might force exponential expense per gain
  - e.g. combatting replicators
  - e.g. brute-forcing passwords (a toy example follows the list)
- many empirical ‘learning curve’ effects appear to consume exponentially many observations per increment of improvement
  - Wright’s Law (the more general cousin of Moore’s Law) requires exponentially many cumulative production iterations per incremental efficiency gain (sketched after the list)
  - deep learning scaling laws appear to consume exponentially more input per incremental gain
- AlphaCode and AlphaZero appear to make roughly uniform gains per doubling of runtime compute
- OpenAI’s o-series ‘reasoning models’ appear to improve accuracy on many benchmarks with logarithmic returns to additional ‘test-time’ compute (a sketch of this relationship also follows the list)
- (in all of these examples there’s some choice of what scale to represent ‘output’ on, which affects whether the gains look uniform or not, so the thesis rests on whether the choices made are ‘natural’ in some way)
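
To make the iteration point concrete, here is a minimal sketch (a toy model of my own, not anything from the examples above): if each refinement cycle t adds a constant amount, the cumulative return grows linearly in T, whereas if it adds something proportional to 1/t the cumulative return tracks the harmonic series, i.e. roughly ln(T).

```python
# Toy comparison: cumulative return after T refinement cycles when the
# per-cycle marginal improvement is constant vs proportional to 1/t.
# The constant case grows linearly; the 1/t case grows like the harmonic
# series, i.e. roughly ln(T).
import math

def cumulative_return(marginal, T):
    """Sum the per-cycle marginal improvements over T cycles."""
    return sum(marginal(t) for t in range(1, T + 1))

for T in [10, 100, 1000, 10000]:
    constant = cumulative_return(lambda t: 1.0, T)       # linear in T
    harmonic = cumulative_return(lambda t: 1.0 / t, T)   # ~ ln(T) + 0.577
    print(f"T={T:>6}  constant={constant:>8.1f}  1/t={harmonic:6.2f}  ln(T)={math.log(T):6.2f}")
```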
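And a toy version of the adversarial-exponential point, assuming a uniformly random lowercase password (the alphabet size and lengths are arbitrary illustrative choices): each extra character is a constant ‘gain’ for the defender, but it multiplies the attacker’s worst-case search space by the alphabet size.

```python
# Each additional character multiplies the worst-case number of guesses by
# the alphabet size, so a linear gain in password length costs the attacker
# exponentially more effort.
ALPHABET = 26  # lowercase letters only (assumed for illustration)

for length in range(6, 13):
    guesses = ALPHABET ** length
    print(f"length {length:>2}: up to {guesses:.2e} guesses")
```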
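Here is a sketch of the Wright’s Law shape under its usual power-law form (the 20% learning rate is an assumed illustrative number, not a measurement): unit cost falls by a fixed fraction per doubling of cumulative production, so each further fixed-percentage efficiency gain requires exponentially many more units.

```python
# Wright's Law in its common power-law form: unit cost C(n) = C1 * n**(-b),
# where b is chosen so that cost falls by a fixed fraction per doubling of
# cumulative production n. Constant fractional gains per doubling means
# exponentially many units per increment of efficiency.
import math

C1 = 100.0           # cost of the first unit (arbitrary)
learning_rate = 0.2  # assumed 20% cost reduction per doubling
b = -math.log2(1 - learning_rate)  # exponent implied by the learning rate

def unit_cost(n):
    return C1 * n ** (-b)

for doublings in range(0, 7):
    n = 2 ** doublings
    print(f"cumulative units={n:>4}  unit cost={unit_cost(n):6.1f}")
```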
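Finally, a sketch of what ‘uniform gains per compute doubling’, equivalently ‘logarithmic returns to test-time compute’, looks like as a curve (the intercept and per-doubling gain below are made-up numbers, not measurements from AlphaZero or the o-series):

```python
# If accuracy rises by a roughly fixed amount per doubling of test-time
# compute, then accuracy(c) ~ a + b * log2(c): logarithmic returns, or
# equivalently exponential compute per increment of accuracy.
import math

a, b = 0.30, 0.05  # hypothetical intercept and per-doubling gain

def accuracy(compute):
    return min(1.0, a + b * math.log2(compute))

for compute in [1, 2, 4, 8, 16, 32, 64, 128]:
    print(f"compute={compute:>4}x  accuracy≈{accuracy(compute):.2f}")
```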