Some gestures which didn’t make the cut as they’re too woolly or not quite the right shape:
- ‘iteration’ or ‘refinement’ of proposals is the serial counterpart to best-of-k
- if the per-cycle marginal improvement to proposals is proportional to 1/t (diminishing), then the overall return is logarithmic in the number of cycles (a toy comparison follows the list)
- we have to demand constant marginal improvement per cycle to get overall linear returns, though there are intermediate regimes
- why and when are the marginal improvements from reasoning iterations constant, vs diminishing like 1/t, vs something else?
- adversarial exponentials might force exponential expense per gain
  - e.g. combatting replicators
  - e.g. brute-forcing passwords (a toy example follows the list)
- many empirical ‘learning curve’ effects appear to consume exponentially many observations per increment of improvement
  - Wright’s Law (the more general cousin of Moore’s Law) requires exponentially many cumulative production iterations per incremental efficiency gain (sketched after the list)
  - deep learning scaling laws appear to consume exponentially more input per incremental gain
- AlphaCode and AlphaZero appear to make roughly uniform gains per doubling of runtime compute
- OpenAI’s o-series ‘reasoning models’ appear to improve accuracy on many benchmarks with logarithmic returns to additional ‘test-time’ compute (a sketch of this relationship also follows the list)
- (in all of these examples there’s some choice of what scale to represent ‘output’ on, which affects whether the gains look uniform or not, so the thesis rests on whether the choices made are ‘natural’ in some way)
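
To make the iteration point concrete, here is a minimal sketch (a toy model of my own, not anything from the examples above): if each refinement cycle t adds a constant amount, the cumulative return grows linearly in T, whereas if it adds something proportional to 1/t the cumulative return tracks the harmonic series, i.e. roughly ln(T).

```python
# Toy comparison: cumulative return after T refinement cycles when the
# per-cycle marginal improvement is constant vs proportional to 1/t.
# The constant case grows linearly; the 1/t case grows like the harmonic
# series, i.e. roughly ln(T).
import math

def cumulative_return(marginal, T):
    """Sum the per-cycle marginal improvements over T cycles."""
    return sum(marginal(t) for t in range(1, T + 1))

for T in [10, 100, 1000, 10000]:
    constant = cumulative_return(lambda t: 1.0, T)       # linear in T
    harmonic = cumulative_return(lambda t: 1.0 / t, T)   # ~ ln(T) + 0.577
    print(f"T={T:>6}  constant={constant:>8.1f}  1/t={harmonic:6.2f}  ln(T)={math.log(T):6.2f}")
```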
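And a toy version of the adversarial-exponential point, assuming a uniformly random lowercase password (the alphabet size and lengths are arbitrary illustrative choices): each extra character is a constant ‘gain’ for the defender, but it multiplies the attacker’s worst-case search space by the alphabet size.

```python
# Each additional character multiplies the worst-case number of guesses by
# the alphabet size, so a linear gain in password length costs the attacker
# exponentially more effort.
ALPHABET = 26  # lowercase letters only (assumed for illustration)

for length in range(6, 13):
    guesses = ALPHABET ** length
    print(f"length {length:>2}: up to {guesses:.2e} guesses")
```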
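Here is a sketch of the Wright’s Law shape under its usual power-law form (the 20% learning rate is an assumed illustrative number, not a measurement): unit cost falls by a fixed fraction per doubling of cumulative production, so each further fixed-percentage efficiency gain requires exponentially many more units.

```python
# Wright's Law in its common power-law form: unit cost C(n) = C1 * n**(-b),
# where b is chosen so that cost falls by a fixed fraction per doubling of
# cumulative production n. Constant fractional gains per doubling means
# exponentially many units per increment of efficiency.
import math

C1 = 100.0           # cost of the first unit (arbitrary)
learning_rate = 0.2  # assumed 20% cost reduction per doubling
b = -math.log2(1 - learning_rate)  # exponent implied by the learning rate

def unit_cost(n):
    return C1 * n ** (-b)

for doublings in range(0, 7):
    n = 2 ** doublings
    print(f"cumulative units={n:>4}  unit cost={unit_cost(n):6.1f}")
```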
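Finally, a sketch of what ‘uniform gains per compute doubling’, equivalently ‘logarithmic returns to test-time compute’, looks like as a curve (the intercept and per-doubling gain below are made-up numbers, not measurements from AlphaZero or the o-series):

```python
# If accuracy rises by a roughly fixed amount per doubling of test-time
# compute, then accuracy(c) ~ a + b * log2(c): logarithmic returns, or
# equivalently exponential compute per increment of accuracy.
import math

a, b = 0.30, 0.05  # hypothetical intercept and per-doubling gain

def accuracy(compute):
    return min(1.0, a + b * math.log2(compute))

for compute in [1, 2, 4, 8, 16, 32, 64, 128]:
    print(f"compute={compute:>4}x  accuracy≈{accuracy(compute):.2f}")
```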