I’ve a simple model of research taste:
Research is exploration: trying stuff to gain information about what happens and what works
You’re planning experiments, the unit of that exploration
This planning benefits from heuristics that generate, refine, and select better experiment plans: that’s taste
(As well as these heuristics, you can just plan for (effectively) longer if you have more thinkspeed, but I tentatively believe that falls off sharply per unit, until you get more feedback from reality, even when it’s serial thinkspeed)
How do you get these heuristics? By necessity, they’re partially-generalising models based on experience of experiments
(That experience can be indirect, in the form of textbooks or expert interviews etc.)
(But the key point is that taste isn’t just a generic capacity or quantity you have; it comes from looking at the world, specifically getting a feel for high value-of-information interactions)
So experimental throughput is crucial, as is sample efficiency (at improving your taste models)
Taste is a stock; it depreciates due to movement of the frontier of the known
You learn stuff from your experiments, you enter (more or less) different regimes, your heuristics are that bit further from their solid base of generalisation
How fast this depreciation happens is therefore of great interest, i.e. how far does research taste generalise in a given domain?
(This depreciation also means that the one-time boost to the taste stock from slurping up all the textbooks and expert interviews etc. is limited, but it’s not clear how limited)
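The stock-and-flow picture above can be sketched as a toy simulation. Everything here is an illustrative assumption, not an estimate: the parameter names (`sample_efficiency`, `depreciation`), the linear gain from experiments, and the constant-fraction depreciation are all placeholders for the real, unknown functional forms.

```python
# Toy model: taste as a stock that grows with experiments and
# depreciates as the frontier of the known moves. All numbers illustrative.

def step(taste, experiments_per_period, sample_efficiency=0.1,
         depreciation=0.05):
    """One period: experiments add to the taste stock; frontier
    movement erodes a fixed fraction of it."""
    gained = sample_efficiency * experiments_per_period
    return taste * (1 - depreciation) + gained

taste = 1.0
for period in range(50):
    taste = step(taste, experiments_per_period=2)
print(round(taste, 3))
```

Under these assumptions the stock converges to a steady state of gained/depreciation (here 0.2/0.05 = 4.0): throughput and sample efficiency set the ceiling, while the depreciation rate sets how hard that ceiling binds, which is why those three parameters carry the model.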
There are a bunch of parameters that look important on this view:
how ‘far’ does taste generalise (in the given domain)
or equivalently (and perhaps easier to instrumentalise and estimate?) how fast does it depreciate as the frontier moves?
how fast does the return to extra reasoning for experiment design diminish?
what are sample efficiency scaling laws like? (Does this change for finetuning and in-context sample efficiency and the like?)
do returns to research on effective compute look different from returns to research on sample efficiency?
I expect yes, in part because effective compute improvements are a bit more straightforward to verify
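One way to make the second parameter (how fast returns to extra reasoning diminish) concrete is to posit a concave, saturating relationship between planning time and plan quality, with the cap set by how much feedback from reality you have. The exponential form and the `feedback_cap` and `rate` parameters below are pure modelling assumptions, chosen only to show the shape:

```python
import math

def plan_quality(thinking_time, feedback_cap=1.0, rate=0.5):
    """Illustrative only: quality rises with thinking time but saturates
    at a cap determined by available feedback from reality."""
    return feedback_cap * (1 - math.exp(-rate * thinking_time))

# The first two units of thinking buy much more than the next two:
gain_early = plan_quality(2) - plan_quality(0)
gain_late = plan_quality(4) - plan_quality(2)
print(round(gain_early, 3), round(gain_late, 3))
```

On this functional form, raising `feedback_cap` (running more experiments) lifts the ceiling, whereas more thinkspeed only moves you along the flattening curve, matching the claim that extra reasoning falls off sharply per unit until reality supplies new feedback.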