I’ve a simple model of research taste:
Research is exploration: trying stuff to gain information about what happens and what works
You’re planning experiments, the unit of that exploration
This planning benefits from heuristics that generate, refine, and select better experiment plans: that’s taste
(As well as these heuristics, you can just plan for (effectively) longer if you have more thinkspeed, but I tentatively believe that falls off sharply per unit, until you get more feedback from reality, even when it’s serial thinkspeed)
How do you get these heuristics? By necessity, they’re partially-generalising models based on experience of experiments
(That experience can be indirect, in the form of textbooks or expert interviews etc.)
(But the key point is that taste isn’t just a generic capacity or quantity you have; it comes from looking at the world, specifically getting a feel for high value-of-information interactions)
So experimental throughput is crucial, as is sample efficiency (at improving your taste models)
Taste is a stock; it depreciates due to movement of the frontier of the known
You learn stuff from your experiments, you enter (more or less) different regimes, your heuristics are that bit further from their solid base of generalisation
How fast this depreciation happens is therefore of great interest, i.e. how far does research taste generalise in a given domain?
(This depreciation also means that the one-time boost to the taste stock from slurping up all the textbooks and expert interviews etc. is limited, but it’s not clear how limited)
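The stock-and-flow picture above can be sketched as a toy simulation. Everything here is an illustrative assumption, not an estimate: the parameter names (`sample_efficiency`, `depreciation`), the linear gain from experiments, and the constant-fraction depreciation are all placeholders for the real, unknown functional forms.

```python
# Toy model: taste as a stock that grows with experiments and
# depreciates as the frontier of the known moves. All numbers illustrative.

def step(taste, experiments_per_period, sample_efficiency=0.1,
         depreciation=0.05):
    """One period: experiments add to the taste stock; frontier
    movement erodes a fixed fraction of it."""
    gained = sample_efficiency * experiments_per_period
    return taste * (1 - depreciation) + gained

taste = 1.0
for period in range(50):
    taste = step(taste, experiments_per_period=2)
print(round(taste, 3))
```

Under these assumptions the stock converges to a steady state of gained/depreciation (here 0.2/0.05 = 4.0): throughput and sample efficiency set the ceiling, while the depreciation rate sets how hard that ceiling binds, which is why those three parameters carry the model.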
There are a bunch of parameters that look important on this view:
how ‘far’ does taste generalise (in the given domain)
or equivalently (and perhaps easier to instrumentalise and estimate?) how fast does it depreciate as the frontier moves?
how fast does the return to extra reasoning for experiment design diminish?
what are sample efficiency scaling laws like? (Does this change for finetuning and in-context sample efficiency and the like?)
do returns to research on effective compute look different from returns to research on sample efficiency?
I expect yes, in part because effective compute improvements are a bit more straightforward to verify
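One way to make the second parameter (how fast returns to extra reasoning diminish) concrete is to posit a concave, saturating relationship between planning time and plan quality, with the cap set by how much feedback from reality you have. The exponential form and the `feedback_cap` and `rate` parameters below are pure modelling assumptions, chosen only to show the shape:

```python
import math

def plan_quality(thinking_time, feedback_cap=1.0, rate=0.5):
    """Illustrative only: quality rises with thinking time but saturates
    at a cap determined by available feedback from reality."""
    return feedback_cap * (1 - math.exp(-rate * thinking_time))

# The first two units of thinking buy much more than the next two:
gain_early = plan_quality(2) - plan_quality(0)
gain_late = plan_quality(4) - plan_quality(2)
print(round(gain_early, 3), round(gain_late, 3))
```

On this functional form, raising `feedback_cap` (running more experiments) lifts the ceiling, whereas more thinkspeed only moves you along the flattening curve, matching the claim that extra reasoning falls off sharply per unit until reality supplies new feedback.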