When writing this post, I think I was biased by the specific work I’d been doing for the 6-12 months prior, and generalised that too far.
I think I stand by the claim that tooling alone could speedup the research I was working on by 3-5x. But even on the same agenda now, the work is far less amenable to major speedups from tooling. Now, work on the agenda is far less “implement several minor algorithmic variants, run hyperparameter sweeps which take < 1 hour, evaluate a set of somewhat concrete metrics, repeat”, and more “think deeply about which variants make the most sense, run >3 hours jobs/sweeps, evaluate the more murky metrics”.
The main change was switching our experiments from toy models to real models, which massively loosened the iteration loop due to increased training time and less clear evaluations.
For the current state, I think 6-12 months of tooling progress might give 1.5-2x speedup from the baseline 1 year ago.
I still believe that I underestimated safety research speedups overall, but not by as much as I thought 2 months ago.
UPDATE
When writing this post, I think I was biased by the specific work I’d been doing for the 6-12 months prior, and generalised that too far.
I think I stand by the claim that tooling alone could speedup the research I was working on by 3-5x. But even on the same agenda now, the work is far less amenable to major speedups from tooling. Now, work on the agenda is far less “implement several minor algorithmic variants, run hyperparameter sweeps which take < 1 hour, evaluate a set of somewhat concrete metrics, repeat”, and more “think deeply about which variants make the most sense, run >3 hours jobs/sweeps, evaluate the more murky metrics”.
The main change was switching our experiments from toy models to real models, which massively loosened the iteration loop due to increased training time and less clear evaluations.
For the current state, I think 6-12 months of tooling progress might give 1.5-2x speedup from the baseline 1 year ago.
I still believe that I underestimated safety research speedups overall, but not by as much as I thought 2 months ago.