jacquesthibs comments on Shortform

jacquesthibs 18 Sep 2025 20:07 UTC
2 points
For those who haven’t seen it: coming from the same place as OP, I describe my thoughts in Automating AI Safety: What we can do today.
Specifically, in the side notes:
Should we just wait for research systems/models to get better?
[...] Moreover, once end-to-end automation is possible, it will still take time to integrate those capabilities into real projects, so we should be building the necessary infrastructure and experience now. As Ryan Greenblatt has said, “Further, it seems likely we’ll run into integration delays and difficulties speeding up security and safety work in particular[…]. Quite optimistically, we might have a year with 3× AIs and a year with 10× AIs and we might lose half the benefit due to integration delays, safety taxes, and difficulties accelerating safety work. This would yield 6 additional effective years[…].” Building automated AI safety R&D ecosystems early ensures we’re ready when more capable systems arrive.
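To make the arithmetic in the quote concrete, here is a rough sketch of how the "6 additional effective years" figure can be read. The function name, the `benefit_retained` parameter, and both readings of the calculation are my own assumptions for illustration; the quote itself does not spell out the exact computation.

```python
def effective_years(phases, benefit_retained):
    """Sum calendar years weighted by speedup, then discount.

    phases: list of (calendar_years, speedup) pairs (illustrative structure).
    benefit_retained: fraction of the speedup that survives integration
        delays, safety taxes, and so on (0.5 = "lose half the benefit").
    """
    raw = sum(years * speedup for years, speedup in phases)
    return raw * benefit_retained


# One year with 3x AIs and one year with 10x AIs, as in the quote.
phases = [(1.0, 3.0), (1.0, 10.0)]

# Reading 1 (assumption): count all accelerated work, then halve it.
total = effective_years(phases, benefit_retained=0.5)

# Reading 2 (assumption): count only the gain over unaccelerated work,
# i.e. a 3x year adds 2 extra years and a 10x year adds 9.
gain_only = [(years, speedup - 1.0) for years, speedup in phases]
additional = effective_years(gain_only, benefit_retained=0.5)

print(total, additional)
```

Under these assumptions the two readings give 6.5 and 5.5 effective years, both of which round to roughly the 6 years quoted.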
Research automation timelines should inform research plans
It’s worth reflecting on scheduling AI safety research based on when we expect sub-areas of safety research will be automatable. For example, it may be worth putting off R&D-heavy projects until we can get AI agents to automate our detailed plans for such projects. If you predict that it will take you 6 months to 1 year to do an R&D-heavy project, you might get more research mileage by writing a project proposal for this project and then focusing on other directions that are tractable now. Oftentimes it’s probably better to complete 10 small projects in 6 months and then one big project in an additional 2 months, rather than completing one big project in 7 months.
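As a back-of-the-envelope sketch of that comparison, using only the numbers from the example above (the `Plan` class and its field names are hypothetical, and treating all projects as equally valuable is a simplifying assumption):

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    months: float
    projects_completed: int

    @property
    def projects_per_month(self) -> float:
        return self.projects_completed / self.months


# Defer the big project: 10 small projects in 6 months, then the big
# project in 2 more months once automation can execute the proposal.
defer_big = Plan("small projects first", months=6 + 2, projects_completed=10 + 1)

# Do the big project by hand right away.
big_now = Plan("big project now", months=7, projects_completed=1)

extra_months = defer_big.months - big_now.months
extra_projects = defer_big.projects_completed - big_now.projects_completed
print(f"{extra_months} extra month(s) buys {extra_projects} extra projects")
```

With these numbers, deferring costs one extra month and yields ten more completed projects. The real trade-off depends on how valuable each project is and how confident you are in the automation timeline, neither of which this sketch captures.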
This isn’t to say that R&D-heavy projects are not worth pursuing: big projects that are harder to automate may still be worth prioritizing if you expect them to substantially advance downstream projects (such as ControlArena from UK AISI). But research automation will rapidly transform what counts as ‘low-hanging fruit’. Research directions that are currently impossible due to the time or R&D required may quickly go from intractable to feasible to trivial. Carefully adapting your code, your workflow, and your research plans for research automation is something you can, and likely should, do now.
I’m also very interested in having more discussions on what a defence-in-depth approach would look like for early automated safety R&D, so that we can get value from it for longer and point the system towards the specific kinds of projects that will lead to techniques that scale to the next scale-up / capability increase.
