How confident are you that safety researchers will be able to coordinate at crunch time, and that it won’t be, e.g., only the safety researchers at one lab?
Without taking things like personal fit into account, how would you compare, say, doing prosaic AI safety research pre-crunch-time to policy interventions that help at crunch time (for instance, helping safety teams coordinate better at crunch time, or even buying more crunch time)?
Not confident at all.
I do think that safety researchers might be good at coordinating even if the labs aren’t. For example, safety researchers tend to be more socially connected, and they share similar goals and beliefs.
Labs have more incentive to share safety research than capabilities research, because the harms of AI are mostly externalised whereas the benefits of AI are mostly internalised.
Those externalised harms include extinction, obviously, but also misuse and accidents that would trigger industry-wide regulation and distrust.
Even a few safety researchers at a single lab could reduce catastrophic risk.
The recent OpenAI-Anthropic collaboration is super good news. We should be giving them more kudos for this.
OpenAI evaluates Anthropic models
Anthropic evaluates OpenAI models
I think buying more crunch time is great.
While I’m not excited by pausing AI[1], I do support pushing labs to do more safety work between training and deployment.[2][3]
I think fast takeoffs are scarier than short timelines.
I think we can increase the effective crunch time by deploying Claude-n to automate much of the safety work that must happen between training and deploying Claude-(n+1). But I don’t know of any way to accelerate Claude-n at safety work without also accelerating its capabilities work.
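To make that Claude-n idea concrete, here is a toy sketch of what such a pre-deployment loop could look like. This is not a real Anthropic pipeline or API; every class, method, and eval topic below is hypothetical, and a real setup would be far richer.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Stand-in for a deployed or newly trained model. Purely illustrative."""
    name: str

    def generate_probes(self, topic: str) -> list[str]:
        # Stand-in for the older model drafting red-team prompts on a topic.
        return [f"{topic} probe {i} drafted by {self.name}" for i in range(3)]

    def respond(self, prompt: str) -> str:
        # Stand-in for the new model answering a probe.
        return f"{self.name} response to: {prompt}"

    def looks_concerning(self, transcript: str) -> bool:
        # Stand-in for automated triage of transcripts; purely illustrative.
        return "probe 0" in transcript


def pre_deployment_safety_phase(new_model: Model, helper: Model, topics: list[str]) -> list[str]:
    """Use the already-vetted helper model (Claude-n) to probe and triage the
    newly trained model (Claude-(n+1)), so humans only review flagged transcripts."""
    flagged = []
    for topic in topics:
        probes = helper.generate_probes(topic)               # helper drafts the attacks
        transcripts = [new_model.respond(p) for p in probes]  # new model is tested against them
        flagged += [t for t in transcripts if helper.looks_concerning(t)]
    return flagged  # humans review these before any deployment decision


if __name__ == "__main__":
    findings = pre_deployment_safety_phase(
        new_model=Model("Claude-(n+1)"),
        helper=Model("Claude-n"),
        topics=["bioweapon uplift", "autonomous replication"],
    )
    print(f"{len(findings)} transcripts flagged for human review")
```

The point of the sketch is where the leverage would come from: the previous-generation model does the probe generation and triage that would otherwise consume scarce human reviewer time, which is what stretches the effective crunch time.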
[1] I think it’s an honorable goal, but seems infeasible given the current landscape.
[2] Cf. RSPs are pauses done right.
[3] Though note that I think the critical period for safety evals is between training and internal deployment, not between training and external deployment. See Ryan Greenblatt’s Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements).