Not confident at all.
I do think that safety researchers might be good at coordinating even if the labs aren’t: safety researchers tend to be more socially connected, and they share similar goals and beliefs.
Labs have more incentive to share safety research than capabilities research, because the harms of AI are mostly externalised whereas the benefits of AI are mostly internalised.
These harms include extinction, obviously, but also misuse and accidents which would trigger industry-wide regulation and distrust.
Even a few safety researchers at the lab could reduce catastrophic risk.
The recent OpenAI-Anthropic collaboration is super good news. We should be giving them more kudos for this.
OpenAI evaluates Anthropic models
Anthropic evaluates OpenAI models
I think buying more crunch time is great.
While I’m not excited by pausing AI[1], I do support pushing labs to do more safety work between training and deployment.[2][3]
I think a sharp takeoff is scarier than short timelines.
I think we can increase the effective crunch time by deploying Claude-n to automate much of the safety work that must occur between training and deploying Claude-(n+1). But I don’t know whether there are ways to accelerate Claude-n at safety work without also accelerating it at capabilities work.
[1] I think it’s an honorable goal, but it seems infeasible given the current landscape.
[2] cf. RSPs are pauses done right
[3] Although I think the critical period for safety evals is between training and internal deployment, not between training and external deployment. See Greenblatt’s Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)