+1. A few other questions I’m interested in:
- Which threat models is Anthropic most/least concerned about?
- What are Anthropic’s thoughts on AGI ruin arguments?
- Would Anthropic merge-and-assist if another safety-conscious project comes close to building AGI?
- What kind of evidence would update Anthropic away from (or more strongly toward) their current focus on empiricism/iteration?
- What are some specific observations that made (and continue to make) Anthropic leadership concerned about OpenAI’s commitment to safety?
- What does Anthropic think about DeepMind’s commitment to safety?
- What is Anthropic’s governance/strategy theory-of-change?
- If Anthropic gets AGI, what do they want to do with it?
I’m sympathetic to the fact that responding to some of these questions might be costly (in terms of time, and possibly other factors like reputation). With that in mind, I applaud DeepMind’s alignment team for engaging with some of these questions, and I applaud OpenAI for publicly stating their alignment plan. I’ve been surprised that Anthropic has engaged the least (at least publicly, to my knowledge) with these kinds of questions.
A “Core Views on AI Safety” post is now available at https://www.anthropic.com/index/core-views-on-ai-safety
(A linkpost for it is here: https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety.)