Anthropic is indeed trying. Unfortunately, they are not succeeding, and they don’t appear to be on track to notice this fact and actually stop.
If Anthropic does not keep up with the reckless scaling of e.g. OpenAI, they will likely cease to attract investment and wither on the vine. But aligning superintelligence is harder than building it. A handful of alignment researchers working alongside capabilities folks aren’t going to cut it. Anthropic cannot afford to delay scaling; even if their alignment researchers advised against training the next model, Anthropic could not afford to heed them for long.
I’m primarily talking about the margin when I advise folks not to go work at Anthropic, but even if the company had literally zero dedicated alignment researchers, I question the claim that the capabilities folks would be unable to integrate publicly available alignment research. If they had a Manual of Flawless Alignment produced by diligent outsiders, they could probably use it. (Though even then, we would not be safe, since some labs would inevitably cut corners.)
I think the collective efforts of humanity can produce such a Manual given time. But in the absence of such a Manual, scaling is suicide. If Anthropic builds superintelligence at approximately the same velocity as everyone else while trying really really hard to align it, everyone dies anyway.