I am an independent AI researcher focusing on interpretability. My day job is technical consulting for an AI-enabled financial software firm.
CharlesL
I’ll run these experiments this week and let you know the findings; I appreciate the insight.
A thought about the section where you describe your updated model of the ‘race to the top’:
Is there any indication that frontier labs other than Anthropic are likely to adopt RSPs before some catastrophe happens, if one does? That premise is the basis for the new model you lay out here. And if they do adopt RSPs, what makes you optimistic that the same employees who are currently advancing capabilities as fast as they can at other labs would reprioritize safety more than they do now? Or that safety-focused employees wouldn’t be pushed out of certain other frontier LLM developers? We seemingly see that happening already at some places, not to name names.
I have trouble finding a ‘race to the top’ on safety/alignment credible when far more forces, whether investors or current government stakeholders, appear to be encouraging a ‘move as fast as possible no matter what’ regime through their words and actions. We saw this recently with the Anthropic–Pentagon dispute; most other labs do not seem particularly committed to their stated principles in the face of political pressure or economic forces.