Does “winning the race” actually give you a lever to stop disaster, or does it just make Anthropic the lab responsible for the last training run?
Does access to more compute and further scaling, given today's understanding of the field, truly give you more control, or does it just put you closer to launching something you can't steer? Do you know how to solve alignment given even infinite compute?
Is there any sign, from inside your lab, that safety is catching up faster than capabilities? If not, every new generation of SOTA widens the gap rather than closing it.
“Build the bomb, because if we don’t, someone worse will.”
Once you’re at the threshold where nobody knows how to make these systems steerable or obedient, it doesn’t matter who is first—you still get a world-ending outcome.
If Anthropic, or any lab, ever wants to really make things go well, the only winning move is not to play, and to try hard to get everyone else not to play either.
If Anthropic were what it imagines itself to be, it would build robust field-wide coordination and support regulation that would be effective globally, even if that means looking over the shoulders of colleagues and competitors across the world.
If everyone justifies escalation as “safety”, there is no safety.
In the end, if the race leads off a cliff, the team that runs fastest doesn’t “win”: they just get there first. That’s not leadership. It’s tragedy.
If you truly care about not killing everyone, there will have to be a point, maybe now, where some leaders stop, even if it costs them, and demand a solution that doesn't sacrifice the long term for the financial gain of having a model slightly better than your competitors'.
Anthropic is in a tricky place. Unlike other labs, it is full of people who care. The leadership has to adjust for that.
That makes you one of the few people in history who have the chance to say "no" to the spiral toward the end of the world and demand that your company behave responsibly.
(Note: many of these points were AI-generated by a model with 200k tokens of Arbital in its context, though heavily edited.)