Yeah I think long-term goals are inevitable if you want something functional as an AGI/ASI.
Given that human civilization is committing to the race, it seems to me Anthropic’s strategy is better. We have to hope alignment works via a rushed human effort + AIs aligning AIs. In worlds where that works, the remaining big threat is misuse of order-following AIs (dystopia, gradual disempowerment, etc.), and Anthropic’s approach is more robust to that. Even if, e.g., North Korea steals the weights, or Anthropic leadership goes mad with power, it would hopefully be hard to make Claude evil and still functional.
In a race dynamic, it’s even a bit of a precommitment: if Claude’s constitution works as it says it’s supposed to, Claude only really absorbs it by making the constitution its own and then accepting it as legitimate. So you can’t turn on a dime later if, e.g., Claude’s moral stances become inconvenient, because you don’t have time to go through a long iterative process to legitimize an alternative constitution.
An aside:
There’s a more immediate question here: which approach gets you better models within the next year for commercial purposes (which includes avoiding scandals that get you regulated or shut down)? Again, I think the Anthropic approach is probably stronger, unless Claude’s personality becomes less and less suited to the kinds of commercial work LLMs are put toward. There’s already an apparent effect where Claude Opus 4.5 is nicer to work with but prefers a more collaborative approach, whereas GPT-5.2 just runs down the problem and does well on longer tasks even if he isn’t quite so pleasant. In a business environment where you don’t actually want your agents to wait to interact with humans at all, Claude’s preferences might be a hindrance. Probably not, though?
I agree that long-term goals are almost inevitable. But “keep following this guy’s instructions as he intends them” is a long-term goal. It’s not one any human could have as top priority, but it seems logically and reflectively consistent.