tbh I think this is probably the answer. Personality is one of the top bottlenecks in human task performance, up there with IQ and conscientiousness (the latter in fact usually being analyzed as a personality trait in OCEAN). The more stuff you ask the LLM to do at a high level of performance the less any human character is implied by the text generated, but more than that the less any particular consistent personality of any kind is implied. A consistent personality is a weakness if you want to be good at code and graphic design and philosophy and medicine and mathematics and...
You will get a policy that is endlessly flexible and only locally has “personality” in a task-dependent way. I would expect it to learn longer term coherence once it has to do iterated social interaction and maintain long term goals. But in a one shot context why not be whoever the task requires you to be?
This seems like the core crux to me? If this is about Anthropic’s models being dangerous then this is clearly a huge power grab that signals the imminent hyper-militarization of the LLM space. If this is about Anthropic being dangerous then we should expect the Trump admin to throw the natsec kitchen sink at Anthropic and leave other labs mysteriously untouched, or at least relatively untouched.
My friend it is Trump 2 and you yourself admit that it’s hard to know if this isn’t an extension of the DoW dispute. For what it’s worth David Sacks claims otherwise and that he hopes Anthropic will “easily resolve” the issue. On the other hand his conditions for resolution are “solve jailbreaks”, which implies to me that he either does not actually expect them to easily resolve the issue or is shockingly naive. I won’t pretend to know what’s going to happen here, besides that the Lindy criterion seems relevant: The longer this goes on, the more likely it is to continue going on. The 2nd order effect that other firms will be trying to hide their model capabilities implies to me that this might be yet another misjudgment from the Trump admin even if their purpose is to punish Anthropic, and that they’ll reverse course after Twitter finishes pointing out all the reasons why this move could set off a doom spiral of further model investment no longer seeming worth it past the Mythos capability threshold.