Absolutely love this trend of all sorts of different people telling the US Government, ‘do the thing you’re threatening’.
Some people here (@ryan_greenblatt?) have claimed access to uncensored Anthropic models at Anthropic; maybe they can chime in about what exactly that means. If safeguards are baked into the model early enough in the training process that removing them requires substantial reengineering and maybe additional training runs, then maybe Anthropic simply isn’t capable of complying.
If safeguards are introduced post-training, Anthropic has an uncensored model available. Assuming there isn’t another vendor, or that they want to make a super public demonstration of their power, the US Government has a ton of methods to compel compliance. They can demand it under the DPA, or plausibly use that designation as justification for applying countermeasures to the company. They may even just steal it outright, or if someone else steals the base model, steal it from them, relabel it, and deploy it internally.
There’s always the possibility that this is just a face-saving move, where Anthropic has a tiny team in the company and a classified contract, under which they just hand over the uncensored model and get to keep their public face intact. Under normal circumstances, I’d assume that DoW would just go with a different vendor, but who knows with this administration; they seem really averse to being publicly contradicted.
DoW may also believe that they were being conciliatory and kind by allowing Anthropic to add ‘any lawful use’ safeguards instead of just demanding the base model without post-training safeguards.
Thanks for replying, I think I was thinking of this post: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=B6oDGoyphuNuzdDAT I didn’t understand the distinction between employee and helpful-only access. Can you elaborate on the difference between employee access, public access, and helpful-only access?