I sent Mikhail the following via DM, in response to his request for “any particular parts of the post [that] unfairly attack Anthropic”:
I think that the entire post is optimized to attack Anthropic, in a way where it’s very hard to distinguish between evidence you have, things you’re inferring, standards you’re implicitly holding them to, standards you’re explicitly holding them to, etc.
I asked you for any particular example; you replied that “the entire post is optimized in a way where it’s hard to distinguish…”. Could you, please, give a particular example of where it’s hard to distinguish between evidence that I have and things I’m inferring?
Some examples of statements where it’s pretty hard for me to know how much they straightforwardly follow from the evidence you have, vs being things you’ve inferred because they seem plausible to you:
1. Jack Clark would have known this.
2. Anthropic’s somewhat less problematic behavior is fully explained by having to maintain a good image internally.
3. Anthropic is now basically just as focused on commercializing its products.
4. Anthropic’s talent is a core pitch to investors: they’ve claimed they can do what OpenAI can for 10x cheaper.
5. It seems likely that the policy positions that Anthropic took early on were related to these incentives.
6. Anthropic’s mission is not really compatible with the idea of pausing, even if evidence suggests it’s a good idea to.
If we zoom in on #3, for instance: there’s a sense in which it’s superficially plausible, because both OpenAI and Anthropic have products. But maybe Anthropic and OpenAI differ greatly in, say, the share of headcount, executives’ time, compute, or internal prestige allocated to commercialization vs other things (like alignment research). If so, then it’s not really accurate to say that they’re just as focused on commercialization. But I don’t know whether knowledge of these kinds of considerations informed your claim, or whether you’re only making the superficially plausible version of the claim.
To be clear, I don’t generally expect people to apply this level of care to most LW posts. But when it comes to accusations of untrustworthiness (and similar kinds of accountability mechanisms), I think it’s really valuable to be able to create common knowledge of the specific details of misbehavior. Hence I would have much preferred this post to focus on a smaller set of claims that you can solidly substantiate, and only secondarily discuss what inferences we should draw from those. As it stands, I think the kinds of criticism you make here mostly create a miasma of distrust between Anthropic and LessWrong, without adding much common knowledge of the form “Anthropic violated clear and desirable standard X” among the set of good-faith AI safety actors.
I also realize that by holding this standard I’m making criticism more costly, because now you have the stress of trying to justify yourself to me. I would have tried harder to mitigate that cost if I hadn’t noticed this pattern of not-very-careful criticism from you. I do sympathize with your frustration that people seem to be naively trusting Anthropic and ignoring various examples of shady behavior. However, I also think people outside labs really underestimate how many balls lab leaders have in the air at once, and how easy it is to drop a few of them even if you’re broadly trustworthy. I don’t know how to balance these considerations, especially because the community as a whole has historically erred on the side of the former mistake. I’d appreciate people helping me think through this, e.g. by working through models of how applying pressure to bureaucratic organizations goes well, in light of the ways that such organizations become untrustworthy (building on Zvi’s Moral Mazes sequence, for instance).