Adrià Garriga-alonso comments on Unless its governance changes, Anthropic is untrustworthy

Adrià Garriga-alonso 30 Nov 2025 18:05 UTC
5 points
−5
I very strongly agree with this and think it should be the top objection people first see when scrolling down. In a low-P(doom) world, Anthropic has done lots of good. (They proved that you can have the best and the most aligned model, and their leadership is more trustworthy than that of OpenAI, who would otherwise lead.) This is my current view.

In a high-P(doom) world, none of that matters because they’ve raised the bar for capabilities when we really should be pausing AI. This was my previous view.

I’m grudgingly impressed with Anthropic leadership for getting this right when I did not (not that anyone other than me cares what I believed, having ~zero power).
- Bronson Schoen 1 Dec 2025 5:04 UTC
  56 points
  47
  Parent
  I’m confused about much of the discussion on this post being about whether Anthropic has done “net good”.
  
  The post is very specifically a deep dive into the fact that Anthropic, like any other company, should not have its leadership’s statements taken at face value. IMO taking such statements at face value is a completely unrealistic way to treat companies in any field, and it’s a bit frustrating to see the rationalist presumption of good faith extended over and over by default in contexts where it’s so incredibly exploitable.
  
  Again, this is not a specific criticism of Anthropic. If a new lab starts tomorrow promising to build Safe SuperIntelligence, for example, we should not assume that we can trust all their leadership’s statements until they’ve misled people publicly a few times and someone has a deep dive comprehensively documenting it.