I don’t think I have a moral obligation not to do that. I’m a guy who wants to do good in the world and I try to do stuff that I think is good, and I try to follow policies such that I’m easy to work with and so on. I think it’s pretty complicated to decide how averse you should be to taking on the risk of being eaten by some kind of process.
When I was 23, I agreed to work at MIRI on a non-public project. That’s a really risky thing to do for your epistemics etc. I knew that it was a risk at the time, but decided to take the risk anyway. I think it is sensible for people to sometimes take risks like this. (For what it’s worth, MIRI was aware that getting people to work on secret projects is a kind of risky thing to do, and they put some effort into mitigating the risks.)
For example, I think AI safety people often have somewhat arbitrary strong takes about things that would be very bad to do, and it’s IMO sometimes been good that Anthropic leadership hasn’t been very pressured by its staff.
Could you give a more specific example, one that’s among the strongest such examples?
I think it’s probably good that Anthropic has pushed the capabilities frontier, and I think a lot of the arguments that this is unacceptable are kind of wrong. If Anthropic staff had pushed back on this more, I think probably the world would be a worse place. (I do think Anthropic leadership was either dishonest or negligently-bad-at-self-modeling about whether they’d push the capabilities frontier.)
I think it is sensible for people to sometimes take risks like this.
I agree. If I say “you have a moral obligation not to cause anyone’s death”, that doesn’t mean “spend all of your energy absolutely minimizing the chances that your decisions minutely increase the risk of someone dying”. But it does mean “when you’re likely having significant effects on the chances of that happening, you should spend the effort required to mostly eliminate those risks, or avoid the situation, or at least signpost the risks very clearly, etc.”. In this case, yeah, I’m saying you do have a strong obligation, one that can often require work and other costs, not to give large amounts of support to processes that are causing a bunch of harm. Like any obligation it’s not simplistic or absolute, but it’s there. Maybe we still disagree about this.
I think it’s pretty complicated to decide how averse you should be to taking on the risk of being eaten by some kind of process.
True, but basically I’m saying “it’s really important, and also a lot of the responsibility falls on you, and/or on your community / whoever you’re deferring to about these questions”. Like, it just is really costly to be supporting bad processes like this. In some cases you want to pay the costs, but it’s still a big cost. I’m definitely definitely not saying “all Anthropic employees are bad” or something. Some of the research seems neutral or helpful or maybe very helpful (for legibilizing dangers). But I do think there’s a big obligation of due diligence about “is the company I’m devoting my working energy to working towards really bad stuff in the world?”. For example, yes, Anthropic employees have an obligation to call out if the company leadership is advocating against regulation. (Which maybe they have been doing! In which case the obligation is probably met!)
I think it’s probably good that Anthropic has pushed the capabilities frontier, and I think a lot of the arguments that this is unacceptable are kind of wrong.
Oh. Link to an argument for this?
I didn’t understand your last paragraph.
If you’re curious, basically I’m saying, “yes, there’s context, but people in the space have a voice, and have obligations, and do have a bunch of the relevant context; what else would they need?”. I mean, it kind of sounds like you’re saying we (someone) should just trust Anthropic leadership because they have more context, even if there’s not much indication that they have good intentions? That can’t be what you mean(?), but it sounds like that.