Yes, that’s basically what I mean. I think I’m trying to refer to the same issue that Paul mentioned here: https://www.lesswrong.com/posts/pZhDWxDmwzuSwLjou/asymptotically-benign-agi#ZWtTvMdL8zS9kLpfu
I like that you emphasize and discuss the need for the AI to not believe that it can influence the outside world, and cleanly distinguish this from it actually being able to influence the outside world. I wonder if you can get any of the benefits here without needing the box to actually work (i.e., can you just get the agent to believe it works, and is that enough for some form/degree of benignity?)
This doesn’t seem to address what I view as the heart of Joe’s comment. Quoting from the paper:
“Now we note that µ* is the fastest world-model for on-policy prediction, and it does not simulate post-episode events until it has read access to the random action”.
It seems like simulating *post-episode* events in particular would be useful for predicting the human’s responses, because they will be simulating post-episode events when they choose their actions. Intuitively, it seems like we *need* to simulate post-episode events to have any hope of guessing how the human will act. I guess the obvious response is that we can instead simulate the internal workings of the human in detail, and thus uncover their simulation of post-episode events (as a past event). That seems correct, but also a bit troubling (again, probably just for “revealed preferences” reasons, though).
Moreover, I think in practice we’ll want to use models that make good, but not perfect, predictions. That means we trade off accuracy against description length, and I think this makes modeling the outside world (instead of the human’s model of it) potentially more appealing, at least in some cases.
I’m calling this the “no grue assumption” (https://en.wikipedia.org/wiki/New_riddle_of_induction).
My concern here is that this assumption might be False, even in a strong sense of “There is no such U”.
Have you proven the existence of such a U? Do you agree it might not exist? It strikes me as potentially running up against issues of no-free-lunch (NFL) results / self-reference.
Also, it’s worth noting that this assumption (or rather, Lemma 3) also seems to preclude BoMAI optimizing anything *other* than revealed preferences (which others have noted seems problematic, although I think it’s definitely out of scope).
Still wrapping my head around the paper, but...
1) It seems too weak: In the motivating scenario of Figure 3, isn’t it the case that “what the operator inputs” and “what’s in the memory register after 1 year” are “historically distributed identically”?
2) It seems too strong: aren’t real-world features and/or world-models “dense”? Shouldn’t I be able to find features arbitrarily close to F*? If I can, doesn’t that break the assumption?
3) Also, I don’t understand what you mean by: “it’s on policy behavior [is described as] simulating X”. It seems like you (rather/also) want to say something like “associating reward with X”?
Just exposition-wise, I’d front-load pi^H and pi^* when you define pi^B, and also clarify then that pi^B considers human-exploration as part of its policy.
“This result is independently interesting as one solution to the problem of safe exploration with limited oversight in nonergodic environments, which [Amodei et al., 2016] discuss.”
^ This wasn’t super clear to me… maybe it should just be moved somewhere else in the text?
I’m not sure what you’re saying is interesting here. I guess it’s the same thing I found interesting, which is that you can get sufficient (and safe-as-a-human) exploration using the human-does-the-exploration scheme you propose. Is that what you mean to refer to?
Maybe “promotional of” would be a good phrase for this.
ETA: NVM, what you said is more descriptive (I just looked in the appendix).
RE footnote 2: maybe you want to say “monotonically increasing as a function of” rather than “proportional to”. (It’s a shame there doesn’t seem to be a shorter way of saying the first one, which seems to be more often what people actually want to say...)
I’m not sure. I was trying to disagree with your top level comment :P
FWICT, both of your points are actually responses to my point (3).
RE “re: #2”, see: https://en.wikipedia.org/wiki/Value_of_information#Characteristics
RE “re: #3”, my point was that it doesn’t seem like VoI is the correct way for one agent to think about informing ANOTHER agent. You could just look at the change in expected utility for the receiver after updating on some information, but I don’t like that way of defining it.
I think it is rivalrous.
Xrisk mitigation isn’t the resource; risky behavior is the resource. If you engage in more risky behavior, then I can’t engage in as much risky behavior without pushing us over into a socially unacceptable level of total risky behavior.
If there is a cost to reducing Xrisk (which I think is a reasonable assumption), then there will be an incentive to defect, i.e. to underinvest in reducing Xrisk. There’s still *some* incentive to prevent Xrisk, but to some people everyone dying is not much worse than just them dying.
1) Yep, independence.
2) Seems right as well.
3) I think it’s important to consider “risk per second”, because
(i) I think many AI systems could eventually become dangerous, just not on reasonable time-scales.
(ii) I think we might want to run AI systems which have the potential to become dangerous for limited periods of time.
(iii) If most of the risk is far in the future, we can hope to become more prepared in the meantime.
Whether or not this happens depends on the learning algorithm. Let’s assume an IID setting. Then an algorithm that evaluates many random parameter settings and chooses the one that gives the best performance would have this effect. But a gradient-based learning algorithm wouldn’t necessarily, since it only aims to improve its predictions locally (so what you say in the ETA is more accurate, **in this case**, I think).
Also, I just wanted to mention that Stuart Armstrong’s paper “Good and safe uses of AI oracles” discusses self-fulfilling prophecies as well; Stuart provides a way of training a predictor that won’t fall victim to such effects (just don’t reveal its predictions when training). But then it also fails to account for the effect its predictions actually have, which can be a source of irreducible error… The example is a (future) stock-price predictor: making its predictions public makes them self-refuting to some extent, as they influence market actors’ decisions.
I dunno… I think describing them as a tragedy of the commons can help people understand why the problems are challenging and deserving of attention.
RE Sarah: Longer timelines don’t change the picture that much, in my mind. I don’t find this article to be addressing the core concerns. Can you recommend one that’s more focused on “why AI-Xrisk isn’t the most important thing in the world”?
RE Robin Hanson: I don’t really know much of what he thinks, but IIRC his “urgency of AI depends on FOOM” was not compelling.
What I’ve noticed is that critics are often working from very different starting points, e.g. being unwilling to estimate probabilities of future events, using common-sense rather than consequentialist ethics, etc.