I guess speaking out publicly just seems like a weird distraction to me. Most safety people don't have a public profile! None of their capabilities colleagues are tracking whether they have or haven't expressed specific opinions publicly. Some do have a profile, but it doesn't feel like you're targeting only them. And e.g. if someone is in company-wide Slack channels leaving comments about their true views, I think that's highly visible and achieves the same benefits as talking honestly in public, with fewer risks.
I'm not concerned about someone being fired for this kind of thing; that would be pretty unwise on the lab's part, since you risk creating a martyr. Rather, I'm concerned about e.g. senior figures thinking worse of safety researchers as a whole because they cause PR headaches, e.g. viewing them as radical troublemakers, and this making theories of impact that run through influencing specific senior decision makers harder (and I'm personally more optimistic about those).
Thank you, Neel, for stating this explicitly. I think this is very valuable information, and it matches what some of my friends have also told me privately. I would appreciate it a lot if you could give a rough estimate of your confidence that this would happen (ideally a probability/percentage). Additionally, I would appreciate it if you could say whether you'd expect such a consequence to be legible/visible or illegible once it had happened. Finally, are there legible reasons you could share for your estimated credence that this would happen?
(to be clear: I am sad that you are operating under such conditions. I consider this evidence against expecting meaningful impact from the inside at your lab.)
It's not a binary event; I'm sure it's already happened to some extent. OpenAI has had what, three different safety exoduses by now, and (what was perceived to be) an attempted coup? I'm sure leadership at other labs have noticed. But it's a matter of degree.
I also don't think this should be particularly surprising; it's just how I expect decision makers at any organisation that cares about its image to behave, unless the organisation is highly unusual. Even if the company decides to loudly sound the alarm, it will likely want to choose its messaging carefully and go through official channels, not have employees going rogue and ruining message discipline. (There are advantages to the grassroots vibe in certain situations, though.) To be clear, I'm not talking about "would take significant retaliation"; I'm talking about "would prefer that employees didn't, even if that preference won't actually stop them".
This sounds to me like there would actually be specific opportunities to express some of your true beliefs where you wouldn't worry it would cost you a lot (and other opportunities where you would worry, and so wouldn't take them). Would you agree with that?
(optional: my other comment is more important imo)
> I'm not concerned about someone being fired for this kind of thing; that would be pretty unwise on the lab's part, since you risk creating a martyr.
I think you ascribe too much competence/foresight/focus/care to the labs. I'd be willing to bet that multiple (safety?) people have been fired from labs in ways that made the lab look pretty bad. Labs make tactical mistakes sometimes; wasn't there a thing at OpenAI, for instance (lol)? Of course, it's possible (probable?) that they wouldn't fire someone in a given case out of sufficient "wisdom", but we shouldn't assign an extreme likelihood to that.
Yeah, agreed that companies sometimes do dumb things, and I think this is more likely at less bureaucratic, more top-down places like OpenAI. I do think the Leopold situation went pretty badly for them, though, and they've hopefully updated. I'm also partly less concerned because there's a lot of upside if the company makes a big screw-up like that.