I personally define “really care” as “the thing they actually care about, and that meaningfully drives their actions (potentially among other things), is X”. If you instead want to define it as, e.g., “the actions they take, in practice, effectively select for X, even if that’s not their intent”, then I agree my post does not refute the point, and we have more of a semantic disagreement over what the phrase means.
I interpret the post as saying “there are several examples of people in the AI safety community taking actions that made things worse; THEREFORE these people are actively malicious, or otherwise insincere about their claims to care about safety, and safety is largely an afterthought put to the side as other considerations dominate”. I personally agree with some examples and disagree with others, but think the pattern is explained by a mix of strategic disagreements about how to optimise for safety, plus SOME fraction of the alleged community really not caring about safety.
People are often incompetent at achieving their intended outcome, so pointing to a failure to achieve an outcome does not show that the failure was intended. ESPECIALLY if there’s no ground truth and you have strategic disagreements with those people, so you think they failed and they think they succeeded.
I don’t think “not really caring” necessarily means someone is being deceptive. I hadn’t really thought through the terminology before I wrote my original post, but I would maybe define 3 categories:
1. Claims to care about x-risk, but is being insincere.
2. Genuinely cares about x-risk, but also cares about other things (making money etc.), so they take actions that fit their non-x-risk motivations and then come up with rationalizations for why those actions are good for x-risk.
3. Genuinely cares about x-risk and has pure motivations, but sometimes makes mistakes and ends up increasing x-risk.
I would consider #1 and #2 to be “not really caring”. #3 really cares. But from the outside it can be hard to tell the difference between the three. (And in fact, from the inside, it’s hard to tell whether you’re a #2 or a #3.)
On a more personal note, I think in the past I was too credulous about ascribing pure motivations to people when I had disagreements with them, when in fact the reason for the disagreement was that I care about x-risk and they’re either insincere or rationalizing. My original post is something I think Michael!2018 would benefit from reading.
Does #3 include “cares about x-risk and other things, does a good job of evaluating the trade-offs of each action according to their values, but is sometimes willing to do things that are great according to their other values but slightly negative for x-risk”?
This looks closer to #2 to me?
Also, from the outside, can you describe how an observer would distinguish between [any of the items on the list] and the situation you lay out in your comment, and what the downsides are to treating them similarly? I think Michael’s point is that it’s not useful/worth it to distinguish them.
Whether someone is dishonest, incompetent, or underweighting x-risk (by my lights) mostly doesn’t matter for how I interface with them, or for how I think the field ought to regard them, since I don’t think we should browbeat people or treat them punitively. The bottom line is that I’ll rely on them (“rely” as an unvalenced substitute for “trust”) a little less.
I think you’re right to point out the valence of the initial wording, fwiw. I just think taxonomizing apparent defection isn’t necessary if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.

If we take this as a given, I’m happy for people to categorise others however they’d like! I haven’t noticed anyone other than you taking that perspective in this thread.
Oh man — I sure hope making ‘defectors’ and lab safety staff walk the metaphorical plank isn’t on the table. Then we’re really in trouble.
My read is that, in practice, many people in the online LW community are fairly hostile, and many people in the labs think the community doesn’t know what it’s talking about, totally ignores them, and doesn’t really care if they’re made to walk the metaphorical plank.