I personally define “really care” as “the thing they actually care about, and that meaningfully drives their actions (potentially among other things), is X”. If you instead want to define it as, e.g., “the actions they take, in practice, effectively select for X, even if that’s not their intent”, then I agree my post does not refute the point, and we have more of a semantic disagreement over what the phrase means.
I interpret the post as saying “there are several examples of people in the AI safety community taking actions that made things worse; THEREFORE these people are actively malicious, or otherwise insincere about their claims to care about safety, and safety is largely an afterthought put to the side as other considerations dominate”. I personally agree with some examples and disagree with others, but think the pattern is explained by a mix of strategic disagreements about how to optimise for safety, plus SOME fraction of the alleged community really not caring about safety.
People are often incompetent at achieving their intended outcome, so pointing to a failure to achieve an outcome does not show that the failure was intended. ESPECIALLY if there’s no ground truth and you have strategic disagreements with those people, so you think they failed and they think they succeeded.
I don’t think “not really caring” necessarily means someone is being deceptive. I hadn’t really thought through the terminology before I wrote my original post, but I would maybe define 3 categories:
1. Claims to care about x-risk, but is being insincere.
2. Genuinely cares about x-risk, but also cares about other things (making money etc.), so they take actions that fit their non-x-risk motivations and then come up with rationalizations for why those actions are good for x-risk.
3. Genuinely cares about x-risk and has pure motivations, but sometimes makes mistakes and ends up increasing x-risk.
I would consider #1 and #2 to be “not really caring”. #3 really cares. But from the outside it can be hard to tell the difference between the three. (And in fact, from the inside, it’s hard to tell whether you’re a #2 or a #3.)
On a more personal note, I think in the past I was too credulous about ascribing pure motivations to people when I had disagreements with them, when in fact the reason for the disagreement was that I care about x-risk and they’re either insincere or rationalizing. My original post is something I think Michael!2018 would benefit from reading.
Does #3 include “cares about x-risk and other things, does a good job of evaluating the trade-offs of each action according to their values, but is sometimes willing to do things that are great according to their other values but slightly negative for x-risk”?
This looks closer to #2 to me?
Also, from the outside, can you describe how an observer would distinguish between [any of the items on the list] and the situation you lay out in your comment, and what the downsides are to treating them similarly? I think Michael’s point is that it’s not useful/worth it to distinguish them.
Whether someone is dishonest, incompetent, or underweighting x-risk (by my lights) mostly doesn’t matter for how I interface with them, or for how I think the field ought to regard them, since I don’t think we should browbeat people or treat them punitively. The bottom line is that I’ll rely on them (“rely” as an unvalenced substitute for “trust”) a little less.
I think you’re right to point out the valence of the initial wording, fwiw. I just think taxonomizing apparent defection isn’t necessary if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.

If we take this as a given, I’m happy for people to categorise others however they’d like! I haven’t noticed anyone other than you taking that perspective in this thread.
Oh man — I sure hope making ‘defectors’ and lab safety staff walk the metaphorical plank isn’t on the table. Then we’re really in trouble.
My read is that, in practice, many people in the online LW community are fairly hostile, and many people in the labs think the community doesn’t know what it’s talking about, totally ignores them, and doesn’t really care if they’re made to walk the metaphorical plank.