Ishual comments on Safety researchers should take a public stance

Ishual 21 Sep 2025 9:49 UTC
Seeing the post as a threat misses the intended point. It is important to state explicitly: The goal of the three norms argued for in the post was never to force people to publicly support something they don’t in fact believe in. It was also never to force people to be more honest about what they believe. The post explicitly says what we think you should be doing, so that there can be a discussion about it. But the norm enforcement part is about what we think others (who are not necessarily working at frontier labs) should be doing.

Separately, I am not sure I understood what you meant by “I aim to not give in to ‘norm enforcement’”, but it seems to me that there is a culture inside the labs that makes many people working there uncomfortable taking a public stance. To be more explicit, does that also activate your will to not give in to ‘norm enforcement’? (If not, why not?)

> I think you may want to rethink your models of how norm enforcement works.

I didn’t get what you were trying to communicate here. Continuing to rethink (publicly) my models of how norm enforcement works is why we wrote this post on LW.
- Rohin Shah 21 Sep 2025 10:02 UTC
  > But the norm enforcement part is about what we think others (who are not necessarily working at frontier labs) should be doing.
  A threat by proxy is still a threat.
  - Ishual 21 Sep 2025 10:43 UTC
    I conclude from this that you really do see this post as a threat (also you admitted there is no threat in your first comment so this comment now seems contradictory/bad-faith).
    
    Some thoughts:
    - This isn’t a threat by proxy, and it isn’t a threat (though if it were a threat by proxy, then sure, it would be a threat).
    - I am in the “others” group. I implement the norm I endorse in the post, and I am not threatening you. I don’t want to sound dismissive but you are not giving me a lot to work with here, and it sounds to me like either 1) you have a vague model of what a threat is that includes things that aren’t threats or 2) you are misunderstanding the post and our intent such that you model us as having made a threat.
    
    We say what we think you should do as a safety researcher. That is not a threat; it is a recommendation.
    Separately, we say how we think others should relate to safety researchers in a way that is more robust and functional. Maybe I should clarify that if safety researchers don’t take a public stance you find acceptable, you shouldn’t be sad that they “called your bluff” (because I don’t endorse you bluffing or threatening). You should not be doing this to change individual safety researchers’ actions. You should be doing this for the benefit of being less foolable, and of choosing where you put your respect and friendship in a way that is more functional for society and more beneficial for you. I would endorse this part of the norm even if not a single additional safety researcher took a public stance. Heck, I would endorse it even if some of them tried to invert my preferences by removing their public stance, partly because I endorse not giving in to actual threats, but also because it would still be a good norm to have on net.
    - Zack_M_Davis 21 Sep 2025 22:45 UTC
      How do you think norm enforcement works, other than by threatening people who don’t comply with the norm?
      - Ishual 22 Sep 2025 10:00 UTC
        I probably should have said “norm execution” (i.e., following the norm). This might just be a cultural gap, but I think norm enforcement/execution/implementation works in many ways that are not threats. For instance, say there is pizza at a conference. There is a norm that you shouldn’t take all the pizza if there is a big line behind you. Some people break this norm. What happens? Do they get threatened? No! They just get dirty looks and people talking behind their backs. Maybe they get a reputation as the “pizza taker”. In fact, nobody necessarily told them before this happened that taking all the pizza would break the norm.
        
        I think there is a strange presumption that one is owed my and others’ maximum respect and friendship, and that anything less would be a “punishment”. That is pretty strange. If I have money in my pocket but will only give some to you based on how many “good deeds” I have seen you do, this is not a threat. I guess that if you did not understand the motives, or if the motives were actually to get a specific person to do more “good deeds” (by telling them in advance what the reward would be), you could call it a bribe. But calling it a threat is obviously incorrect.
        
        I think norm enforcement/execution/implementation can be, and in my case is, motivated by an aesthetic preference that “points” which are person A’s to give, such as respect and friendship, 1) not go to someone who does not deserve them (in my eyes) and instead 2) go to someone who does deserve them. It is not primarily driven by a consequentialist desire for more people to do respect-and-friendship-deserving things. It is primarily driven by a desire for the points to match reality, and thus to enable greater cooperation and further good things down the line.
        
        I realized, based on a few comments, that some readers saw the three norms I discuss in the post as one giant strategy to produce more public stances from safety researchers. This is not the case. I am talking to three different audiences, and for each one I explain a norm that I think makes sense independently.
    - Rohin Shah 21 Sep 2025 13:32 UTC
      > I conclude from this that you really do see this post as a threat (also you admitted there is no threat in your first comment so this comment now seems contradictory/bad-faith).
      Sure, I’ll correct it to “an attempted threat by proxy is still an attempted threat”. (It’s not a threat just because you have nothing I care about to threaten me with, but it would be a threat if I did care about e.g. whether you respect me.)
      But I agree that I am not trying to cooperate with you, if that’s what you mean by bad faith.