Zack_M_Davis comments on [Meta] New moderation tools and moderation guidelines

Zack_M_Davis 5 Jul 2025 23:36 UTC
1 point
−15

and there should not reliably be retribution or counter-punishment by other commenters for them moderating in that way.

Great, so all you need to do is make a rule specifying what speech constitutes “retribution” or “counterpunishment” that you want to censor on those grounds.

Maybe the rule could be something like, “No complaining about being banned by a specific user (but commenting on your own shortform strictly about the substance of a post that you’ve been banned from does not itself constitute complaining about the ban)” or “No arguing against the existence of the user ban feature except in designated moderation threads (which get algorithmically deprioritized in the new Feed).”

It’s your website! You have all the hard power! You can use the hard power to make the rules you want, and then the users of the website have a clear choice to either obey the rules or be banned from the site. Fine.

What I find hard to understand is why the mod team seems to think it’s good for them to try to shape culture by means other than clear and explicit rules that could be neutrally enforced. Telling people to “stop optimizing in a fairly deep way” is not a rule because of how vague and potentially all-encompassing it is. Telling people to avoid “mak[ing] people feel judged or not” is not a rule because I don’t have control over how other people feel.

“Don’t tell people ‘I’m judging you about X’” is a rule. I can do that.

What I can’t do is convincingly pretend to be a person with a completely different personality such that people who are smart about subtext can’t even guess from subtle details of my writing style that I might privately be judging them.

I mean, maybe I could if I tried very hard? But I have too much self-respect to try. If the mod team wants to force temperamentally judgemental people to convincingly pretend to be non-judgemental, that seems really crazy.

I know, the mods didn’t say “We want temperamentally judgemental people to convincingly pretend to have a completely different personality” in those words; rather, Habryka said he wanted to “avoid a passive aggressive culture tak[ing] hold”. I just don’t see what the difference is supposed to be in practice.
- Ben Pace 6 Jul 2025 0:45 UTC
  16 points
  2
  Parent
  Mm, I think sometimes I’d rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.
  A key question is: Are authors comfortable using the mod tools the site gives them to garden their posts?
  You can write lots of judgmental comments criticizing an author’s posts, and then they can ban you from their comments because they find engaging with you to be exhausting, and then you can make a shortform where you and your friends call them a coward, and then they stop using the mod tools (and other authors do too) out of a fear that using the mod tools will result in a group of people getting together to bully and call them names in front of the author’s peers. That’s a situation where authors become uncomfortable using their mod tools. But I don’t know precisely what comment was wrong and what was wrong with it such that had it not happened the outcome would counterfactually not have obtained, i.e. that you wouldn’t have found some other way to make the author uncomfortable using his mod tools (though we could probably all agree on some Schelling lines).
  Also I am hesitant to fully outlaw behavior that might sometimes be appropriate. Perhaps there are some situations where it’s appropriate to criticize someone on your shortform after they banned you. Or perhaps sometimes you should call someone a coward for not engaging with your criticism.
  Overall I believe sometimes I will have to look at the outcome and see whether the gain in this situation was worth the cost, and directly give positive/negative feedback based on that.
  Related to other things you wrote, FWIW I think you have a personality that many people would find uncomfortable interacting with a lot. In-person I regularly read you as being deeply pained and barely able to contain strongly emotional and hostile outbursts. I think just trying to ‘follow the rules’ might not succeed at making everyone feel comfortable interacting with you, even via text, if they feel a deep hostility from you to them that is struggling to contain itself with rules like “no explicit insults”, and sometimes the right choice for them will just be to not engage with you directly. So I think it is a hypothesis worth engaging with that you should work to change your personality somewhat.
  To be clear I think (as Said has said) that it is worth people learning to be able to make space to engage with people like you who they find uncomfortable, because you raise many good ideas and points (and engaging with you is something I relatively happily do, and this is a way I have grown stronger relative to myself of 10 years ago), and I hope you find more success as I respect many of your contributions, but I think a great many people who have good points to contribute don’t have as much capacity as me to do this, and you will sometimes have to take some responsibility for navigating this.
  - Zack_M_Davis 8 Jul 2025 5:18 UTC
    7 points
    5
    Parent
    
    I’d rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.
    
    A key reason to favor behavioral rules over trying to directly optimize outcomes (even granting that enforcement can’t be completely mechanized and there will always be some nonzero element of human judgement) is that act consequentialism doesn’t interact well with game theory, particularly when one of the consequences involved is people’s feelings.
    
    If the popular kids in the cool kids’ club don’t like Goldstein and your only goal is to make sure that the popular kids feel comfortable, then clearly your optimal policy is to kick Goldstein out of the club. But if you have some other goal that you’re trying to pursue with the club that the popular kids and Goldstein both have a stake in, then I think you do have to try to evaluate whether Goldstein “did anything wrong”, rather than just checking that everyone feels comfortable. Just ensuring that everyone feels comfortable at all costs, without regard to the reasons why people feel uncomfortable or any notion that some reasons aren’t legitimate grounds for intervention, amounts to relinquishing all control to anyone who feels uncomfortable when someone else doesn’t behave exactly how they want.
    
    Something I appreciate about the existing user ban functionality is that it is a rule-based mechanism. I have been persuaded by Achmiz and Dai’s arguments that it’s bad for our collective understanding that user bans prevent criticism, but at least it’s a procedurally “fair” kind of badness that I can tolerate, not completely arbitrary tyranny. The impartiality really helps. Do you really want to throw away that scrap of legitimacy in the name of optimizing outcomes even harder? Why?
    
    I think just trying to ‘follow the rules’ might not succeed at making everyone feel comfortable interacting with you
    
    But I’m not trying to make everyone feel comfortable interacting with me. I’m trying to achieve shared maps that reflect the territory.
    
    A big part of the reason some of my recent comments in this thread appeal to an inability or justified disinclination to convincingly pretend to not be judgmental is because your boss seems to disregard with prejudice Achmiz’s denials that his comments are “intended to make people feel judged”. In response to that, I’m “biting the bullet”: saying, okay, let’s grant that a commenter is judging someone; to what lengths must they go to conceal that, in order to prevent others from predictably feeling judged, given that people aren’t idiots and can read subtext?
    
    I think there’s something much more fundamental at stake here, which is that an intellectual forum that’s being held hostage to people’s feelings is intrinsically hampered and can’t be at the forefront of advancing the art of human rationality. If my post claims X, and a commenter says, “No, that’s wrong, actually not-X because Y”, it would be a non-sequitur for me to reply, “I’d prefer you engage with what I wrote with more curiosity and kindness.” Curiosity and kindness are just not logically relevant to the claim! (If I think the commenter has misconstrued what I wrote, I could just say that.) It needs to be possible to discuss ideas without getting tone-policed to death. Once you start playing this game of litigating feelings and feelings about other people’s feelings, there’s no end to it. The only stable Schelling point that doesn’t immediately dissolve into endless total war is to have rules and for everyone to take responsibility for their own feelings within the rules.
    
    I don’t think this is an unrealistic superhumanly high standard. As you’ve noticed, I am myself a pretty emotional person and tend to wear my heart on my sleeve. There are definitely times as recently as, um, yesterday, when I procrastinate checking this website because I’m scared that someone will have said something that will make me upset. In that sense, I think I do have some empathy for people who say that bad comments make them less likely to use the website. It’s just that, ultimately, I think that my sensitivity and vulnerability is my problem. Censoring voices that other people are interested in hearing would be making it everyone else’s problem.
    - Jiro 18 Jul 2025 6:09 UTC
      9 points
      −3
      Parent
      
      I think there’s something much more fundamental at stake here, which is that an intellectual forum that’s being held hostage to people’s feelings is intrinsically hampered and can’t be at the forefront of advancing the art of human rationality.
      
      An intellectual forum that is not being “held hostage” to people’s feelings will instead be overrun by hostile actors who either are in it just to hurt people’s feelings, or who want to win through hurting people’s feelings.
      
      It’s just that, ultimately, I think that my sensitivity and vulnerability is my problem.
      
      Some sensitivity is your problem. Some sensitivity is the “problem” of being human and not reacting like Spock. It is unreasonable to treat all sensitivity as being the problem of the sensitive person.
  - Elizabeth 6 Jul 2025 1:08 UTC
    5 points
    0
    Parent
    Mm, I think sometimes I’d rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.
    This made my blood go cold, despite thinking it would be good if Said left LessWrong.
    My first thought when I read “judge on the standard of whether the outcome is good” is that this lets you cherrypick your favorite outcomes without justifying them. My second is that knowing if something is good can be very complicated even after the fact, so predicting it ahead of time is challenging even if you are perfectly neutral.
    I think it’s good LessWrong(’s admins) allows authors to moderate their own posts (and I’ve used that to ban Said from my own posts). I think it’s good LessWrong mostly doesn’t allow explicit insults (and wish this was applied more strongly). I think it’s good LessWrong evaluates commenting patterns, not just individual comments. But “nothing that makes authors feel bad about bans” is way too far.
    - habryka 6 Jul 2025 1:30 UTC
      6 points
      8
      Parent
      It’s extremely common for all judicial systems to rely on outcome assessments instead of process assessments! In many domains this is obviously the right standard! It is very common to create environments where someone can sue for damages and not just have the judgement be dependent on negligence (and both thresholds are indeed commonly relevant for almost any civil case).
      Like sure, it comes with various issues, but it seems obviously wrong to me to request that no part of the LessWrong moderation process relies on outcome assessments.
    - Ben Pace 6 Jul 2025 1:48 UTC
      3 points
      0
      Parent
      Okay. But I nonetheless believe it’s necessary that we have to judge communication sometimes by outcomes rather than by process.
      Like, as a lower-stakes example, sometimes you try to teasingly make a joke at your friend’s expense, but they just find it mean, and you take responsibility for that and apologize. Just because you thought you were behaving right and communicating well doesn’t mean you were, and sometimes you accept feedback from others that says you misjudged a situation. I don’t have all the rules written down such that if you follow them your friend will read your comments as intended; sometimes I just have to check.
      Similarly sometimes you try to criticize an author, but they take it as implying you’ll push back whenever they enforce boundaries on LessWrong, and then you apologize and clarify that you do respect them enforcing boundaries in general but stand by the local criticism. (Or you don’t and then site-mods step in.) I don’t have all the rules written down such that if you follow them the author will read your comments as intended, sometimes I just have to check.
      Obviously mod powers can be abused, and having to determine on a case by case basis is a power that can be abused. Obviously it involves judgment calls. I did not disclaim this, I’m happy for anyone to point it out, perhaps nobody has mentioned it so far in this thread so it’s worth making sure the consideration is mentioned. And yeah, if you’re asking, I don’t endorse “nothing that makes authors feel bad about bans”, and there are definitely situations where I think it would be appropriate for us to reverse someone’s bans (e.g. if someone banned all of the top 20 authors in the LW review, I would probably think this is just not workable on LW and reverse that).
      - Elizabeth 6 Jul 2025 4:17 UTC
        2 points
        0
        Parent
        Sure, but “is my friend upset” is very different than “is the sum total of all the positive and negative effects of this, from first order until infinite order, positive”
        - Ben Pace 6 Jul 2025 4:42 UTC
          2 points
          0
          Parent
          I don’t really know what we’re talking about right now.
  - habryka 6 Jul 2025 1:34 UTC
    2 points
    0
    Parent
    Said, you reacted to this:
    In-person I regularly read you as being deeply pained and barely able to contain strongly emotional and hostile outbursts.
    with “Disagree”.
    I have no idea how you could remotely know whether this is true, as I think you have never interacted with either Ben or Zack in person!
    Also, it’s really extremely obviously true. Indeed, Zack frequently has the corresponding emotional and hostile outbursts, so it’s really extremely evident they are barely contained during a lot of it (since sometimes they do not end up contained, and then Zack apologizes for not containing them and explains that this is difficult for him).
  - Said Achmiz 6 Jul 2025 1:00 UTC
    −7 points
    −13
    Parent
    
    You can write lots of judgmental comments criticizing an author’s posts, and then they can ban you from their comments because they find engaging with you to be exhausting, and then you can make a shortform where you and your friends call them a coward, and then they stop using the mod tools (and other authors do too) out of a fear that using the mod tools will result in a group of people getting together to bully and call them names in front of the author’s peers. That’s a situation where authors become uncomfortable using their mod tools.
    
    Here’s what confuses me about this stance: do an author’s posts on Less Wrong (especially non-frontpage posts) constitute “the author’s private space”, or do they constitute “public space”?
    
    If the former, then the idea that things that Alice writes about Bob on her shortform (or in non-frontpage posts) can constitute “bullying”, or are taking place “in front of” third parties (who aren’t making the deliberate choice to go to Alice’s private space), is nonsense.
    
    If the latter, then the idea that authors should have the right to moderate discussions that are happening in a public space is clearly inappropriate.
    
    I understood the LW mods’ position to be the former—that an author’s posts are their own private space, within the LW ecosystem (which is why it makes sense to let them set their own separate moderation policy there). But then I can’t make any sense of this notion of “bullying”, as applied to comments written on an author’s shortform (or non-frontpage posts).
    
    It seems to me that these two ideas are incompatible.
- habryka 6 Jul 2025 1:26 UTC
  5 points
  2
  Parent
  What I find hard to understand is why the mod team seems to think it’s good for them to try to shape culture by means other than clear and explicit rules that could be neutrally enforced.
  No judicial system in the world has ever arrived at the ability to have “neutrally enforced rules”, at least the way I interpret you to mean this. Case law is the standard in almost every legal tradition, and the US legal system relies heavily on things like “jury of your peers” type stuff to make judgements.
  Intent frequently matters in legal decisions. Cognitive state of mind matters for legal decisions. Judges go through years of training and are part of a long lineage of people who have built up various heuristics and principles about how to judge cases. Individual courts have their own culture and track record.
  And that is for the US legal system, which is absolutely not capable of operating anywhere close to the kind of standard that allows people to curate social spaces or deal with tricky kinds of social rulings. No company could make cultural or hiring or business decisions based on the standard of the US legal system. Neither could any internet forum.
  There is absolutely no chance we will ever be able to codify LessWrong rules of conduct into a set of specific rules that can be neutrally judged by a third party. Zero chance. Give up. If that is something you need here, leave now. Feel free to try to build it for yourself.