As an aside, I think one UI preference Habryka holds more strongly than Wei Dai does here is that the UI should look the same to all users. For reasons similar to why WYSIWYG is helpful for editing, when it comes to muting/threading/etc. it’s helpful for people to all be looking at the same page, so they can easily model what others are seeing. Having some people see a user’s comments while the author, or key commenters, do not is quite costly for social transparency and for understanding social dynamics.
My proposal was meant to address the requirement that some authors apparently have of avoiding interaction with certain commenters. All proposals dealing with this imply multiple conversations and people having to model different states of knowledge in others, unless those commenters are just silenced altogether, so I’m confused why it’s more confusing to have multiple conversations happening in the same place when those conversations are clearly marked.
It seems to me like the main difference is that Habryka just trusts authors to “garden their spaces” more than I do, and wants to actively encourage this, whereas I’m reluctantly trying to accommodate such authors. I’m not sure what’s driving this difference though. People on Habryka’s side (so far only he has spoken up, but there are clearly more, given the voting patterns) seem very reluctant to directly address the concern that people like me have that even great authors are human and likely biased quite strongly when it comes to evaluating strong criticism, unless they’ve done so somewhere I haven’t seen.
Maybe it just comes down to differing intuitions and there’s not much to say? There’s some evidence available though, like Said’s highly upvoted comment nevertheless triggering a desire to ban Said. Has Habryka seen more positive evidence that I haven’t?
All proposals dealing with this imply multiple conversations and people having to model different states of knowledge in others, unless those commenters are just silenced altogether
No, what are you talking about? The current situation, where people can make new top level posts, which get shown below the post itself via the pingback system, does not involve any asymmetric states of knowledge?
Indeed, there are lots of ways to achieve this without requiring asymmetric states of knowledge. Having two comment sections, with one marked as “off-topic” or something like that, also doesn’t require any asymmetric states of knowledge.
seem very reluctant to directly address the concern that people like me have that even great authors are human and likely biased quite strongly
Unmoderated discussion spaces are not generally better than moderated discussion spaces, including on the groupthink dimension! There is no great utopia of discourse that can be achieved simply by withholding moderation tools from people. Bandwidth is limited and cultural coordination is hard and this means that there are harsh tradeoffs to be made about which ideas and perspectives will end up presented.
I am not hesitant to address the claim directly; it is just the case that on LessWrong, practically no one ever gets banned who wouldn’t also end up being post-banned by the moderators, so de facto this effect just doesn’t seem real. Yes, maybe there are chilling effects that don’t produce observable effects, which is always important to think about with this kind of thing, but I don’t currently buy it.
The default thing that happens when you leave a place unmoderated is that the conversation gets dominated by whoever has the most time, stamina, and social resilience, and the overall diversity of perspectives trends toward zero. Post authors are one obvious group to moderate spaces, especially with supervision from site moderators.
There are lots of reasonable things to try here, but a random blanket “I don’t trust post authors to moderate” is simply making an implicit statement that unmoderated spaces are better, because on the margin LW admins don’t have either the authority or the time to moderate everyone’s individual posts. Authors are rightly pissed if we just show up and ban people from their posts, or delete people’s content without checking in with them, and the moderator-author communication channel is sufficiently limited that if you want most posts to be moderated you will need to give the authors some substantial power to do that.
There may be better ways of doing it, but I just have really no patience or sympathy for people who appeal to some kind of abstract “I don’t trust people to moderate” intuition. Someone has to moderate if you want anything nice. Maybe you would like the LW admins to moderate much more, though I think the marginal capacity we have for that is kind of limited, and it’s not actually the case that anyone involved in this conversation wouldn’t also go and scream “CENSORSHIP CENSORSHIP CENSORSHIP” if the site admins just banned people directly instead.
Overall, post authors having more moderation control means I will ban fewer people, because it means we get to have more of an archipelago. If you want a more centralized culture, we can do that, but I think it will overall mean more people getting banned, because I have blunter tools and much, much less time than the aggregate of all LessWrong post authors. In my ideal world, post authors would ban and delete much more aggressively, so that we would actually get an archipelago of cultures and perspectives; but alas, threads like this one, and the constant social attacks on anyone who tries to use any moderation tools, generally guarantee that nobody wants to deal with the hassle.
And to be clear, I really value the principle of “If anyone says anything wrong on LessWrong, you can find a refutation of it right below the post”, and have always cared about somehow maintaining it. But that principle is achieved totally fine via the pingback system. De facto, again, almost no one is banned from almost anywhere, so things end up going through the comment system, and I would probably slightly change the UI of the pingback system to work better in contexts like mobile if it became more load-bearing; but it seems to me to work fine as an escape valve that maintains that relationship pretty well.
I do think there is a bit of a hole in that principle for what one should do if someone says something wrong in a comment. I have been kind of into adding comment-level pingbacks for a while, and would be fairly sold that, if more banning happens, we should add comment-level pingbacks in some clear way (I would also find the information otherwise valuable).
In the discussion under the original post, some people will have read the reply post, and some won’t (perhaps including the original post’s author, if they banned the commenter in part to avoid having to look at their content), so I have to model this.
Sure, let’s give people moderation tools, but why trust authors with unilateral powers that can’t be overridden by the community, such as banning and moving comments/commenters to a much less visible section?
“Not being able to get the knowledge if you are curious” and “some people have of course read different things” are quite different states of affairs!
I am objecting to the former. I agree that of course any conversation with more than 10 participants will have some variance in who knows what, but that’s not what I am talking about.
It would be easy to give authors a button to let them look at comments that they’ve muted. (This seems so obvious that I didn’t think to mention it, and I’m confused by your inference that authors would have no ability to look at the muted comments at all. At the very least they can simply log out.)
I mean, kind of. The default UI experience of everyone will still differ by a lot (and importantly between people who will meaningfully be “in the same room”), and the framing of the feature as “muted comments” indeed does not communicate that.
How much this would make the dynamics more confusing would depend on the salience of the author UI, but of course commenters will have no idea what the author UI looks like, and so can’t form accurate expectations about how likely the author is to end up making the muted comments visible to them.
Contrast this with a situation with two comment sections. The default assumption is that the author and the users see the exact same thing. There is no uncertainty about whether the author maybe has things collapsed by default while the commenters do not. People know what everyone else is seeing, and it’s communicated in the most straightforward way. I don’t even really know what I would do to communicate to commenters what the author sees (it’s not an impossible UI challenge; you can imagine a small screenshot on the tooltip of the “muted” icon that shows what the author UI looks like, but that doesn’t feel to me like a particularly elegant solution).
One of the key things I mean by “the UI looking the same for all users” is maintaining common knowledge about who is likely to read what, or at least the rough process that determines what people read and what context they have. If I give the author some special UI where some things are hidden, then in order to maintain common knowledge I now need to show the users what the author’s UI looks like (and show the author what the users are being shown about the author UI, but this mostly would take care of itself since all authors will be commenters in other contexts).
It seems to me like the main difference is that Habryka just trusts authors to “garden their spaces” more than I do, and wants to actively encourage this, whereas I’m reluctantly trying to accommodate such authors. I’m not sure what’s driving this difference though.
I’m not certain that this is the crux, but I’ll try again to explain why I think it’s good to give people that sort of agency. I am probably repeating myself somewhat.
I think incompatibilities often drive people away (e.g. at LessOnline I have let people know they can ask certain people not to come to their sessions, as it would otherwise make them not want to run the sessions, and this is definitely not due to criticism but to conflict between the two people). That’s one reason why I think this should be available.
I think bad commenters also drive people away. There are bad commenters who seem fine when you inspect any single comment, but when you inspect longer threads and longer patterns, they’re draining energy and providing no good ideas or arguments: always low-quality criticisms, stated maximally aggressively, not actually good at communication or learning. I can think of many examples.
I think it’s good to give individuals some basic level of agency over these situations, and not require active input from mods each time. This is for cases where the incompatibility is quite individual, or where the user’s information comes from off-site interactions, and also just because there are probably a lot of incompatibilities and we already spend a lot of time each week on site moderation. Furthermore, people are often quite averse to bringing up personal incompatibilities with strangers (e.g. in a DM to mods who they’ve never interacted with before and don’t know particularly well).
Some people will not have the principles to tend their gardens appropriately, and will inappropriately remove people with good critiques. That’s why it’s important that they cannot prevent a user from writing posts or quick takes about their content. Most substantial criticisms on this site have come in post and quick takes form, such as Wentworth’s critiques of other alignment strategies, or the sharp left turn discourse, or Natalia’s critiques of Guzey’s sleep hypotheses / SMTM’s lithium hypothesis, or Eliezer’s critique of the bioanchors report.
So it seems to me like it’s needed for several reasons, and basically won’t change the deep character of the site, where there’s tons of aggressive and harsh criticism. And I also basically expect most great and critical users not to get restricted in this particular way (e.g. Gwern, Habryka, Wentworth, and more). So while I acknowledge there will be nonzero inappropriate uses of it that increase the friction of legitimate criticism, I think it won’t have a big effect size overall on the ability and frequency of criticism, and it will help a great deal with a common class of very unpleasant scenarios that drive good writers away.
I think incompatibilities often drive people away (e.g. at LessOnline I have let people know they can ask certain people not to come to their sessions, as it would otherwise make them not want to run the sessions, and this is definitely not due to criticism but to conflict between the two people). That’s one reason why I think this should be available.
This is something I currently want to accommodate but not encourage people to use moderation tools for, but maybe I’m wrong. How can I get a better sense of what’s going on with this kind of incompatibility? Why do you think “definitely not due to criticism but to conflict”?
I think bad commenters also drive people away. There are bad commenters who seem fine when you inspect any single comment, but when you inspect longer threads and longer patterns, they’re draining energy and providing no good ideas or arguments: always low-quality criticisms, stated maximally aggressively, not actually good at communication or learning. I can think of many examples.
It seems like this requires a very different kind of solution than either local bans or mutes, which most people don’t or probably won’t use, so can’t help in most places. Like maybe allow people to vote on commenters instead of just comments, and then their comments get a default karma based on their commenter karma (or rather the direct commenter-level karma would contribute to the default karma, in addition to their total karma which currently determines the default karma).
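To make the proposal concrete, here is a minimal sketch of what such a default-karma rule could look like. The function name, weights, and scaling below are all hypothetical illustrations, not anything LessWrong actually implements:

```python
def default_comment_karma(total_karma: int, commenter_karma: int,
                          w_total: float = 0.7, w_commenter: float = 0.3,
                          scale: float = 0.01) -> int:
    """Hypothetical starting score for a new comment.

    total_karma     -- the user's overall site karma (the input the
                       current system uses for the default score)
    commenter_karma -- direct votes on the user *as a commenter* (the
                       proposed new input)

    Weights and scale are made up for illustration; the point is only
    that both signals contribute to the default.
    """
    blended = w_total * total_karma + w_commenter * commenter_karma
    # Never start a comment below the usual baseline of 1 point.
    return max(1, 1 + round(scale * blended))
```

The exact blend hardly matters; the point is just that direct votes on someone as a commenter would shift the starting score of their future comments, rather than only their total karma doing so.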
Most substantial criticisms on this site have come in post and quick takes form, such as Wentworth’s critiques of other alignment strategies, or the sharp left turn discourse, or Natalia’s critiques of Guzey’s sleep hypotheses / SMTM’s lithium hypothesis, or Eliezer’s critique of the bioanchors report.
I’m worried about less “substantial” criticisms that are unlikely to get their own posts, like just pointing out a relatively obvious mistake in the OP, or lack of clarity, or failure to address some important counterargument.
This is something I currently want to accommodate but not encourage people to use moderation tools for, but maybe I’m wrong. How can I get a better sense of what’s going on with this kind of incompatibility? Why do you think “definitely not due to criticism but to conflict”?
I mean, I’ve mostly gotten a better sense of it by running lots of institutions and events and having tons of complaints bubble up. I know it’s not just because of criticism because (a) I know from first principles that conflicts exist for reasons other than criticism of someone’s blog posts, and (b) I’ve seen a bunch of these incompatibilities: things like “bad romantic breakup” or “was dishonorable in a business setting” or “severe communication style mismatch”, amongst others.
You say you’re not interested in using “moderation tools” for this. What do you have in mind for how to deal with this, other than tools for minimizing interaction between two people?
Like maybe allow people to vote on commenters instead of just comments, and then their comments get a default karma based on their commenter karma (or rather the direct commenter-level karma would contribute to the default karma, in addition to their total karma which currently determines the default karma).
It’s a good idea, and maybe we should do it, but I think it doesn’t really address the problem of unique/idiosyncratic incompatibilities. Also, it would be quite socially punishing for someone to know that they’re publicly labelled net-negative as a commenter, rather than simply that their individual comments so far have been considered poor contributions. Making a system this individually harsh is a cost to be weighed, and it might overall push away high-quality contributors more than it helps.
I’m worried about less “substantial” criticisms that are unlikely to get their own posts, like just pointing out a relatively obvious mistake in the OP, or lack of clarity, or failure to address some important counterargument.
It seems, then, that making it so a short list of users is not welcome to comment on a single person’s posts is much less likely to cause these things to be missed. The more basic mistakes can be noticed by a lot of people, and if it’s a mistake that only one person can notice, due to their rare expertise or unique perspective, I think they can get a lot of karma by making it a whole quick take or post.
Just to check: are we discussing a potential bad future world where this feature gets massively more use? Right now there are a ton of very disagreeable and harsh critics on LessWrong, and there are very few absolute bans. I’d guess absolute bans are on the order of 30-100 author-commenter pairs over the ~7 years we’ve had this, with weekly logged-in users at ~4,000 these days. The effect size so far has been really quite tiny. My guess is that it could probably increase ~10x and still not be a very noticeable source of friction for criticism on LessWrong for basically all good commenters.
It seems like this requires a very different kind of solution than either local bans or mutes, which most people don’t or probably won’t use, so can’t help in most places. Like maybe allow people to vote on commenters instead of just comments, and then their comments get a default karma based on their commenter karma (or rather the direct commenter-level karma would contribute to the default karma, in addition to their total karma which currently determines the default karma).
I think better karma systems could potentially be pretty great, though I’ve historically always found it really hard to find something much better, mostly for complexity reasons. See this old shortform of mine on a bunch of stuff that a karma system has to do simultaneously:
https://www.lesswrong.com/posts/EQJfdqSaMcJyR5k73/habryka-s-shortform-feed?commentId=8meuqgifXhksp42sg