habryka comments on [Meta] New moderation tools and moderation guidelines

habryka 16 Jun 2025 20:34 UTC
10 points
0
In my mind things aren’t neatly categorized into “top N reasons”, but here are some quick thoughts:
(I.) I am generally very averse to having any UI element that shows on individual comments. It just clutters things up quickly and requires people to scan each individual comment. I have put an enormous amount of effort into trying to reduce the number of UI elements on comments. I much prefer organizing things into sections which people can parse once, and then assume everything has the same type signature.
(II.) I think a core thing I want UI to do in the space is to hit the right balance between “making it salient to commenters that they are getting more filtered evidence” and “giving the author social legitimacy to control their own space, combined with checks and balances”.
I expect this specific proposal to end up feeling like a constant mark of shame that authors are hesitant to use because they don’t feel the legitimacy to use it, and most importantly, make it very hard for them to get feedback on whether others judge them for how they use it, inducing paranoia and anxiety, which I think would make the feature largely unused. I think in that world it isn’t really helping anyone, though it will make authors feel additionally guilty by having technically handed them a tool for the job, but one that they expect will come with social censure after being used, and so we will hear fewer complaints and have less agency to address the underlying problems.
(III.) I think there is a nearby world where you have n-directional muting (i.e. any user can mute any other user), and I expect people to conceptually confuse that with what is going on here, and there is no natural way to extend this feature into the n-directional direction.

I generally dislike n-directional muting for other reasons, though it’s something I’ve considered over the years.
(IV.) I think it’s good to have different in-flows of users into different conversations. I think the mute-thread structure would basically just cause every commenter to participate in two conversations, one with the author, and one without the author, and I expect that to be a bunch worse than to have something like two separate comment sections, or a top-level response post where the two different conversations can end up with substantially non-overlapping sets of participants.
(V.) The strongest argument against anything in the space is just the complexity it adds. The ban system IMO currently is good because mostly you basically don’t have to track it. Almost nobody ever gets banned, but it helps with the most extreme cases, and the moderators keep track of things not getting out of control with lots of unreasonable seeming bans. Either this or an additional comment section, or any of the other solutions discussed is one additional thing to keep track off for how LessWrong works, and there really is already a lot to keep track off, and we should have a very very strong prior that we should generally not add complexity but remove it.
(VI.) Relatedly, I think the mark of a good feature on LessWrong is something that solves multiple problems at once, not just one problem. Whenever I’ve felt happy about a feature decision it’s usually been after having kept a bag of problems in the back of my mind for many months or years, and then at some point trying to find a solution to a new problem, and noticing that it would also solve one or multiple other problems at the same time. This solution doesn’t have that hallmark, and I’ve mostly regretted whenever I’ve done that, ending up adding complexity to both the codebase and the UI that didn’t pay off.
- Wei Dai 17 Jun 2025 11:21 UTC
  1 point
  −2
  Parent
  To reduce clutter you can reuse the green color bars that currently indicate new comments, and make it red for muted comments.
  
  Authors might rarely ban commenters because the threat of banning drives them away already. And if the bans are rare then what’s the big deal with requiring moderator approval first?
  
  giving the author social legitimacy to control their own space, combined with checks and balance
  
  I would support letting authors control their space via the mute and flag proposal, adding my weight to its social legitimacy, and I’m guessing others who currently are very much against the ban system (thus helping to deprive it of social legitimacy) would also support or at least not attack it much in the future. I and I think others would be against any system that lets authors unilaterally exert very strong control of visibility of comments such as by moving them to a bottom section.
  
  But I guess you’re actually talking about something else, like how comfortable does the UX make the author, thus encouraging them to use it more. It seems like you’re saying you don’t want to make the muting to be too in your face, because that makes authors uncomfortable and reluctant to use it? Or you simultaneously want authors to have a lot of control over comment visibility, but don’t want that fact to be easily visible (and the current ban system accomplishes this)? I don’t know, this just seems very wrong to me, like you want authors to feel social legitimacy that doesn’t actually exist, ie if most people support giving authors more control then why would it be necessary to hide it.
  - habryka 17 Jun 2025 18:14 UTC
    19 points
    0
    Parent
    To reduce clutter you can reuse the green color bars that currently indicate new comments, and make it red for muted comments.
    No, the whole point of the green bars is to be a very salient indicator that only shows in the relatively rare circumstance where you need it (which is when you revisit a comment thread you previously read and want to find new comments). Having a permanent red indicator would break in like 5 different ways:
    It would communicate a temporary indicator, because that’s the pattern that we established with colored indicators all across the site. All color we have is part of dynamic elements.
    It would introduce a completely new UI color which has so far only been used in the extremely narrow context of downvoting
    Because the color has only been used in downvoting it would feel like a mark of shame making the social dynamics a bunch worse
    How would you now indicate that a muted comment is new?
    The green bar is intentionally very noticeable, and red is even more attention grabbing, making it IMO even worse than a small icon somewhere on the comment in terms of clutter.
    To be clear, I still appreciate the suggestion, but I don’t think it’s a good one in this context.
    I would support letting authors control their space via the mute and flag proposal, adding my weight to its social legitimacy, and I’m guessing others who currently are very much against the ban system (thus helping to deprive it of social legitimacy) would also support or at least not attack it much in the future.
    I’ve received much more pushback for mute-like proposals than for ban-like proposals on LW (though this one is quite different and things might be different).
    I appreciate the offer to provide social legitimacy, but I don’t really see a feasible way for you to achieve that, as authors will rightfully be concerned that the people who will judge them will be people who don’t know your opinions on this, and there is no natural way for them to see your support. As I mentioned one central issue with this proposal is that authors cannot see the reaction others have to the muted comments (whereas they know that if they ban someone then the state of knowledge they have about what conversation is going on on the site is the same as the state other people have, which makes the social situation much easier to model).
    Or you simultaneously want authors to have a lot of control over comment visibility, but don’t want that fact to be easily visible (and the current ban system accomplishes this)? I don’t know, this just seems very wrong to me, like you want authors to feel social legitimacy that doesn’t actually exist, ie if most people support giving authors more control then why would it be necessary to hide it.
    No, I don’t mind visibility at all really. I think public ban lists are great, and as I mentioned I wouldn’t mind having the number of people banned and comments deleted shown at the bottom of the comment section for each author (as well as the deleted content and the author names themselves still visible via the moderation log).
    But legitimacy is a fickle thing on the internet where vigorous calls against censorship are as easy to elicit as friendly greetings at your neighborhood potluck, and angry mobs do frequently roam the streets, and you have to think about how both readers and authors will think about the legitimacy of a tool in a situation where you didn’t just have a long multi-dozen paragraph conversation about the merits of different moderation systems.
    I think this specific proposal fails on communicating the right level of legitimacy. I think others fare better (like the ban system with a visible moderation log), though are also not ideal. I think we can probably do something better than both, which is why I am interested in discussing this, but my intuitions about how these things go say this specific proposal probably will end up in the wrong social equilibria (and like, to be clear, I am not super confident in this, but understanding these social dynamics is among the top concerns for designing systems and UI like this).
  - Ben Pace 17 Jun 2025 18:39 UTC
    13 points
    8
    Parent
    As an aside, I think one UI preference I suspect Habryka has more strongly than Wei Dai does here is that the UI look the same to all users. For similar reasons why WYSIWYG is helpful for editing, when it comes to muting/threading/etc it’s helpful for ppl to all be looking at the same page so they can easily model what others are seeing. Having some ppl see a user’s comments but the author not, or key commenters not, is quite costly for social transparency, and understanding social dynamics.
    - Wei Dai 17 Jun 2025 22:19 UTC
      8 points
      2
      Parent
      My proposal was meant to address the requirement that some authors apparently have to avoid interacting with certain commenters. All proposals dealing with this imply multiple conversations and people having to model different states of knowledge in others, unless those commenters are just silenced altogether, so I’m confused why it’s more confusing to have multiple conversations happening in the same place when those conversations are marked clearly.
      
      It seems to me like the main difference is that Habryka just trusts authors to “garden their spaces” more than I do, and wants to actively encourage this, whereas I’m reluctantly trying to accommodate such authors. I’m not sure what’s driving this difference though. People on Habryka’s side (so far only he has spoken up, but there’s clearly more given voting patterns) seem very reluctant to directly address the concern that people like me have that even great authors are human and likely biased quite strongly when it comes to evaluating strong criticism, unless they’ve done so somewhere I haven’t seen.
      
      Maybe it just comes down to differing intuitions and there’s not much to say? There’s some evidence available though, like Said’s highly upvoted comment nevertheless triggering a desire to ban Said. Has Habryka seen more positive evidence that I haven’t?
      - habryka 17 Jun 2025 22:42 UTC
        5 points
        2
        Parent
        All proposals dealing with this imply multiple conversations and people having to model different states of knowledge in others, unless those commenters are just silenced altogether
        No, what are you talking about? The current situation, where people can make new top level posts, which get shown below the post itself via the pingback system, does not involve any asymmetric states of knowledge?
        Indeed, there are lots of ways to achieve this without requiring asymmetric states of knowledge. Having the two comment sections, with one marked as “off-topic” or something like that also doesn’t require any asymmetric states of knowledge.
        seem very reluctant to directly address the concern that people like me have that even great authors are human and likely biased quite strongly
        Unmoderated discussion spaces are not generally better than moderated discussion spaces, including on the groupthink dimension! There is no great utopia of discourse that can be achieved simply by withholding moderation tools from people. Bandwidth is limited and cultural coordination is hard and this means that there are harsh tradeoffs to be made about which ideas and perspectives will end up presented.
        I am not hesitant to address the claim directly, it is just the case that on LessWrong, practically no banning of anyone ever takes place who wouldn’t also end up being post-banned by the moderators and so de-facto this effect just doesn’t seem real. Yes, maybe there are chilling effects that don’t produce observable effects, which is always important to think about with this kind of stuff, but I don’t currently buy it.
        The default thing that happens is when you leave a place unmoderated is just that the conversation gets dominated by whoever has the most time and stamina and social resilience, and the overall resulting diversity of perspectives trends to zero. Post authors are one obvious group to moderate spaces, especially with supervision from site moderators.
        There are lots of reasonable things to try here, but a random blanket “I don’t trust post authors to moderate” is simply making an implicit statement that unmoderated spaces are better, because on the margin LW admins don’t have either the authority or the time to moderate everyone’s individual posts. Authors are rightly pissed if we just show up and ban people from their posts, or delete people’s content without checking in with them, and the moderator-author communication channel is sufficiently limited that if you want most posts to be moderated you will need to give the authors some substantial power to do that.
        There maybe are better ways of doing it, but I just have really no patience or sympathy for people who appeal to some kind of abstract “I don’t trust people to moderate” intuition. Someone has to moderate if you want anything nice. Maybe you would like the LW admins to moderate much more, though I think the marginal capacity we have for that is kind of limited, and it’s not actually the case that anyone involved in this conversation wouldn’t also go and scream “CENSORSHIP CENSORSHIP CENSORSHIP” if the site admins just banned people directly instead.
        Overall the post authors having more moderation control means I will ban fewer people because it means we get to have more of an archipelago. If you want a more centralized culture, we can do that, but I think it will overall mean more people getting banned because I have blunter tools and much much less time than the aggregate of all LessWrong post authors. In my ideal world post authors would ban and delete much more aggressively so that we would actually get an archipelago of cultures and perspectives, but alas, threads like this one, and constant social attacks on anyone who tries to use any moderation tools generally guarantee that nobody wants to deal with the hassle.
        And to be clear, I really value the principle of “If anyone says anything wrong on LessWrong, you can find a refutation of it right below the post”, and have always cared about somehow maintaining it. But that principle is just achieved totally fine via the pingback system. I think de-facto again almost no one is banned from almost anywhere else so things end up via the comment system, and I would probably slightly change the UI for the pingback system to work better in contexts like mobile if it became more load-bearing, but it seems to me to work fine as an escape valve that maintains that relationship pretty well.
        I do think there is a bit of a hole in that principle for what one should do if someone says something wrong in a comment. I have been kind of into adding comment-level pingbacks for a while, and would be kind of sold that if more banning happens, we should add comment-level pingbacks in some clear way (I would also find the information otherwise valuable).
        Wei Dai 17 Jun 2025 22:54 UTC
        4 points
        4
        Parent
        In the discussion under the original post, some people will have read the reply post, and some won’t (perhaps including the original post’s author, if they banned the commenter in part to avoid having to look at their content), so I have to model this.
        
        Sure, let’s give people moderation tools, but why trust authors with unilateral powers that can’t be overriden by the community, such as banning and moving comments/commenters to a much less visible section?
        habryka 17 Jun 2025 23:00 UTC
        4 points
        0
        Parent
        “Not being able to get the knowledge if you are curious” and “some people have of course read different things” are quite different states of affairs!
        I am objecting to the former. I agree that of course any conversation with more than 10 participants will have some variance in who knows what, but that’s not what I am talking about.
        Wei Dai 17 Jun 2025 23:09 UTC
        3 points
        3
        Parent
        It would be easy to give authors a button to let them look at comments that they’ve muted. (This seems so obvious that I didn’t think to mention it, and I’m confused by your inference that authors would have no ability to look at the muted comments at all. At the very least they can simply log out.)
        habryka 17 Jun 2025 23:16 UTC
        4 points
        2
        Parent
        I mean, kind of. The default UI experience of everyone will still differ by a lot (and importantly between people who will meaningfully be “in the same room”) and the framing of the feature as “muted comments” does indeed not communicate that.
        The exact degree of how much it would make the dynamics more confusing would end up depending on the saliency of the author UI, but of course commenters will have no idea what the author UI looks like, and so can’t form accurate expectations about how likely the author is to end up making the muted comments visible to them.
        Contrast to a situation with two comment sections. The default assumption is that the author and the users just see the exact same thing. There is no uncertainty about whether maybe the author has things by default collapsed whereas the commenters do not. People know what everyone else is seeing, and it’s communicated in the most straightforward way. I don’t even really know what I would do to communicate to commenters what the author sees (it’s not an impossible UI challenge, you can imagine a small screenshot on the tooltip of the “muted icon” that shows what the author UI looks like, but that doesn’t feel to me like a particularly elegant solution).
        One of the key things I mean by “the UI looking the same for all users” is maintaining common knowledge about who is likely to read what, or at least the rough process that determines what people read and what context they have. If I give the author some special UI where some things are hidden, then in order to maintain common knowledge I now need to show the users what the author’s UI looks like (and show the author what the users are being shown about the author UI, but this mostly would take care of itself since all authors will be commenters in other contexts).
      - Ben Pace 17 Jun 2025 22:54 UTC
        3 points
        2
        Parent
        It seems to me like the main difference is that Habryka just trusts authors to “garden their spaces” more than I do, and wants to actively encourage this, whereas I’m reluctantly trying to accommodate such authors. I’m not sure what’s driving this difference though.
        I’m not certain that this is the crux, but I’ll try again to explain that why I think it’s good to give people that sort of agency. I am probably repeating myself somewhat.
        
        I think incompatibilities often drive people away (e.g. at LessOnline I have let ppl know they can ask certain ppl not to come to their sessions, as it would make them not want to run the sessions, and this is definitely not due to criticism but to conflict between the two people). That’s one reason why I think this should be available.
        I think bad commenters also drive people away. There are bad commenters who seem fine when inspecting any single comment but when inspecting longer threads and longer patterns they’re draining energy and provide no good ideas or arguments. Always low quality criticisms, stated maximally aggressively, not actually good at communication/learning. I can think of many examples.
        I think it’s good to give individuals some basic level of agency over these situations, and not require active input from mods each time. This is for cases where the incompatibility is quite individual, or where the user’s information comes from off-site interactions, and also just because there are probably a lot of incompatibilities and we already spend a lot of time each week on site-moderation. And furthermore ppl are often quite averse to bringing up personal incompatibilities with strangers (i.e. in a DM to the mods who they’ve never interacted with before and don’t know particularly well).
        Some people will not have the principles to tend their garden appropriately, and will inappropriately remove people with good critiques. That’s why it’s important that they cannot prevent the user from writing posts or quick takes about their content. Most substantial criticisms on this site have come in post and quick takes form, such as Wentworth’s critiques of other alignment strategies, or the sharp left turn discourse, or Natalia’s critiques of Guzey’s sleep hypotheses / SMTM’s lithium hypothesis, or Eliezer’s critique of the bioanchors report.
        So it seems to me like it’s needed for several reasons, and basically won’t change the deep character of the site where there’s tons of aggressive and harsh criticism on LW. And I also basically expect most great and critical users not to get restricted in this particular way (e.g. Gwern, Habryka, Wentworth, more). So while I acknowledge there will be nonzero inappropriate uses of it that increase the friction of legitimate criticism, I think it won’t be a big effect size overall on the ability and frequency of criticism, and it will help a great deal with a common class of very unpleasant scenarios that drive good writers away.
        Wei Dai 18 Jun 2025 0:07 UTC
        6 points
        2
        Parent
        
        I think incompatibilities often drive people away (e.g. at LessOnline I have let ppl know they can ask certain ppl not to come to their sessions, as it would make them not want to run the sessions, and this is definitely not due to criticism but to conflict between the two people). That’s one reason why I think this should be available.
        
        This is something I currently want to accommodate but not encourage people to use moderation tools for, but maybe I’m wrong. How can I get a better sense of what’s going on with this kind of incompatibility? Why do you think “definitely not due to criticism but to conflict”?
        
        I think bad commenters also drive people away. There are bad commenters who seem fine when inspecting any single comment but when inspecting longer threads and longer patterns they’re draining energy and provide no good ideas or arguments. Always low quality criticisms, stated maximally aggressively, not actually good at communication/learning. I can think of many examples.
        
        It seems like this requires a very different kind of solution than either local bans or mutes, which most people don’t or probably won’t use, so can’t help in most places. Like maybe allow people to vote on commenters instead of just comments, and then their comments get a default karma based on their commenter karma (or rather the direct commenter-level karma would contribute to the default karma, in addition to their total karma which currently determines the default karma).
        
        Most substantial criticisms on this site have come in post and quick takes form, such as Wentworth’s critiques of other alignment strategies, or the sharp left turn discourse, or Natalia’s critiques of Guzey’s sleep hypotheses / SMTM’s lithium hypothesis, or Eliezer’s critique of the bioanchors report.
        
        I’m worried about less “substantial” criticisms that are unlikely to get their own posts, like just pointing out a relatively obvious mistake in the OP, or lack of clarity, or failure to address some important counterargument.
        Ben Pace 18 Jun 2025 1:02 UTC
        5 points
        0
        Parent
        This is something I currently want to accommodate but not encourage people to use moderation tools for, but maybe I’m wrong. How can I get a better sense of what’s going on with this kind of incompatibility? Why do you think “definitely not due to criticism but to conflict”?
        I mean I’ve mostly gotten a better sense of it by running lots of institutions and events and had tons of complaints bubble up. I know it’s not just because of criticism because (a) I know from first-principles that conflicts exist for reasons other than criticism of someone’s blogposts, and (b) I’ve seen a bunch of these incompatibilities. Things like “bad romantic breakup” or “was dishonorable in a business setting” or “severe communication style mismatch”, amongst other things.
        You say you’re not interested in using “moderation tools” for this. What do you have in mind for how to deal with this, other than tools for minimizing interaction between two people?
        Like maybe allow people to vote on commenters instead of just comments, and then their comments get a default karma based on their commenter karma (or rather the direct commenter-level karma would contribute to the default karma, in addition to their total karma which currently determines the default karma).
        It’s a good idea, and maybe we should do it, but I think doesn’t really address the thing of unique / idiosyncratic incompatibilities. Also it would be quite socially punishing for someone to know that they’re publicly labelled net negative as a commenter, rather than simply that their individual comments so-far have been considered poor contributions, and making a system this individually harsh is a cost to be weighed, and it might make it overall push away high-quality contributors more than it helps.
        I’m worried about less “substantial” criticisms that are unlikely to get their own posts, like just pointing out a relatively obvious mistake in the OP, or lack of clarity, or failure to address some important counterargument.
        This seems then that making it so that a short list of users are not welcome to comment on a single person’s post is much less likely to cause these things to be missed. The more basic mistakes can be noticed by a lot of people. If it’s a mistake that only one person can notice due to their rare expertise or unique perspective, I think they can get a lot of karma by making it a whole quick take or post.
        Like, just to check, are we discussing a potential bad future world if this feature gets massively more use? Like, right now there are a ton of very disagreeable and harsh critics on LessWrong and there’s very few absolute bans. I’d guess absolute bans being on the order of 30-100 author-commenter pairs over the ~7 years we’ve had this, and weekly logged-in users being ~4,000 these days. The effect size so far has been really quite tiny. My guess is that it could probably increase like 10x and still not be a very noticeable friction for criticism on LessWrong for basically all good commenters.
        habryka 18 Jun 2025 0:42 UTC
        3 points
        0
        Parent
        It seems like this requires a very different kind of solution than either local bans or mutes, which most people don’t or probably won’t use, so can’t help in most places. Like maybe allow people to vote on commenters instead of just comments, and then their comments get a default karma based on their commenter karma (or rather the direct commenter-level karma would contribute to the default karma, in addition to their total karma which currently determines the default karma).
        I think better karma systems could potentially be pretty great, though I’ve historically always found it really hard to find something much better, mostly for complexity reasons. See this old shortform of mine on a bunch of stuff that a karma system has to do simultaneously:
        https://www.lesswrong.com/posts/EQJfdqSaMcJyR5k73/habryka-s-shortform-feed?commentId=8meuqgifXhksp42sg