gwern comments on [Meta] New moderation tools and moderation guidelines

gwern 28 Jun 2025 20:01 UTC
41 points
26

It’s not my view at all. I think a community will achieve much better outcomes if being bothered by the example message is considered normal and acceptable, and writing the example message is considered bad.

That’s a strange position to hold on LW, where it has long been a core tenet that one should not be bothered by messages like that. And that has always been the case, whether it was LW2, LW1 (remember, say, ‘babyeaters’? or ‘decoupling’? or Methods of Rationality), Overcoming Bias (Hanson, ‘politics is the mindkiller’), SL4 (‘Crocker’s Rules’) etc.

I can definitely say on my own part that nothing of major value I have done as a writer online—whether it was popularizing Bitcoin or darknet markets or the embryo selection analysis or writing ‘The Scaling Hypothesis’—would have been done if I had cared too much about “vibes” or how it made the reader feel. (Many of the things I have written definitely did make a lot of readers feel bad. And they should have. There is something wrong with you if you can read, say, ‘Scaling Hypothesis’ and not feel bad. I myself regularly feel bad about it! But that’s not a bad thing.) Even my Wikipedia editing earned me doxes and death threats.

And this is because (among many other reasons) emotional reactions are inextricably tied up with manipulation, politics, and status—which are the very last things you want in a site dedicated to speculative discussion and far-out unpopular ideas, which will definitionally be ‘creepy’, ‘icky’, ‘cringe’, ‘fringe’, ‘evil’, ‘bad vibes’ etc. (Even the most brutal totalitarian dictatorships concede this when they set up free speech zones and safe spaces like the ‘science cities’.)

Someone once wrote, upon being newly arrived to LW, a good observation of the local culture about how this works:

Could being “status-blind” in the sense that Eliezer claims to be (or perhaps some other not yet well-understood status-related property) be strongly correlated to managing to create lots of utility? (In the sense of helping the world a lot).

Currently I consider Yudkowsky, Scott Alexander, and Nick Bostrom to be three of the most important people. After reading Superintelligence and watching a bunch of interviews, one of first things I said about Nick Bostrom to a friend was that I felt like he legitimately has almost no status concerns (that was well before LW 2.0 launched). In case of S/A it’s less clear, but I suspect similar things.

Many of our ideas and people are (much) higher status than they used to be. It is no surprise people here might care more about status than they used to, in the same way that rich people care more about taxes than poor people.

But they were willing to be status-blind and not prize emotionality, and that is why they could become high-status. And barring the sudden discovery of an infallible oracle, we can continue to expect future high-status things to start off low-status...
What links here?
- Rafael Harth's comment on [Meta] New moderation tools and moderation guidelines by habryka (30 Jun 2025 12:07 UTC; 19 points)
- Rafael Harth 30 Jun 2025 12:19 UTC
  6 points
  2
  Parent
  This doesn’t feel like it engages with anything I believe. None of the things you listed are things I object to. I don’t object to how you wrote the the Scaling Hypothesis post, I don’t object to the Baby Eaters, I super don’t object to decoupling, and I super extra don’t object to ‘politics is the mind-killer’. The only one I’d even have to think about is Crocker’s Rules, but I don’t think I have an issue with those, either. They’re notably something you opt into.
  
  I can definitely say on my own part that nothing of major value I have done as a writer online—whether it was popularizing Bitcoin or darknet markets or the embryo selection analysis or writing ‘The Scaling Hypothesis’—would have been done if I had cared too much about “vibes” or how it made the reader feel. (Many of the things I have written definitely did make a lot of readers feel bad. And they should have. There is something wrong with you if you can read, say, ‘Scaling Hypothesis’ and not feel bad. I myself regularly feel bad about it! But that’s not a bad thing.) Even my Wikipedia editing earned me doxes and death threats.
  
  I claim that Said’s post is bad because it can be rewritten into a post that fulfills the same function but doesn’t feel as offensive.^[1] Nothing analogous is true for the Scaling Hypothesis. And it’s not just that you couldn’t rewrite it to be less scary but convey the same ideas; rather the whole comparison in a non-starter because I don’t think that your post on the scaling hypothesis has bad vibes, at all. If memory serves (I didn’t read your post in its entirety back then, but I read some of it and I have some memory of how I reacted), it sparks a kind of “holy shit this is happening and extremely scary ---(.Ó﹏Ò.)” reaction. This is, like, actively good. It’s not in the same category as Said’s comment in any way whatsoever.
  
  [...] on LW, where it has long been a core tenet that one should not be bothered by messages like that.
  
  I agree that it is better to to not be bothered. My position is not “you should be more influenced by vibes”, it’s something like “in the real world vibes are about 80% of the causal factors behind most people’s comments on LW and about 95% outside of LW, and considering this fact about how brains work in how you write is going to be good, not bad”. In particular, as I described in my latest response to Zack, I claim that the comments that I actually end up leaving on this site are significantly less influenced by vibes than Said’s because recognizing what my brain does allows me to reject it if I want to. Someone who earnestly believes to be vibe-blind while not being vibe-blind at all can’t do that.
  
  Someone once wrote, upon being newly arrived to LW, a good observation of the local culture about how this works [...]
  
  This honestly just doesn’t seem related, either. Status-blindness is more specific than vibe-blindness, and even if vibe-blindness were a thing, it wouldn’t contradict anything I’ve argued for.
  ↩︎
  it is not identical in terms of content, as Zack pointed out, but here I’m using function in the sense of the good thing the post comment achieves, which is to leave a strongly worded and valid criticism of the post. (In actual fact, I think my version is significantly more effective at doing that.)
  - gwern 1 Jul 2025 23:11 UTC
    10 points
    5
    Parent
    
    I claim that Said’s post is bad because it can be rewritten into a post that fulfills the same function but doesn’t feel as offensive.[1] Nothing analogous is true for the Scaling Hypothesis. And it’s not just that you couldn’t rewrite it to be less scary but convey the same ideas; rather the whole comparison in a non-starter because I don’t think that your post on the scaling hypothesis has bad vibes, at all. If memory serves (I didn’t read your post in its entirety back then, but I read some of it and I have some memory of how I reacted), it sparks a kind of “holy shit this is happening and extremely scary ---(.Ó﹏Ò.)” reaction. This is, like, actively good
    
    This description of ‘bad vibes’ vs ‘good vibes’ and what could be ‘be rewritten into a post that fulfills the same function’, is confusing to me because I would have said that that is obviously untrue of Scaling Hypothesis (and as the author, I should hope I would know), and that was why I highlighted it as an example: aside from the bad news being delivered in it, I wrote a lot of it to be deliberately rude and offensive—and those were some of the most effective parts of it! (And also, yes, made people mad at me.) Just because the essay was effective and is now high-status doesn’t change that. It couldn’t’ve been rewritten and achieved the same outcome, because that was much of the point.
    - habryka 1 Jul 2025 23:22 UTC
      3 points
      0
      Parent
      (To be clear, my take on all of this is that it is often appropriate to be rude and offensive, and often inappropriate. What has made these discussions so frustrating is that Said continues to insist that no rudeness or offensiveness is present in any of his writing, which makes it impossible to have a conversation about whether the rudeness of offensiveness is appropriate in the relevant context.
      Like, yeah, LessWrong has a culture, a lot of which is determined by what things people are rude and offensive towards. One of my jobs as a moderator is to steer where that goes. If someone keeps being rude and offensive towards things I really want to cultivate on the site, I will tell them to stop, or at least provide arguments for why this thing that I do not think is worth scorn, deserves scorn.
      But if that person then insists that no rudeness or offensiveness was present in any of their writing, despite an overwhelming fraction of readers reading it as such, then they are either a writer so bad at communication as to not belong on the site, or trying to avoid accountability for the content of their messages, both of which leave little room but to take moderation action that limits their contributions to the site)
      - Said Achmiz 2 Jul 2025 1:28 UTC
        3 points
        11
        Parent
        When you say that “it is often appropriate to be rude and offensive”, and that LW culture admits of things toward which it is acceptable to be “rude and offensive”, this would seem to imply that the alleged rudeness and offensiveness as such is not the problem with my comments, but rather that the problem is what I am supposedly being rude and offensive towards; and that the alleged “rudeness and offensiveness” would not itself ever be used against me (and that if a moderator tried to claim that “rudeness and offensiveness” is itself punishable regardless of target, or if a user tried to claim that LW norms forbid being rude and offensive, then you’d show up and say “nope, wrong, actually being rude and offensive is fine as long as it’s toward the right things, so kindly withdraw that particular criticism; Said has violated no rules or norms being being rude and offensive as such”). True? Or not?
        habryka 2 Jul 2025 3:12 UTC
        7 points
        1
        Parent
        Yep, though of course there are priors. The thing I am saying is that there are at least some things (and not just an extremely small set of things) that it is OK to be rude towards, not that the average quality/value-produced of rude and non-rude content is the same.
        For enforcement efficiency reasons, culture schelling point reasons, and various other reasons, it might still make sense to place something like a burden of proof on the person who claims that in this case rudeness and offensiveness is appropriate, so enforcement for rudeness without justification might still make sense, and my guess is does indeed make sense.
        Also, for you in-particular, I have seen the things that you tend to be rude and offensive towards, at least historically, and haven’t been very happy about that, and so the prior is more skewed against that. My guess is I would tell you in-particular that you have a bad track of aiming it well, and so would request additional justification on the marginal case from your side (similar to how we generally treat repeat criminal offenders different from first-time offenders, and often declare whole sets of actions that are otherwise completely legal from their option pool in prevention of future harm).
        Said Achmiz 2 Jul 2025 4:27 UTC
        5 points
        1
        Parent
        
        Yep
        
        Ok, cool, I’ll definitely…
        
        For enforcement efficiency reasons, culture schelling point reasons, and various other reasons, it might still make sense to place something like a burden of proof on the person who claims that in this case rudeness and offensiveness is appropriate, so enforcement for rudeness without justification might still make sense, and my guess is does indeed make sense.
        
        … ah. So, less “yep” and more “nope”.
        
        On the other hand, maybe this “burden of proof” business isn’t so bad. Actually, I was just reading your comments on the recent post about eating honey, including this top-level comment where you say that the ideas in the OP “sound approximately insane”, that they’re “so many orders of magnitude away from what sounds reasonable” that you cannot but seriously entertain the notion that said ideas were not motivated by reasonably thinking about the topic, but rather by “social signaling madness where someone is trying to signal commitment to some group standard of dedication”.
        
        I thought that it was a good comment, personally. (Actually, I found basically all your comments on that post to be upvote-worthy.) That comment is currently at 47 karma, so it would seem that there’s more or less a consensus among LW users that it’s a good comment. I did see that you edited the comment (after I’d initially read and upvoted it) to include somewhat of a disclaimer:
        
        Edit: And to avoid a slipping of local norms here. I am only leaving this comment here now after I have seriously entertained the hypothesis that I might be wrong, that maybe there do exist good arguments for moral weights that seem crazy to from where I was originally, but no, after looking into the arguments for quite a while, they still seem crazy to me, and so now I feel comfortable moving on and trying to think about what psychological or social process produces posts like this. And still, I am hesitant about it, because many readers have probably not gone through the same journey, and I don’t want a culture of dismissing things just because they are big and would imply drastic actions.
        
        Is this the sort of thing that you have in mind, when you talk about burden of proof?
        
        If I include disclaimers like this at the end of all of my comments, does that suffice to solve of all of the problems that you perceive in said comments? (And can I then be as “rude and offensive” as I like? Hypothetically, that is. If I were inclined to be “rude and offensive”.)
        habryka 2 Jul 2025 6:10 UTC
        2 points
        0
        Parent
        Is this the sort of thing that you have in mind, when you talk about burden of proof?
        Yes-ish, though I doubt we have a shared understanding of what “that sort of thing” is.
        If I include disclaimers like this at the end of all of my comments, does that suffice to solve of all of the problems that you perceive in said comments? (And can I then be as “rude and offensive” as I like? Hypothetically, that is. If I were inclined to be “rude and offensive”.)
        No, of course not. As I explained, as moderator and admin I will curate or at least apply heavy pressure on which things receive scorn and rudeness on LW.
        A disclaimer is the start of an argument. If the argument is wrong by my lights, you will still get told off. The standard is not “needs to make an argument”, it’s (if anything) “needs to make an argument that I^[1] think is good”. Making an argument is not in itself something that does something.
        ^
        (Not necessarily just me, there are other mods, and a kind of complicated social process that involves many stakeholders that can override me, or I will try to take into account and integrate, but for the sake of conversation we can assume it’s “me”)
        Said Achmiz 2 Jul 2025 6:16 UTC
        4 points
        0
        Parent
        Who decides if the argument suffices? You and the other mods, presumably? (EDIT: Confirmed by subsequent edit to parent comment.)
        
        If so, then could you explain how this doesn’t end up amounting to “the LW mods have undertaken to unilaterally decide, in advance, what are the correct views on all topics and the correct positions in all arguments”? Because that’s what it seems like you have to do, in order for your policy to make any sense.
        
        EDIT: Could you expand on “a kind of complicated social process that involves many stakeholders that can override me”? I don’t know what you mean by this.
        habryka 2 Jul 2025 6:39 UTC
        7 points
        2
        Parent
        At the end of the day, I^[1] have the keys to the database and the domain, so in some sense anything that leaves me with those keys can be summarized as “the LW mods have undertaken to unilaterally decide, in advance, what are the correct views on all topics and the correct positions in all arguments”.
        But of course, that is largely semantic. It is of course not the case that I have or would ever intend to make a list of allowed or forbidden opinions on LessWrong. In contrast, I have mostly procedural models about how LessWrong should function, including the importance of LW as a free marketplace of ideas, a place where contradicting ideas can be discussed and debated, and many other aspects of what will cause the whole LW project to go well. Expanding on all of them would of course far exceed this comment thread.
        On the specific topic of which things deserve scorn or ridicule or rudeness, I also find it hard to give a very short summary of what I believe. We have litigated some past disagreements in the space (such as whether people using their moderation tools to ban others from their blogpost should be subject to scorn or ridicule in most cases), which can provide some guidance, though the breadth of things we’ve covered is fairly limited. It is also clear to me that the exact flavor of rudeness and aggressiveness matters quite a bit. I favor straightforward aggression over passive aggression, and have expressed my model that “sneering” as a mental motion is almost never appropriate (though not literally never, as I expanded on).
        And on most topics, I simply don’t know yet, and I’ll have to figure it out as it comes up. The space of ways people can be helpfully or unhelpfully judgmental and aggressive is very large and big, and I do not have most of it precomputed. I do have many more principles I could expand on, and would like to do sometime, but this specific comment thread does not seem like the time.
        ^
        Again, not just me, but also other mods and stakeholders and stuff
        What links here?
        Said Achmiz's comment on [Meta] New moderation tools and moderation guidelines by habryka (2 Jul 2025 7:14 UTC; -10 points)
        Said Achmiz 2 Jul 2025 7:14 UTC
        −10 points
        0
        Parent
        
        At the end of the day, I[1] have the keys to the database and the domain, so in some sense anything that leaves me with those keys can be summarized as “the LW mods have undertaken to unilaterally decide, in advance, what are the correct views on all topics and the correct positions in all arguments”.
        
        It seems clear that your “in some sense” is doing pretty much all the work here.
        
        Compare, again, to Data Secrets Lox: there, I have the keys to the database and the domain (and in the case of DSL, it really is just me, no one else—the domain is just mine, the database is just mine, the server config passwords… everything), and yet I don’t undertake to decide anything at all, because I have gone to great lengths to formally surrender all moderation powers (retaining only the power of deleting outright illegal content). I don’t make the rules; I don’t enforce the rules; I don’t pick the people who make or enforce the rules. (Indeed the moderators—which were chosen via to the system that I put into place—can even temp-ban me, from my own forum, that I own and run and pay for with my own personal money! And they have! And that is as it should be.)
        
        I say this not to suggest that LW should be run the way that DSL is run (that wouldn’t really make sense, or work, or be appropriate), but to point out that obviously there is a spectrum of the degree to which having “the keys to the database and the domain” can, in fact, be meaningfully and accurately talked about as “the … mods have undertaken to unilaterally decide, in advance, what are the correct views on all topics and the correct positions in all arguments”—and you are way, way further along that spectrum than the minimal possible value thereof. In other words, it is completely possible to hold said keys, and yet (compared to how you run LW) not, in any meaningful sense, undertake to unilaterally decide anything w.r.t. correctness of views and positions.
        
        It is of course not the case that I have or would ever intend to make a list of allowed or forbidden opinions on LessWrong. In contrast, I have mostly procedural models about how LessWrong should function, including the importance of LW as a free marketplace of ideas, a place where contradicting ideas can be discussed and debated, and many other aspects of what will cause the whole LW project to go well. Expanding on all of them would of course far exceed this comment thread.
        
        Yes, well… the problem is that this is the central issue in this whole dispute (such as it is). The whole point is that your preferred policies (the ones to which I object) directly and severely damage LW’s ability to be “a free marketplace of ideas, a place where contradicting ideas can be discussed and debated”, and instead constitute you effectively making a list of allowed or forbidden opinions on this forum. Like… that’s pretty much the whole thing, right there. You seem to want to make that list while claiming that you’re not making any such list, and to prevent the marketplace of ideas from happening while claiming that the marketplace of ideas is important. I don’t see how you can square this circle. Your preferred policies seem to be fundamentally at odds with your stated goals.
        Expand this thread
        habryka 2 Jul 2025 16:25 UTC
        3 points
        −2
        Parent
        Yes, well… the problem is that this is the central issue in this whole dispute (such as it is). The whole point is that your preferred policies (the ones to which I object) directly and severely damage LW’s ability to be “a free marketplace of ideas, a place where contradicting ideas can be discussed and debated”, and instead constitute you effectively making a list of allowed or forbidden opinions on this forum.
        I don’t see where I am making any such list, unless you mean “list” in a weird way that doesn’t involve any actual lists, or even things that are kind of like lists.
        in any meaningful sense, undertake to unilaterally decide anything w.r.t. correctness of views and positions.
        I don’t think that’s an accurate description of DSL, indeed it appears to me that what the de-facto list of the kind of policy you have chosen is is pretty predictable (and IMO does not result in particular good outcomes). Just because you have some other people make the choices doesn’t change the predictability of the actual outcome, or who is responsible for it.
        I already made the obvious point that of course, in some sense, I/we will define what is OK on LessWrong via some procedural way. You can dislike the way I/we do it.
        There is definitely no “fundamentally at odds”, there is a difference in opinion about what works here, which you and me have already spent hundreds of hours trying to resolve, and we seem unlikely to resolve right now. Just making more comments stating that “I am wrong” in big words will not make that happen faster (or more likely to happen at all).
        habryka 2 Jul 2025 6:51 UTC
        2 points
        0
        Parent
        Seems like we got lost in a tangle of edits. I hope my comment clarifies sufficiently, as it is time for me to sleep, and I am somewhat unlikely to pick up this thread tomorrow.
        Said Achmiz 2 Jul 2025 7:16 UTC
        −8 points
        0
        Parent
        Sure, I appreciate the clarification, but my last question still stands:
        
        Could you expand on “a kind of complicated social process that involves many stakeholders that can override me”? I don’t know what you mean by this.
        
        Who are these stakeholders, exactly? How might they override you?
        Expand this thread
        habryka 2 Jul 2025 16:30 UTC
        2 points
        0
        Parent
        Not going to go into this, since I think it’s actually a pretty complicated situation, but at a very high level some obvious groups that could override me:
        The Lightcone Infrastructure board (me, Vaniver, Daniel Kokotajlo)
        If Eliezer really wanted, he can probably override me
        A more distributed consensus among what one might consider the leadership of the rationality community (like, let’s say Scott Alexander and Ryan Greenblatt and Buck and Nate and John Wentworth and Gwern all roughly agree on me messing up really badly)
        There would be lots more to say on this topic, but as I said, I am unlikely to pick this thread up again, so I hope that’s good enough!
        Ben Pace 3 Jul 2025 1:41 UTC
        2 points
        0
        Parent
        (This is a tangent to the thread and so I don’t plan to reply further on this, but I just wanted to mention that while I view Greenblatt and Shlegeris as stakeholders in LessWrong, a space they’ve made many great contributions to and are quite active in, I don’t view them as leadership of the rationality community.)
      - Zack_M_Davis 2 Jul 2025 6:05 UTC
        2 points
        −2
        Parent
        Rudeness and offensiveness are, in the general case, two-place functions: text can be offensive to some particular reader, but short of unambiguous blatant insults, there’s not going to be a consensus about what is “offensive”, because people vary widely (both by personal disposition and vagarious incentives) in how easy they are to offend.
        
        When it is denied that Achmiz’s comments are offensive, the claim isn’t that no one is offended. (That would be silly. We have public testimony from people who are offended!) The claim is that the text isn’t rude in a “one-place” sense (no personal insults, &c.).
        
        The reason that “one-place” rudeness is the relevant standard is because it would be bad if a fraction of easily-offended readers (even a substantial fraction—I don’t think you can defend the adjective “overwhelming”) could weaponize their emotions to censor expressions of ideas that they don’t like.
        
        For example, take Achmiz’s January 2020 comment claiming that, “There is always an obligation by any author to respond to anyone’s comment along these lines. If no response is provided to (what ought rightly to be) simple requests for clarification [...] the author should be interpreted as ignorant.”
        
        The comment is expressing an opinion about discourse norms (“There is always an obligation”) and a belief about what Bayesian inferences are warranted by the absence of replies to a question (“the author should be interpreted as ignorant”). It makes sense that many people disagree with that opinion and that belief (say, because they think that some of the questions that Achmiz thinks are good, are actually bad, and that ignoring bad questions is good). Fine.
        
        But beyond mere disagreement, to characterize such a comment as offensive (because it criticizes people who don’t respond to questions), is something I find offensive. (If you’re thinking of allegedly worse behavior from Achmiz than this January 2020 comment, you’re going to need to provide the example.) Sometimes people who use the same website as you have opinions or beliefs that imply that they disapprove of your behavior! So what? I think grown-ups should be able to shrug this off without calling for draconian and deranged censorship policies. The mod team should not be pandering to such pathetic cry-bullying.
        Jiro 18 Jul 2025 6:14 UTC
        5 points
        3
        Parent
        
        But beyond mere disagreement, to characterize such a comment as offensive (because it criticizes people who don’t respond to questions), is something I find offensive.
        
        The comment is offensive because it communicates things other than its literal words. Autistically taking it apart word by word and saying that it only offends because it is criticism ignores this implicit communication.
        habryka 2 Jul 2025 6:49 UTC
        4 points
        1
        Parent
        Gwern himself refers to the “rude and offensive” part in this subthread as a one-place function:
        aside from the bad news being delivered in it, I wrote a lot of it to be deliberately rude and offensive—and those were some of the most effective parts of it! (And also, yes, made people mad at me.)
        I have no interest in doing more hand-wringing about whether Said’s comments are intended to make people feel judged or not, and don’t find your distinction of “no personal insults” as somehow making the rudeness more objective compelling. If you want we can talk about the Gwern hypothetical in which he clearly intended to be rude and offensive towards other people.
        I think grown-ups should be able to shrug this off without calling for draconian and deranged censorship policies.
        This is indeed a form of aggression and scorn that I do not approve of on this site, especially after extensive litigation.
        I’ll leave it on this thread, but as a concrete example for the sake of setting clear guidelines, strawmanning all (or really any) authors who have preferences about people not being super aggro in their comment threads as “pathetic cry-bullying” and “calling for draconian and deranged censorship policies” is indeed one of the things that will get you banned from this site on other threads! You have been warned!
        What links here?
        Zack_M_Davis's comment on [Meta] New moderation tools and moderation guidelines by habryka (8 Jul 2025 5:18 UTC; 7 points)
        Zack_M_Davis's comment on [Meta] New moderation tools and moderation guidelines by habryka (5 Jul 2025 23:36 UTC; 1 point)
        Ben Pace 2 Jul 2025 16:55 UTC
        2 points
        0
        Parent
        I don’t think the relevant dispute about rudeness/offensiveness is about one-place and two-place functions, I think it’s about passive/overt aggression. With passive aggression you often have to read more of the surrounding context to understand what is being communicated, whereas with overt aggression it’s clear if you just locally inspect the statement (or behavior), which sounds like one / two place functions (because ppl with different information states look at the same message and get different assessments), but isn’t.
        For instance, suppose Alice doesn’t invite Bob to a party, and then Bob responds by ignoring all of Alice’s texts and avoiding eye contact most of the time. Now any single instance of “not responding to a text” isn’t aggression, but from the context of a chance in a relationship where it was typical to reply same-day, to zero replies, it can be understood as retalliation. And of course, even then it’s not provable, there are other possible explanations (such as Bob is taking a GLP-1 inhibitor and is quite low-energy at the minute don’t think too hard about why I picked that example), which makes it a great avenue for hard-to-litigate retaliation.
        Wei Dai 5 Jul 2025 17:09 UTC
        15 points
        12
        Parent
        Does everyone here remember and/or agree with my point in The Nature of Offense, that offense is about status, which in the current context implies that it’s essentially impossible to avoid giving offense while delivering strong criticism (as it almost necessarily implies that the target of criticism deserves lower status for writing something seriously flawed, having false/harmful beliefs, etc.)? @habryka @Zack_M_Davis @Said Achmiz
        This discussion has become very long and I’ve been travelling so I may have missed something, but has anyone managed to write a version of Said’s comment that delivers the same strength of criticism while avoiding offending its target? (Given the above, I think this would be impossible.)
        What links here?
        “Some Basic Level of Mutual Respect About Whether Other People Deserve to Live”?! by Zack_M_Davis (18 Jul 2025 6:41 UTC; 25 points)
        Ben Pace 5 Jul 2025 21:36 UTC
        6 points
        2
        Parent
        Not a direct response, but I want to take some point in this discussion (I think I said this to Zack in-person the other day) to say that, while some people are arguing that things should as a rule be collaborative and not offensive (e.g. to varying extents Gordon and Rafael), this is not the position that the LW mods are arguing for. We’re arguing that authors on LessWrong should be able to moderate their posts with different norms/standards from one another, and that there should not reliably be retribution or counter-punishment by other commenters for them moderating in that way.
        I could see it being confusing because sometimes an author like Gordon is moderating you, and sometimes a site-mod like Habryka is moderating you, but they are using different standards, and the LW-mods are not typically endorsing the author standards as our own. I even generally agree with many of the counterarguments that e.g. Zack makes against those norms being the best ones. Some of my favorite comments on this site are offensive (where ‘offensive’ is referring to Wei’s meaning of ‘lowering someone’s social status’).
        Wei Dai 20 Jul 2025 14:34 UTC
        7 points
        5
        Parent
        
        We’re arguing that authors on LessWrong should be able to moderate their posts with different norms/standards from one another, and that there should not reliably be retribution or counter-punishment by other commenters for them moderating in that way.
        
        What is currently the acceptable range of moderation norms/standards (according to the LW mod team)? For example if someone blatantly deletes/bans their most effective critics, is that acceptable? What if they instead subtly discourage critics (while being overtly neutral/welcoming) by selectively enforcing rules more stringently against their critics? What if they simply ban all “offensive” content, which as a side effect discourages critics (since as I mentioned earlier, criticism almostly inescapably implies offense)?
        
        And what does “retribution or counter-punishment” mean? If I see an author doing one of the above, and question or criticize that in the comments or elsewhere, is that considered “retribution or counter-punishment” given that my comment/post is also inescapably offensive (status-lowering) toward the author?
        Ben Pace 20 Jul 2025 18:33 UTC
        11 points
        −4
        Parent
        What is currently the acceptable range of moderation norms/standards (according to the LW mod team)?
        I think the first answer is “Mostly people aren’t using this feature, and the few times people have used it it has not felt to us like abuse or strongly needing to be pushed back on” so I don’t have any examples to point to.
        But I’ll quickly generate thoughts on each of the hypothetical scenarios you briefly gestured to.
        For example if someone blatantly deletes/bans their most effective critics, is that acceptable?
        It’d depend on how things played out. If Andrew writes a blogpost with a big new theory of rationality, and then Bob and Charlie and Dave all write decisive critiques and then their comments are deleted and banned from commenting on his posts, I think it’s quite plausible that they’ll write a new post together with the copy-paste of their comments and it’ll get more karma than the original. This seems like a good-enough outcome to me. On the other hand if Andrew only gets criticism from Bob, and then deletes Bob’s comments and bans him from commenting on his posts, and then Bob leaves the site, I would take more active action, such as perhaps removing Andrew’s ability to ban people, and reaching out to Bob to thank him for his comments and encourage him to return.
        What if they instead subtly discourage critics (while being overtly neutral/welcoming) by selectively enforcing rules more stringently against their critics?
        That sounds like there’d be some increased friction on criticism. Hopefully we’d try to notice it and counteract it, or hopefully the commenters who were having annoying experience being moderated would notice and move to shortform or posts and do their criticism from there. But plausibly there’d just be some persistent additional annoyances or costs that certain users would have to pay.
        What if they simply ban all “offensive” content, which as a side effect discourages critics (since as I mentioned earlier, criticism almostly inescapably implies offense)?
        I mean, again, probably this would just be very incongruous with LessWrong and it wouldn’t really work and they’d have to ban like 30+ users because everyone wouldn’t get this and would keep doing things the author didn’t like, and the author wouldn’t eventually leave if they needed that sort of environment, or we’d step in after like 5 and say “this is kind of crazy, you have to stop doing this, it isn’t going to work out, we’re removing your ability to ban users”. So many of the good comments on LessWrong lower their interlocutor’s status in some way.
        And what does “retribution or counter-punishment” mean?
        It means actions that predictably make the author feel that them using the ban feature in general is illegitimate or that using it will cause them to have their reputation attacked, regardless of reason or context, in response to them using the ban feature.
        If I see an author doing one of the above, and question or criticize that in the comments or elsewhere, is that considered “retribution or counter-punishment” given that my comment/post is also inescapably offensive (status-lowering) toward the author?
        Many many writers on LessWrong are capable of critiquing a single instance of a ban while taking care to communicate that they are not pushing back on all instances of banning, and can also credibly offer support in other instances that are more reasonable.
        Generally it is harder to signal this when you are complaining about your own banning. For in-person contexts (e.g. events) I generally spend effort to ensure that people do not feel any cost for not inviting me to events or spaces, and not expect that I will complain loudly or cause them to lose social status for it, and a similar (but not identical) heuristic applies here. If someone finds interacting with you very unpleasant and you don’t understand quite why, it’s often bad form to loudly complain about it every time they don’t want to interact with you any more, even if you have an uncharitable hypothesis as to why.
        There is still good form and bad form to imposing costs on people for moderating their spaces, and costs imposed on people for moderating their spaces (based on disagreement or even trying to fix biases in the moderation) are the most common reason for good spaces not existing; moderation is unpleasant work, lots of people feel entitled to make strong social bids on you for your time and to threaten to attack your social standing, and I’ve seen many spaces degrade due to unwillingness to moderate. You should of course think about this if you are considering reliably complaining loudly every time anyone uses a ban feature on people.
        Added: I hope you get a sense from reading this that your questions don’t have simple answers, but that the scenarios you describe require active steering depending on the dynamics at play. I am somewhat wary you will keep asking me a lot of short questions that, due to your inexperience moderating spaces, you will assume have simple answers, and I will have to do lots of work generating all the contexts to show how things play out, else Said or someone allied with him against him being moderated on LW will claim I am unable to answer the most basic of questions and this shows me to be either ignorant or incompetent. And, man, this is a lot of moderation discussion.
        Wei Dai 21 Jul 2025 6:27 UTC
        6 points
        3
        Parent
        
        If someone finds interacting with you very unpleasant and you don’t understand quite why, it’s often bad form to loudly complain about it every time they don’t want to interact with you any more, even if you have an uncharitable hypothesis as to why.
        
        If I was in this circumstance, I would be pretty worried about my own biases, and ask neutral or potentially less biased parties whether there might be more charitable and reasonable hypotheses why that person doesn’t want to interact with me. If there isn’t though, why shouldn’t I complain and e.g. make it common knowledge that my valuable criticism is being suppressed? (Obviously I would also take into consideration social/political realities, not make enemies I can’t afford to make, etc.)
        
        I’ve seen many spaces degrade due to unwillingness to moderate
        
        But most people aren’t using this feature, so to the extent that LW hasn’t degraded (and that’s due to moderation), isn’t it mainly because of the site moderators and karma voters? The benefits of having a few people occasionally moderate their own spaces hardly seems worth the cost (to potential critics and people like me who really value criticism) of not knowing when their critiques might be unilaterally deleted or banned by post authors. I mean aside from the “benefit” of attracting/retaining the authors who demand such unilateral powers.
        
        And, man, this is a lot of moderation discussion.
        
        Aside from the above “benefit”, It seems like you’re currently getting the worst of both worlds: lack of significant usage and therefore potential positive effects, and lots of controversy when it is occasionally used. If you really thought this was an important feature for the long term health of the community, wouldn’t you do something to make it more popular? (Or have done it in the past 7 years since the feature came out?) But instead you (the mod team) seem content that few people use it, only coming out to defend the feature when people explicitly object to it. This only seems to make sense if the main motivation is again to attract/retain certain authors.
        
        I am somewhat wary you will keep asking me a lot of short questions that, due to your inexperience moderating spaces, you will assume have simple answers, and I will have to do lots of work generating all the contexts to show how things play out
        
        It seems like if you actually wanted or expected many people to use this feature, you would have written some guidelines on what people can and can’t do, or under what circumstances their moderation actions might be reversed by the site moderators. I don’t think I was expecting the answers to my questions to necessarily be simple, but rather that the answers already exist somewhere, at least in the form of general guidelines that might need to be interpreted to answer my specific questions.
        Expand this thread
        habryka 21 Jul 2025 8:06 UTC
        9 points
        −1
        Parent
        But most people aren’t using this feature, so to the extent that LW hasn’t degraded (and that’s due to moderation), isn’t it mainly because of the site moderators and karma voters? The benefits of having a few people occasionally moderate their own spaces hardly seems worth the cost
        I mean, mostly we’ve decided to give the people who complain about moderation a shot, and compensate by spending much much more moderation effort from the moderators. My guess is this has cost a large amount of counterfactual quality of the site, many contributors, etc.
        In-general, I find argument of the form “so to the extend that LW hasn’t been destroyed, X can’t be that valuable” pretty weak. It’s very hard to assess the counterfactual, and “if not X, LessWrong would have been completely destroyed” is rarely the case for almost any X that is in dispute.
        My guess is LW would be a lot better if more people felt comfortable moderating things, and in the present world, there are a lot of costs born by the site admins that wouldn’t be necessary otherwise.
        Wei Dai 21 Jul 2025 19:26 UTC
        7 points
        1
        Parent
        
        I mean, mostly we’ve decided to give the people who complain about moderation a shot
        
        What do you mean by this? Until I read this sentence, I saw you as giving the people who demand unilateral moderation powers a shot, and denying the requests of people like me to reduce such powers.
        
        My not very confident guess at this point is that if it weren’t for people like me, you would have pushed harder for people to moderate their own spaces more, perhaps by trying to publicly encourage this? And why did you decide to go against your own judgment on it, given that “people who complain about moderation” have no particular powers, except the power of persuasion (we’re not even threatening to leave the site!), and it seems like you were never persuaded?
        
        My guess is LW would be a lot better if more people felt comfortable moderating things, and in the present world, there are a lot of costs born by the site admins that wouldn’t be necessary otherwise.
        
        This seems implausible to me given my understanding of human nature (most people really hate to see/hear criticism) and history (few people can resist the temptation to shut down their critics when given the power and social license or cover to do so). If you want a taste of this, try asking DeepSeek some questions about the CCP.
        
        But presumably you also know this (at least abstractly, but perhaps not as viscerally as I do, coming from a Chinese background, where even before the CCP, criticism in many situations was culturally/socially impossible), so I’m confused and curious why you believe what you do.
        
        My guess is that you see a constant stream of bad comments, and wish you could outsource the burden of filtering them to post authors (or combine efforts to do more filtering). But as an occasional post author, my experience is that I’m not a reliable judge of what counts as a “bad comment”, e.g., I’m liable to view a critique as a low quality comment, only to change my mind later after seeing it upvoted and trying harder to understand/appreciate its point. Given this, I’m much more inclined to leave the moderation to the karma system, which seems to work well enough in leaving bad comments at low karma/visibility by not upvoting them, and even when it’s occasionally wrong, still provides a useful signal to me that many people share the same misunderstanding and it’s worth my time to try to correct (or maybe by engaging with it I find out that I still misjudged it).
        
        But if you don’t think it works well enough… hmm I recall writing a post about moderation tech proposals in 2016 and maybe there has been newer ideas since then?
        habryka 21 Jul 2025 19:47 UTC
        1 point
        0
        Parent
        I mean, I have written like 50,000+ words about this at this point in various comment threads. About why I care about archipelagos, and why I think it’s hard and bad to try to have centralized control about culture, about how much people hate being in places with ambiguous norms, and many other things. I don’t fault you for not reading them all, but I have done a huge amount of exposition.
        And why did you decide to go against your own judgment on it, given that “people who complain about moderation” have no particular powers, except the power of persuasion (we’re not even threatening to leave the site!), and it seems like you were never persuaded?
        Because the only choice at this point would be to ban them, since they appear to be willing to take any remaining channel or any remaining opportunity to heap approximately as much scorn and snark and social punishment on anyone daring to do moderation they disagree with, and I value things like readthesequences.com and many other contributions from the relevant people enough that that seemed really costly and sad.
        My guess is I will now do this, as it seems like the site doesn’t really have any other choice, and I am tired and have better things to do, but I think I was justified and right to be hesitant to do this for a while (though yes, in ex-post it would have obviously been better to just do that 5 years ago).
        What links here?
        Zack_M_Davis's comment on [Meta] New moderation tools and moderation guidelines by habryka (26 Jul 2025 6:07 UTC; 3 points)
        Wei Dai 22 Jul 2025 0:58 UTC
        5 points
        3
        Parent
        It seems to me there are plenty of options aside from centralized control and giving authors unilateral powers, and last I remember (i.e., at the end of this post) the mod team seems to be pivoting to other possibilities, some of which I would find much more reasonable/acceptable. I’m confused why you’re now so focused again on the model of authors-as-unilateral-moderators. Where have you explained this?
        habryka 22 Jul 2025 1:01 UTC
        0 points
        0
        Parent
        I have filled my interest in answering questions on this, so I’ll bow out and wish you good luck. Happy to chat some other time.
        I don’t think we ever “pivoted to other possibilities” (Ray often makes posts with moderation things he is thinking about, and the post doesn’t say anything about pivoting). Digging up the exact comments on why ultimately there needs to be at least some authority vested in authors as moderators seems like it would take a while.
        Wei Dai 22 Jul 2025 2:15 UTC
        5 points
        2
        Parent
        I meant pivot in the sense of “this doesn’t seem to be working well, we should seriously consider other possibilities” not “we’re definitely switching to a new moderation model”, but I now get that you disagree with Ray even about this.
        
        Your comment under Ray’s post wrote:
        
        We did end up implementing the AI Alignment Forum, which I do actually think is working pretty well and is a pretty good example of how I imagine Archipelago-like stuff to play out. We now also have both the EA Forum and LessWrong creating some more archipelago-like diversity in the online-forum space.
        
        This made me think you were also no longer very focused on the authors-as-unilateral-moderators model and was thinking more about subreddit-like models that Ray mentioned in his post.
        
        BTW I’ve been thinking for a while that LW needs a better search, as I’ve also often been in the position being unable to find some comment I’ve written in the past.
        
        Instead of one-on-one chats (or in addition to them), I think you should collect/organize your thoughts in a post or sequence, for a number of reasons including that you seem visibly frustrated that after having written 50k+ words on the topic, people like me still don’t know your reasons for preferring your solution.
        habryka 22 Jul 2025 2:34 UTC
        2 points
        0
        Parent
        We did end up implementing the AI Alignment Forum, which I do actually think is working pretty well and is a pretty good example of how I imagine Archipelago-like stuff to play out. We now also have both the EA Forum and LessWrong creating some more archipelago-like diversity in the online-forum space.
        Huh, ironically I now consider the AI Alignment Forum a pretty big mistake in how it’s structured (for reasons mostly orthogonal but not unrelated to this).
        BTW I’ve been thinking for a while that LW needs a better search, as I’ve also often been in the position being unable to find some comment I’ve written in the past.
        Agree.
        Instead of one-on-one chats (or in addition to them), I think you should collect/organize your thoughts in a post or sequence, for a number of reasons including that you seem visibly frustrated that after having written 50k+ words on the topic, people like me still don’t know your reasons for preferring your solution.
        I think I have elaborated non-trivially on my reasons in this thread, so I don’t really think it’s an issue of people not finding it.
        I do still agree it would be good to do more sequences-like writing on it, though like, we are already speaking in the context of Ray having done that a bunch (referencing things like the Archipelago vision), and writing top-level content takes a lot of time and effort.
        Wei Dai 22 Jul 2025 5:59 UTC
        4 points
        3
        Parent
        
        I think I have elaborated non-trivially on my reasons in this thread, so I don’t really think it’s an issue of people not finding it.
        
        It’s largely an issue of lack of organization and conciseness (50k+ words is a minus, not a plus in my view), but also clearly an issue of “not finding it”, given that you couldn’t find an important comment of your own, one that (judging from your description of it) contains a core argument needed to understand your current insistence on authors-as-unilateral-moderators.
        Ben Pace 22 Jul 2025 3:32 UTC
        7 points
        3
        Parent
        If someone finds interacting with you very unpleasant and you don’t understand quite why, it’s often bad form to loudly complain about it every time they don’t want to interact with you any more, even if you have an uncharitable hypothesis as to why.
        If I was in this circumstance, I would be pretty worried about my own biases, and ask neutral or potentially less biased parties whether there might be more charitable and reasonable hypotheses why that person doesn’t want to interact with me. If there isn’t though, why shouldn’t I complain and e.g. make it common knowledge that my valuable criticism is being suppressed? (Obviously I would also take into consideration social/political realities, not make enemies I can’t afford to make, etc.)
        I’m having a hard time seeing how this reply is hooking up to what I wrote. I didn’t say critics, I spoke much more generally. If someone wants to keep their distance from you because you have bad body odor, or because they think your job is unethical, and you either don’t know this or disagree, it’s pretty bad social form to go around loudly complaining every time they keep their distance from you. It makes it more socially costly for them to act in accordance with their preferences and makes a bunch of unnecessary social conflict. I’m pretty sure this is obvious and this doesn’t change if you’ve suddenly developed a ‘criticism’ of them.
        But most people aren’t using this feature, so to the extent that LW hasn’t degraded (and that’s due to moderation), isn’t it mainly because of the site moderators and karma voters? The benefits of having a few people occasionally moderate their own spaces hardly seems worth the cost (to potential critics and people like me who really value criticism) of not knowing when their critiques might be unilaterally deleted or banned by post authors. I mean aside from the “benefit” of attracting/retaining the authors who demand such unilateral powers.
        I mean, I think it pretty plausible that LW would be doing even better than it is with more people doing more gardening and making more moderated spaces within it, archipelago-style.
        I read you questioning my honesty and motivations a bunch (e.g. you have a few times mentioning that I probably only care about this because of status reasons I cannot mention or to attract certain authors and that my behavior is not consistent with believing in users moderating their own posts being a good idea) which are of course fine hypotheses for you to consider. After spending probably over 40 hours writing this month explaining why I think authors moderating their posts is a good idea and making some defense of myself and my reasoning, I think I’ve done my duty in showing up to engage with this semi-prosecution for the time being, and will let ppl come to their own conclusions. (Perhaps I will write up a summary of the discussion at some point.)
        Zack_M_Davis 5 Jul 2025 23:36 UTC
        1 point
        −15
        Parent
        
        and there should not reliably be retribution or counter-punishment by other commenters for them moderating in that way.
        
        Great, so all you need to do is make a rule specifying what speech constitutes “retribution” or “counterpunishment” that you want to censor on those grounds.
        
        Maybe the rule could be something like, “No complaining about being banned by a specific user (but commenting on your own shortform strictly about the substance of a post that you’ve been banned from does not itself constitute complaining about the ban)” or “No arguing against the existence on the user ban feature except in designated moderation threads (which get algorithmically deprioritized in the new Feed).”
        
        It’s your website! You have all the hard power! You can use the hard power to make the rules you want, and then the users of the website have a clear choice to either obey the rules or be banned from the site. Fine.
        
        What I find hard to understand is why the mod team seems to think it’s good for them to try to shape culture by means other than clear and explicit rules that could be neutrally enforced. Telling people to “stop optimizing in a fairly deep way” is not a rule because of how vague and potentially all-encompassing it is. Telling people to avoid “mak[ing] people feel judged or not” is not a rule because I don’t have control over how other people feel.
        
        “Don’t tell people ‘I’m judging you about X’” is a rule. I can do that.
        
        What I can’t do is convincingly pretend to be a person with a completely different personality such that people who are smart about subtext can’t even guess from subtle details of my writing style that I might privately be judging them.
        
        I mean, maybe I could if I tried very hard? But I have too much self-respect to try. If the mod team wants to force temperamentally judgemental people to convincingly pretend to be non-judgemental, that seems really crazy.
        
        I know, the mods didn’t say “We want temperamentally judgemental people to convincingly pretend to have a completely different personality” in those words; rather, Habryka said he wanted to “avoid a passive aggressive culture tak[ing] hold”. I just don’t see what the difference is supposed to be in practice.
        Ben Pace 6 Jul 2025 0:45 UTC
        16 points
        2
        Parent
        Mm, I think sometimes I’d rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.
        A key question is: Are authors comfortable using the mod tools the site gives them to garden their posts?
        You can write lots of judgmental comments criticizing an author’s posts, and then they can ban you from their comments because they find engaging with you to be exhausting, and then you can make a shortform where you and your friends call them a coward, and then they stop using the mod tools (and other authors do too) out of a fear that using the mod tools will result in a group of people getting together to bully and call them names in front of the author’s peers. That’s a situation where authors become uncomfortable using their mod tools. But I don’t know precisely what comment was wrong and what was wrong with it such that had it not happened the outcome would counterfactually not have obtained i.e. that you wouldn’t have found some other way to make the author uncomfortable using his mod tools (though we could probably all agree on some schelling lines).
        Also I am hesitant to fully outlaw behavior that might sometimes be appropriate. Perhaps there are some situations where it’s appropriate to criticize someone on your shortform after they banned you. Or perhaps sometimes you should call someone a coward for not engaging with your criticism.
        Overall I believe sometimes I will have to look at the outcome and see whether the gain in this situation was worth the cost, and directly give positive/negative feedback based on that.
        Related to other things you wrote, FWIW I think you have a personality that many people would find uncomfortable interacting with a lot. In-person I regularly read you as being deeply pained and barely able to contain strongly emotional and hostile outbursts. I think just trying to ‘follow the rules’ might not succeed at making everyone feel comfortable interacting with you, even via text, if they feel a deep hostility from you to them that is struggling to contain itself with rules like “no explicit insults”, and sometimes the right choice for them will just be to not engage with you directly. So I think it is a hypothesis worth engaging with that you should work to change your personality somewhat.
        To be clear I think (as Said has said) that it is worth people learning to be able to make space to engage with people like you who they find uncomfortable, because you raise many good ideas and points (and engaging with you is something I relatively happily do, and this is a way I have grown stronger relative to myself of 10 years ago), and I hope you find more success as I respect many of your contributions, but I think a great many people who have good points to contribute don’t have as much capacity as me to do this, and you will sometimes have to take some responsibility for navigating this.
        Zack_M_Davis 8 Jul 2025 5:18 UTC
        7 points
        5
        Parent
        
        I’d rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.
        
        A key reason to favor behavioral rules over trying to directly optimize outcomes (even granting that enforcement can’t be completely mechanized and there will always be some nonzero element of human judgement) is that act consequentialism doesn’t interact well with game theory, particularly when one of the consequences involved is people’s feelings.
        
        If the popular kids in the cool kids’ club don’t like Goldstein and your only goal is to make sure that the popular kids feel comfortable, then clearly your optimal policy is to kick Goldstein out of the club. But if you have some other goal that you’re trying to pursue with the club that the popular kids and Goldstein both have a stake in, then I think you do have to try to evaluate whether Goldstein “did anything wrong”, rather than just checking that everyone feels comfortable. Just ensuring that everyone feels comfortable at all costs, without regard to the reasons why people feel uncomfortable or any notion that some reasons aren’t legitimate grounds for intervention, amounts to relinquishing all control to anyone who feels uncomfortable when someone else doesn’t behave exactly how they want.
        
        Something I appreciate about the existing user ban functionality is that it is a rule-based mechanism. I have been persuaded by Achmiz and Dai’s arguments that it’s bad for our collective understanding that user bans prevent criticism, but at least it’s a procedurally “fair” kind of badness that I can tolerate, not completely arbitrary tyranny. The impartiality really helps. Do you really want to throw away that scrap of legitimacy in the name of optimizing outcomes even harder? Why?
        
        I think just trying to ‘follow the rules’ might not succeed at making everyone feel comfortable interacting with you
        
        But I’m not trying to make everyone feel comfortable interacting with me. I’m trying to achieve shared maps that reflect the territory.
        
        A big part of the reason some of my recent comments in this thread appeal to an inability or justified disinclination to convincingly pretend to not be judgmental is because your boss seems to disregard with prejudice Achmiz’s denials that his comments are “intended to make people feel judged”. In response to that, I’m “biting the bullet”: saying, okay, let’s grant that a commenter is judging someone; to what lengths must they go to conceal that, in order to prevent others from predictably feeling judged, given that people aren’t idiots and can read subtext?
        
        I think there’s something much more fundamental at stake here, which is that an intellectual forum that’s being held hostage to people’s feelings is intrinsically hampered and can’t be at the forefront of advancing the art of human rationality. If my post claims X, and a commenter says, “No, that’s wrong, actually not-X because Y”, it would be a non-sequitur for me to reply, “I’d prefer you engage with what I wrote with more curiosity and kindness.” Curiosity and kindness are just not logically relevant to the claim! (If I think the commenter has misconstrued what I wrote, I could just say that.) It needs to be possible to discuss ideas without getting tone-policed to death. Once you start playing this game of litigating feelings and feelings about other people’s feelings, there’s no end to it. The only stable Schelling point that doesn’t immediately dissolve into endless total war is to have rules and for everyone to take responsibility for their own feelings within the rules.
        
        I don’t think this is an unrealistic superhumanly high standard. As you’ve noticed, I am myself a pretty emotional person and tend to wear my heart on my sleeve. There are definitely times as recently as, um, yesterday, when I procrastinate checking this website because I’m scared that someone will have said something that will make me upset. In that sense, I think I do have some empathy for people who say that bad comments make them less likely to use the website. It’s just that, ultimately, I think that my sensitivity and vulnerability is my problem. Censoring voices that other people are interested in hearing would be making it everyone else’s problem.
        Expand this thread
        Jiro 18 Jul 2025 6:09 UTC
        9 points
        −3
        Parent
        
        I think there’s something much more fundamental at stake here, which is that an intellectual forum that’s being held hostage to people’s feelings is intrinsically hampered and can’t be at the forefront of advancing the art of human rationality.
        
        An intellectual forum that is not being “held hostage” to people’s feelings will instead be overrun by hostile actors who either are in it just to hurt people’s feelings, or who want to win through hurting people’s feelings.
        
        It’s just that, ultimately, I think that my sensitivity and vulnerability is my problem.
        
        Some sensitivity is your problem. Some sensitivity is the “problem” of being human and not reacting like Spock. It is unreasonable to treat all sensitivity as being the problem of the sensitive person.
        Elizabeth 6 Jul 2025 1:08 UTC
        5 points
        0
        Parent
        Mm, I think sometimes I’d rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.
        This made my blood go cold, despite thinking it would be good if Said left LessWrong.
        My first thoughts when I read “judge on the standard of whether the outcome is good” is that this lets you cherrypick your favorite outcomes without justifying them. My second is that it knowing if something is good can be very complicated even after the fact, so predicting it ahead of time is challenging even if you are perfectly neutral.
        I think it’s good LessWrong(’s admins) allows authors to moderate their own posts (and I’ve used that to ban Said from my own posts). I think it’s good LessWrong mostly doesn’t allow explicit insults (and wish this was applied more strongly). I think it’s good LessWrong evaluates commenting patterns, not just individual comments. But “nothing that makes authors feel bad about bans” is way too far.
        Expand this thread
        habryka 6 Jul 2025 1:30 UTC
        6 points
        8
        Parent
        It’s extremely common for all judicial systems to rely on outcome assessments instead of process assessments! In many domains this is obviously the right standard! It is very common to create environments where someone can sue for damages and not just have the judgement be dependent on negligence (and both thresholds are indeed commonly relevant for almost any civil case).
        Like sure, it comes with various issues, but it seems obviously wrong to me to request that no part of the LessWrong moderation process relies on outcome assessments.
        Ben Pace 6 Jul 2025 1:48 UTC
        3 points
        0
        Parent
        Okay. But I nonetheless believe it’s necessary that we have to judge communication sometimes by outcomes rather than by process.
        Like, as a lower stakes examples, sometimes you try to teasingly make a joke at your friend’s expense, but they just find it mean, and you take responsibility for that and apologize. Just because you thought you were behaving right and communicating well doesn’t mean you were, and sometimes you accept feedback from others that says you misjudged a situation. I don’t have all the rules written down such that if you follow them your friend will read your comments as intended, sometimes I just have to check.
        Similarly sometimes you try to criticize an author, but they take it as implying you’ll push back whenever they enforce boundaries on LessWrong, and then you apologize and clarify that you do respect them enforcing boundaries in general but stand by the local criticism. (Or you don’t and then site-mods step in.) I don’t have all the rules written down such that if you follow them the author will read your comments as intended, sometimes I just have to check.
        Obviously mod powers can be abused, and having to determine on a case by case basis is a power that can be abused. Obviously it involves judgment calls. I did not disclaim this, I’m happy for anyone to point it out, perhaps nobody has mentioned it so far in this thread so it’s worth making sure the consideration is mentioned. And yeah, if you’re asking, I don’t endorse “nothing that makes authors feel bad about bans”, and there are definitely situations where I think it would be appropriate for us to reverse someone’s bans (e.g. if someone banned all of the top 20 authors in the LW review, I would probably think this is just not workable on LW and reverse that).
        Elizabeth 6 Jul 2025 4:17 UTC
        2 points
        0
        Parent
        Sure, but “is my friend upset” is very different than “is the sum total of all the positive and negative effects of this, from first order until infinite order, positive”
        Ben Pace 6 Jul 2025 4:42 UTC
        2 points
        0
        Parent
        I don’t really know what we’re talking about right now.
        habryka 6 Jul 2025 1:34 UTC
        2 points
        0
        Parent
        Said, you reacted to this:
        In-person I regularly read you as being deeply pained and barely able to contain strongly emotional and hostile outbursts.
        with “Disagree”.
        I have no idea how you could remotely know whether this is true, as I think you have never interacted with either Ben or Zack in person!
        Also, it’s really extremely obviously true. Indeed, Zack frequently has the corresponding emotional and hostile outbursts, so it’s really extremely evident they are barely contained during a lot of it (since sometimes they do not end up contained, and then Zack apologizes for containing them and explains that this is difficult for him).
        What links here?
        Said Achmiz's comment on Said Achmiz’s Shortform by Said Achmiz (6 Jul 2025 1:49 UTC; 7 points)
        Said Achmiz 6 Jul 2025 1:00 UTC
        −7 points
        −13
        Parent
        
        You can write lots of judgmental comments criticizing an author’s posts, and then they can ban you from their comments because they find engaging with you to be exhausting, and then you can make a shortform where you and your friends call them a coward, and then they stop using the mod tools (and other authors do too) out of a fear that using the mod tools will result in a group of people getting together to bully and call them names in front of the author’s peers. That’s a situation where authors become uncomfortable using their mod tools.
        
        Here’s what confuses me about this stance: do an author’s posts on Less Wrong (especially non-frontpage posts) constitute “the author’s private space”, or do they constitute “public space”?
        
        If the former, then the idea that things that Alice writes about Bob on her shortform (or in non-frontpage posts) can constitute “bullying”, or are taking place “in front of” third parties (who aren’t making the deliberate choice to go to Alice’s private space), is nonsense.
        
        If the latter, then the idea that authors should have the right to moderate discussions that are happening in a public space is clearly inappropriate.
        
        I understood the LW mods’ position to be the former—that an author’s posts are their own private space, within the LW ecosystem (which is why it makes sense to let them set their own separate moderation policy there). But then I can’t make any sense of this notion of “bullying”, as applied to comments written on an author’s shortform (or non-frontpage posts).
        
        It seems to me that these two ideas are incompatible.
        habryka 6 Jul 2025 1:26 UTC
        5 points
        2
        Parent
        What I find hard to understand is why the mod team seems to think it’s good for them to try to shape culture by means other than clear and explicit rules that could be neutrally enforced.
        No judicial system in the world has ever arrived at the ability to have “neutrally enforced rules”, at least the way I interpret you to mean this. Case law is the standard in almost every legal tradition, and the US legal system relies heavily on things like “jury of your peers” type stuff to make judgements.
        Intent frequently matters in legal decision. Cognitive state of mind matters for legal decisions. Judges go through years of training and are part of a long lineage of people who have built up various heuristics and principles about how to judge cases. Individual courts have their own culture and track record.
        And that is for the US legal system, which is absolutely not capable of operating remotely to the kind of standard that allows people to curate social spaces or deal with tricky kinds of social rulings. No company could make cultural or hiring or business decisions based on the standard of the US legal system. Neither could any internet forum.
        There is absolutely no chance we will ever be able to encodify LessWrong rules of conduct into a set of specific rules that can be neutrally judged by a third party. Zero chance. Give up. If that is something you need here, leave now. Feel free to try to build it for yourself.
        What links here?
        Zack_M_Davis's comment on [Meta] New moderation tools and moderation guidelines by habryka (8 Jul 2025 5:18 UTC; 7 points)
        Said Achmiz 6 Jul 2025 0:50 UTC
        −9 points
        −15
        Parent
        
        I could see it being confusing because sometimes an author like Gordon is moderating you, and sometimes a site-mod like Habryka is moderating you, but they are using different standards, and the LW-mods are not typically endorsing the author standards as our own.
        
        It’s not just confusing sometimes, it’s confusing basically all the time. It’s confusing even for me, even though I’ve spent all these years on Less Wrong, and have been involved in all of these discussions, and have worked on GreaterWrong, and have spent time thinking about moderation policies, etc., etc. For someone who is even a bit less “very on LW”^[1]—it’s basically incomprehensible.
        
        I mean, consider: whenever I comment on anything anywhere, on this website, I have to not only keep in mind the rules of LW (which I don’t actually know, because I can’t remember in what obscure, linked-from-nowhere-easily-findable, long, hard-to-parse post those rules are contained), and the norms of LW (which I understand only very vaguely, because they remain somewhere between “poorly explained” and “totally unexplained”), but also, in addition to those things, I have to keep in mind whose post I am commenting under, and somehow figure out from that not only what their stated “moderation policy” is (scare quotes because usually it’s not really a specification of a policy, it’s just sort of a vague allusion at a broad class of approaches to moderation policy), but also what their actual preferences are, and how they enforce those things.
        
        (I mean, take this recent post. The “moderation policy” a.k.a. “commenting guidelines” are: “Reign of Terror—I delete anything I judge to be counterproductive”. What is that? That’s not anything. What is Nate going to judge to be “counterproductive”? I have no idea. How will this “policy” be applied? I have no idea. Does anyone besides Nate himself know how he’s going to moderate the comments on his posts? Probably not. Does Nate himself even know? Well, maybe he does, I don’t know the guy; but a priori, there’s a good chance that he doesn’t know. The only way to proceed here is to just assume that he’s going to be reasonable… but it is incredibly demoralizing to invest effort into writing some comments, only for them to be summarily deleted, on the basis of arbitrary rules you weren’t told of beforehand, or “norms” that are totally up to arbitrary interpretation, etc. The result of an environment like that is that people will treat commenting here as strictly a low-effort activity. Why bother to put time and thought into your comments, if “whoops, someone’s opaque whim dictates that your comments are now gone” is a strong possibility?)
        
        The whole thing sort of works most of the time because most people on LW don’t take this “set your own moderation policy” stuff too seriously, and basically (both when posting and when commenting) treat the site as if the rules were something like what you’d find on a lightly moderated “nerdy” mailing list or classic-style discussion forum.
        
        But that just results in the same sorts of “selective enforcement” situations as you get in any real-world legal regime that criminalizes almost everything and enforces almost nothing.
        
        ↩︎
        By analogy with “very online”
        
        Said Achmiz 5 Jul 2025 19:51 UTC
        5 points
        1
        Parent
        Yes, of course. I both remember and agree wholeheartedly. (And @habryka’s reply in a sibling comment seems to me to be almost completely non-responsive to this point.)
        habryka 5 Jul 2025 18:26 UTC
        1 point
        −3
        Parent
        I think there is something to this, though I think you should not model status in this context as purely one dimensional.
        
        Like a culture of mutual dignity where you maintain some basic level of mutual respect about whether other people deserve to live, or deserve to suffer, seems achievable and my guess is strongly correlated with more reasonable criticism being made.
        
        I think parsing this through the lens of status is reasonably fruitful, and within that lens, as I discussed in other sub threads, the problem is that many bad comments try to make some things low status that I am trying to cultivate on the site, while also trying to avoid accountability and clarity over whether those implications are actually meaningfully shared by the site and its administrators (and no, voting does not magically solve this problem).
        
        The status lens doesn’t super shine light on the passive vs. active aggression distinction we discussed. And again as I said it’s too one dimensional in that people don’t view ideas on LessWrong as having a strict linear status hierarchy. Indeed ideas have lots of gears and criticism does not primarily consist of lowering something’s status, that seems like it gets rid of basically all the real things about criticism.
        What links here?
        “Some Basic Level of Mutual Respect About Whether Other People Deserve to Live”?! by Zack_M_Davis (18 Jul 2025 6:41 UTC; 25 points)
        Wei Dai 20 Jul 2025 14:44 UTC
        9 points
        2
        Parent
        
        the problem is that many bad comments try to make some things low status that I am trying to cultivate on the site
        
        What are these things? Do you have a post about them?
        Zack_M_Davis 25 Jul 2025 8:23 UTC
        4 points
        0
        Parent
        
        many bad comments try to make some things low status that I am trying to cultivate on the site
        
        I’m not sure what things you’re trying to cultivate in particular, but in general, I’m curious whether you’ve given any thought to the idea that the use of moderator power to shape culture is less robust to errors in judgement than trying to shape culture by means of just arguing for your views, for the reasons that Scott Alexander describes in “Guided by the Beauty of Our Weapons”. That is, in Alexander’s terminology, mod power is a “symmetric weapon” that works just as well whether the mods are right or wrong, whereas public arguments are an “asymmetric weapon” that’s more effective when the arguer is correct on the merits.
        
        When I think rationalist culture is getting things wrong (whether that be an object-level belief, or which things are considered high or low status), I write posts arguing for my current views. While I do sometimes worry about whether my current views are mistaken, I don’t worry much about having a large negative impact if it turns out that my views are mistaken, because I think that the means by which I hope to alter the culture has some amount of built-in error correction: if my beliefs or status-assignment-preferences are erroneous in some way that’s not currently clear to me, others who can see the error will argue against my views in the comments, contributing to the result that the culture won’t accept my (ex hypothesi erroneous) proposed changes.
        
        (In case this wasn’t already clear, this is not an argument against moderators ever doing anything. It’s a reason to be extra conservative about controversial and uncertain “culture-shaping” mod actions that would be very costly to get wrong, as contrasted to removing spam or uncontroversially low-value content.)
        habryka 25 Jul 2025 17:16 UTC
        5 points
        −1
        Parent
        I have argued a lot for my views! My sense is they are broadly (though not universally) accepted among what I consider the relevant set of core stakeholders for LessWrong.
        But beyond that, the core set of stakeholders is also pretty united behind the meta-view that in order for a place like LessWrong to work, you need the culture to be driven by someone with taste, who trusts their own judgements on matters of culture, and you should not expect that you will get consensus on most things.
        My sense is there is broad buy-in that under-moderation is a much bigger issue than over-moderation. And also ‘convincing people in the comments’ doesn’t actually like… do anything. You would have to be able to convince every single person who is causing harm to the site, which of course is untenable and unrealistic. At some point, after you explained your reasons, you have to actually enforce the things that you argued for.
        See of course the standard Well-Kept Gardens Die By Pacifism:
        In the beginning, while the community is still thriving, censorship seems like a terrible and unnecessary imposition. Things are still going fine. It’s just one fool, and if we can’t tolerate just one fool, well, we must not be very tolerant. Perhaps the fool will give up and go away, without any need of censorship. And if the whole community has become just that much less fun to be a part of… mere fun doesn’t seem like a good justification for (gasp!) censorship, any more than disliking someone’s looks seems like a good reason to punch them in the nose.
        (But joining a community is a strictly voluntary process, and if prospective new members don’t like your looks, they won’t join in the first place.)
        And after all—who will be the censor? Who can possibly be trusted with such power?
        Quite a lot of people, probably, in any well-kept garden. But if the garden is even a little divided within itself —if there are factions—if there are people who hang out in the community despite not much trusting the moderator or whoever could potentially wield the banhammer—
        (for such internal politics often seem like a matter of far greater import than mere invading barbarians)
        —then trying to defend the community is typically depicted as a coup attempt. Who is this one who dares appoint themselves as judge and executioner? Do they think their ownership of the server means they own the people? Own our community? Do they think that control over the source code makes them a god?
        I confess, for a while I didn’t even understand why communities had such trouble defending themselves—I thought it was pure naivete. It didn’t occur to me that it was an egalitarian instinct to prevent chieftains from getting too much power. “None of us are bigger than one another, all of us are men and can fight; I am going to get my arrows”, was the saying in one hunter-gatherer tribe whose name I forget. (Because among humans, unlike chimpanzees, weapons are an equalizer—the tribal chieftain seems to be an invention of agriculture, when people can’t just walk away any more.)
        Maybe it’s because I grew up on the Internet in places where there was always a sysop, and so I take for granted that whoever runs the server has certain responsibilities. Maybe I understand on a gut level that the opposite of censorship is not academia but 4chan (which probably still has mechanisms to prevent spam). Maybe because I grew up in that wide open space where the freedom that mattered was the freedom to choose a well-kept garden that you liked and that liked you, as if you actually could find a country with good laws. Maybe because I take it for granted that if you don’t like the archwizard, the thing to do is walk away (this did happen to me once, and I did indeed just walk away).
        And maybe because I, myself, have often been the one running the server. But I am consistent, usually being first in line to support moderators—even when they’re on the other side from me of the internal politics. I know what happens when an online community starts questioning its moderators. Any political enemy I have on a mailing list who’s popular enough to be dangerous is probably not someone who would abuse that particular power of censorship, and when they put on their moderator’s hat, I vocally support them—they need urging on, not restraining. People who’ve grown up in academia simply don’t realize how strong are the walls of exclusion that keep the trolls out of their lovely garden of “free speech”.
        Any community that really needs to question its moderators, that really seriously has abusive moderators, is probably not worth saving. But this is more accused than realized, so far as I can see.
        In any case the light didn’t go on in my head about egalitarian instincts (instincts to prevent leaders from exercising power) killing online communities until just recently. While reading a comment at Less Wrong, in fact, though I don’t recall which one.
        But I have seen it happen—over and over, with myself urging the moderators on and supporting them whether they were people I liked or not, and the moderators still not doing enough to prevent the slow decay. Being too humble, doubting themselves an order of magnitude more than I would have doubted them. It was a rationalist hangout, and the third besetting sin of rationalists is underconfidence.
        This about the Internet: Anyone can walk in. And anyone can walk out. And so an online community must stay fun to stay alive. Waiting until the last resort of absolute, blatent, undeniable egregiousness—waiting as long as a police officer would wait to open fire—indulging your conscience and the virtues you learned in walled fortresses, waiting until you can be certain you are in the right, and fear no questioning looks—is waiting far too late.
        I have seen rationalist communities die because they trusted their moderators too little.^[1]
        I have very extensively argued for my moderation principles, and also LessWrong has very extensively argued about the basic premise of Well-Kept Gardens Die By Pacifism. Of course, not everyone agrees, but both of these seem to me to I think create a pretty good asymmetric-weapons case for the things that I am de-facto doing as a head moderator.
        ^
        The post also ends with a call for people to downvote more, which I also mostly agree with, but also it just seems quite clear that de-facto a voting system is not sufficient to avoid these dynamics.
        Zack_M_Davis 26 Jul 2025 6:07 UTC
        3 points
        0
        Parent
        
        the core set of stakeholders is pretty united behind the meta-view that in order for a place like LessWrong to work, you need the ability to have a culture be driven by someone with taste
        
        Sorry, I don’t understand how this is consistent with the Public Archipelago doctrine, which I thought was motivated by different people wanting to have different kinds of discussions? I don’t think healthy cultures are driven by a dictator; I think cultures emerge from the interaction of their diverse members. We don’t all have to have exactly the same taste in order to share a website.
        
        I maintain hope that your taste is compatible with me and my friends and collaborators continuing to be able to use the website under the same rules as everyone else, as we have been doing for fifteen years. I have dedicated much of my adult life to the project of human rationality. (I was at the first Overcoming Bias meetup in February 2008.) If Less Wrong is publicly understood as the single conversational locus for people interested in the project of rationality, but its culture weren’t compatible with me and my friends and collaborators doing the intellectual work we’ve spent our lives doing here, that would be huge problem for my life’s work. I’ve made a lot of life decisions and investments of effort on the assumption that this is my well-kept garden, too; that I am not a “weed.” I trust you understand the seriousness of my position.
        
        And also ‘convincing people in the comments’ doesn’t actually like … do anything.
        
        Well, it depends on what cultural problem you’re trying to solve, right? If the problem you’re worried about is “Authors have to deal with unwanted comments, and the existing site functionality of user-level bans isn’t quite solving that problem yet, either because people don’t know about the feature or are uncomfortable using it”, you could publicize the feature more and encourage people to use it.
        
        That wouldn’t involve any changes to site policy; it would just be a matter of someone using speech to tell people about already-existing site functionality and thus to organically change the local culture.
        
        It wouldn’t even need to be a moderator: I thought about unilaterally making my own “PSA: You Can Ban Users From Commenting on Your Posts” post, but decided against it, because the post I could honestly write in my own voice wouldn’t be optimal for addressing the problems that I think you perceive.
        
        That is, speaking for myself in my own voice, I have been persuaded by Wei Dai’s arguments that user bans aren’t good because they censor criticism, which results in less accurate shared maps; I think people who use the feature (especially liberally) could be said to be making a rationality mistake. But crucially, that’s just my opinion, my own belief. I’m capable of sharing a website with other people who don’t believe the same things as me. I hope those people feel the same way about me.
        
        My understanding is that you don’t think that popularizing existing site functionality solves the cultural problems you perceive, because you’re worried about users “heap[ing] [...] scorn and snark and social punishment” on e.g. their own shortform. I maintain hope that this class of concern can be addressed somehow, perhaps by appropriately chosen clear rules about what sorts of speech are allowed on the topics of particular user bans or the user ban feature itself.
        
        I think clear rules are important in an Archipelago-type approach for defining how the different islands in the archipelago interact. Attitudes towards things like snark is one of the key dimensions along which I’d expect the islands in an archipelago to vary.
        
        I fear you might find this frustrating, but I’m afraid I still don’t have a good grasp of your conceptualization of what constitutes social punishment. I get the impression that in many cases, what me and my friends and collaborators would consider “sharing one’s honest opinion when it happens to be contextually relevant (including negative opinions, including opinions about people)”, you would consider social punishment. To be clear, it’s not that I’m pretending to be so socially retarded that I literally don’t understand the concept that sharing negative opinions is often intended as a social attack. (I think for many extreme cases, the two of us would agree on characterizing some speech as unambiguously an attack.)
        
        Rather, the concern is that a policy of forbidding speech that could be construed as social punishment would have a chilling effect on speech that is legitimate and necessary towards the site’s mission (particularly if it’s not clear to users how moderators are drawing the category boundary of “social punishment”). I think you can see why this is a serious concern: for example, it would be bad if you were required to pretend that people’s praise of the Trump administration’s AI Action plan was in good faith if you don’t actually think that (because bad faith accusations can be construed as social punishment).
        
        I just want to preserve the status quo where me and my friends and collaborators can keep using the same website we’ve been using for fifteen years under the same terms as everyone else. I think the status quo is fine. You want to get back to work. (Your real work, not whatever this is.) I want to get back to work. I think we can choose to get back to work.
        Expand this thread
        habryka 26 Jul 2025 6:32 UTC
        10 points
        2
        Parent
        We don’t all have to have exactly the same taste in order to share a website.
        Please don’t strawman me. I said no such thing, or anything that implies such things. Of course not everyone needs to have exactly the same taste to share a website. What I said is that the site needs taste to be properly moderated, which of course does not imply everyone on it needs to share that exact taste. You occupy spaces moderated by people with different tastes from you and the other people within it all the time.
        I maintain hope that your taste is compatible with me and my friends and collaborators continuing to be able to use the website under the same rules as everyone else, as we have been doing for fifteen years. I have dedicated much of my adult life to the project of human rationality. (I was at the first Overcoming Bias meetup in February 2008.) If Less Wrong is publicly understood as the single conversational locus for people interested in the project of rationality, but its culture weren’t compatible with me and my friends and collaborators doing the intellectual work we’ve spent our lives doing here, that would be huge problem for my life’s work. I’ve made a lot of life decisions and investments of effort on the assumption that this is my well-kept garden, too; that I am not a “weed.” I trust you understand the seriousness of my position.
        Yep, moderation sucks, competing access needs are real, and not everyone can share the same space, even within a broader archipelago (especially if one is determined to tear down that very archipelago). I do think you probably won’t get what you desire. I am genuinely sorry for this. I wish you good luck.^[1]
        Rather, the concern is that a policy of forbidding speech that could be construed as social punishment would have a chilling effect on speech that is legitimate and necessary towards the site’s mission (particularly if it’s not clear to users how moderators are drawing the category boundary of “social punishment”).
        Look, various commenters on LW including Said have caused much much stronger chilling effects than any moderation policy we have ever created, or will ever create. It is not hard to drive people out of a social space. You just have to be persistent and obnoxious and rules-lawyer every attempt at policing you. It really works with almost perfect reliability.
        forbidding speech that could be construed as social punishment
        And of course, nobody at any point was arguing (and indeed I was careful to repeatedly clarify) that all speech that could be construed as social punishment is to be forbidden. Many people will try to socially punish other people. The thing that one needs to reign in to create any kind of functional culture is social punishments of the virtues and values that are good and should be supported and are the lifeblood of the site by my lights.
        The absence of moderation does not create some special magical place in which speech can flow freely and truth can be seen clearly. You are welcome to go and share your opinions on 4chan or Facebook or Twitter or any other unmoderated place on the internet if you think that is how this works. You could even start posting on DataSecretLox if you are looking for something with more similar demographics as this place, and a moderation philosophy more akin to your own. The internet is full of places with no censorship, with nothing that should stand in the way of the truth by your lights, and you are free to contribute there.
        My models of online platforms say that if you want a place with good discussion the first priority is to optimize its signal-to-noise ratio, and make it be a place that sets the right social incentives. It is not anywhere close to the top priority to worry about every perspective you might be excluding when you are moderating. You are always excluding 99% of all positions. The question is whether you are making any kind of functional discussion space happen at all. The key to doing that is not absence of moderation, it’s presence of functional norms that produce a functional culture, which requires both leading by example and selection and pruning.
        I also more broadly have little interest in continuing this thread, so don’t expect further comments from me. Good luck. I expect I’ll write more some other time.
        ^
        Like, as in, I will probably ban Said.
        Said Achmiz 26 Jul 2025 6:58 UTC
        2 points
        −1
        Parent
        
        The thing that one needs to reign in to create any kind of functional culture is social punishments of the virtues and values that are good and should be supported and are the lifeblood of the site by my lights.
        
        Well, I agree with all of that except the last three words. Except that it seems to me that the things that you’d need to reign in is the social (and administrative) punishment that you are doing, not anything else.
        
        I’ve been reviewing older discussions lately. I’ve come to the conclusion that the most disruptive effects by far, among all discussions that I’ve been involved with, were created directly and exclusively by the LW moderators, and that if the mods had simply done absolutely nothing at all, most of those disruptions just wouldn’t have happened.
        
        I mean, take this discussion. I asked a simple question about the post. The author of the post (himself an LW mod!), when he got around to answering the question, had absolutely no trouble giving a perfectly coherent and reasonable answer. Neither did he show any signs of perceiving the question to be problematic in any way. And the testimony of multiple other commenters (including from longtime members who had contributed many useful comments over the years) affirmed that my question made sense and was highly relevant to the core point of the post.
        
        The only reason—the only reason!—why a simple question ended up leading to a three-digit-comment-count “meta” discussion about “moderation norms” and so on, was because you started that discussion. You, personally. If you had just done literally nothing at all, it would have been completely fine. A simple question would’ve been asked and then answered. Some productive follow-up discussion would’ve taken place. And that’s all.
        
        Many such cases.
        
        The absence of moderation does not create some special magical place in which speech can flow freely and truth can be seen clearly.
        
        It’s a good thing, then, that nobody in this discussion has called for the “absence of moderation”…
        
        My models of online platforms say that if you want a place with good discussion the first priority is to optimize its signal-to-noise ratio, and make it be a place that sets the right social incentives.
        
        I certainly agree with this.
        habryka 26 Jul 2025 7:00 UTC
        8 points
        3
        Parent
        Thanks Said. As you know, I have little interest in this discussion with you, as we have litigated it many times.
        Please don’t respond further to my comments. I am still thinking about this, but I will likely issue you a proper ban in the next few days. You will probably have an opportunity to say some final words if you desire.
        The only reason—the only reason!—why a simple question ended up leading to a three-digit-comment-count “meta” discussion about “moderation norms” and so on, was because you started that discussion. You, personally. If you had just done literally nothing at all, it would have been completely fine. A simple question would’ve been asked and then answered. Some productive follow-up discussion would’ve taken place. And that’s all.
        Look, this just feels like a kind of crazy catch-22. I weak-downvoted a comment, and answered a question you asked about why someone would downvote your comment. I was not responsible for anything but a small fraction of the relevant votes, nor do I consider any blame to have fallen upon me when honestly explaining my case for a weak-downvote. I did not start anything. You asked a question, I answered it, trying to be helpful in understanding where the votes came from.
        It really is extremely predictable that if you ask a question about why a thing was downvoted, that you will get a meta conversation about what is appropriate on the site and what is not.
        But again, please, let this rest. Find some other place to be. I am very likely the only moderator for this site that you are going to get, and as you seem to think my moderation is cause for much of your bad experiences, there is little hope in that changing for you. You are not going to change my mind in the 701st hour of comment thread engagement, if you didn’t succeed in the first 700.
    - Rafael Harth 9 Jul 2025 11:56 UTC
      2 points
      0
      Parent
      Alright—apologies for the long delay, but this response meant I had to reread the Scaling Hypothesis post, and I had some motivation/willpower issues in the last week. But I reread it now.
      
      I agree that the post is deliberately offensive at parts. E.g.:
      
      But I think they lack a vision. As far as I can tell: they do not have any such thing, because Google Brain & DeepMind do not believe in the scaling hypothesis the way that Sutskever, Amodei and others at OA do. Just read through machine learning Twitter to see the disdain for the scaling hypothesis. (A quarter year on from GPT-3 and counting, can you name a single dense model as large as the 17b Turing-NLG—never mind larger than GPT-3?)
      
      Google Brain is entirely too practical and short-term focused to dabble in such esoteric & expensive speculation, although Quoc V. Le’s group occasionally surprises you.
      
      or (emphasis added)
      
      OA, lacking anything like DM’s long-term funding from Google or its enormous headcount, is making a startup-like bet that they know an important truth which is a secret: “the scaling hypothesis is true!” So, simple DRL algorithms like PPO on top of large simple architectures like RNNs or Transformers can emerge, exploiting the blessings of scale, and meta-learn their way to powerful capabilities, enabling further funding for still more compute & scaling, in a virtuous cycle. [...]
      
      and probably the most offensive is the ending (wont quote to not clutter the reply, but it’s in Critiquing the Critics, especially from “What should we think about the experts?” onward). You’re essentially accusing all the skeptics of falling victim to a bundle of biases/signaling incentives, rather than disagreeing with you for rational reasons. So you were right, this is deliberately offensive.
      
      But I think the answer to the question—well actually let’s clarify what we’re debating, that might avoid miscommunication. You said this in your initial reply:
      
      I can definitely say on my own part that nothing of major value I have done as a writer online—whether it was popularizing Bitcoin or darknet markets or the embryo selection analysis or writing ‘The Scaling Hypothesis’—would have been done if I had cared too much about “vibes” or how it made the reader feel. (Many of the things I have written definitely did make a lot of readers feel bad. And they should have. There is something wrong with you if you can read, say, ‘Scaling Hypothesis’ and not feel bad. I myself regularly feel bad about it! But that’s not a bad thing.) Even my Wikipedia editing earned me doxes and death threats.
      
      So in a nutshell, I think we’re debating something like “will what I advocate mean you’ll be less effective as a writer” or more narrowly “will what I’m advocating for mean you couldn’t have written really valuable past pieces like the Scaling Hypothesis”. To me it still seems like the answer to both is a clear no.
      
      The main thing is, you’re treating my position as if it’s just “always be nice”, which isn’t correct. I’m very utilitarian (about commenting and in general) (one of my main insights from the conversation with Zack is that this is a genuine difference). I’ve argued repeatedly that Said’s comment is ineffective, basically because of what Scott said in How Not to Lose an Argument. It was obviously ineffective at persuading Gordon. Now Said argued that persuading the author isn’t the point, which I can sort of grant, but I think it will be similarly ineffective for anyone sympathetic to religion for the same reasons. So it’s not that I terminally value being nice,^[1] it’s that being nice is generally instrumentally useful, and would have been useful in Said’s case. But that doesn’t mean it’s necessarily always useful.
      
      I want to call attention my rephrasing of Said’s post. I still claim that this post would have been much more effective in criticizing Gordon’s post. Gordon would have reacted in more constructive way, and again, I think everyone else who sympathizes with religion is essentially in the same position. This seems to me like a really important point.
      
      So to clarify, I would not have objected to the Scaling Hypothesis post despite some rudeness. The rudeness has a purpose (the bolded sentence is the one that I remembered most from reading it all the way back, which is evidence for your claim that “those were some of the most effective parts”). And the context is also importantly different; you’re not directly replying to a skeptic; the post was likely to be read by lots of people who are undecided. And the fact that it was a super high effort post also matters because ‘how much effort does the other person put into this conversation’ is always one of the important parameters for vibes.
      
      I also wanna point out that your response was contradictory in an important way. (This isn’t meant as a gotcha, I think it capture the difference between “always be nice” and “maximize vibes for impact under the constraint of being honest and not misleading”.) Because you said that you wouldn’t have been successful if you worried about vibes, but also that you made the Scaling Hypothesis post deliberately offensive, which means you did care about vibes, you just didn’t optimize them to be nice in this case.
      
      Idk if this is worth adding, but two days ago I remembered something you wrote that I had mentally tagged as “very rude”, and where following my principles would mean you’re “not allowed” to write that. (So if you think that was important to write in this way, then we have a genuine disagreement.) That was your response to now-anonymous on your Clippy post, here. Here, my take (though I didn’t reread, this is mostly from memory) is something like
      
      the critique didn’t make a lot of sense because it boiled down to “you’re asserting that people would do xyz, but xyz is stupid”, which is a nonseqitor (“people do xyz” and “xyz is stupid” can both be true)
      your response was needlessly aggressive and you “lost” the argument in the sense that you failed the persuade the person who complained
      it was absolutely possible to write a better reply here; you could have just made the above point (i.e., “it being stupid doesn’t mean it’s unrealistic”) in a friendly tone and the result would probably been that the commenter realizes their mistake; the same is achieved with fewer words and it arguably makes you look better. I don’t see the downside.
      
      ↩︎
      Strictly speaking I do terminally value being nice a little bit because I terminally value people feeling good/bad, but I think the ‘improve everyone’s models about the world’ consideration dominates the calculation.