Okay—I agree that the overall meaning of the comment is altered. If you have a categorical rule of “I want my meaning to be only this and exactly this, and anything that changes it is disqualified”, then, yes, your objection is valid. So consider my updated position to be something like: “your standard (A) has no rational justification, and also (B) relies on a false model of how people write comments.” I’ll first argue (A), then (B).
Similar information, but not “exactly” the same information. Deleting the “very harmful false things” parenthetical omits the claim that the falsehoods promulgated by organized religion are very harmful. (That’s significant because someone focused on harm rather than epistemics might be okay with picking up harmless false beliefs, but not very harmful false beliefs.) Changing “very quickly you descend” to “you can descend” alters the speed and certainty with which religious converts are claimed to descend into nebulous and vague anti-epistemology. (That’s significant, because a potential convert being warned that they could descend into anti-epistemology might think, “Well, I’ll be extra careful not to do that, then,” whereas a warning that one very quickly will descend is less casually brushed off.)
It is logically coherent to have the () reactions. But do you think it’s plausible? What would be your honest probability assessment that a religious person reads this and actually goes that route—as in, they accept the claims of the comment but take the outs you describe in (), whereas if they had read Said’s original comment instead, they’d still accept the premises, and this time they’d be convinced?
Conversely, one could imagine that a religious person reads Said’s version and doesn’t engage with it because they feel offended, whereas the same person would have engaged with my version. (Which, obviously, I’d argue is more likely.)
At this point, my mental model of you responds with something like
You’re probably correct on the consequential analysis (i.e., the softened version would be more likely to be persuasive)[1], but I don’t think it follows that we as a community should therefore moderate vibes because [very eloquently argued case about censorship being bad that I won’t try to replicate here]
To which I say, okay. Fine. I don’t think there is a slippery slope here, but I think arguing this is a losing battle. So I’ll stop with (A) here.
My case for (B) is that the algorithm which produced Said’s message didn’t take these details into account, so changing them doesn’t censor or distort the intent behind the message. Said didn’t run an assessment of exactly how harmful the consequences are, determine that they’re most accurately described as “very harmful” rather than “harmful” or “extremely harmful”, and then post the comment. Ditto for the other example.
I’m not sure how much evidence I need here to make this point, but here are some ways in which you can see that the above is true:
If you did consider the meaning to this level of detail, then you wouldn’t write “very quickly you descend”, because, well, you might not descend (it’s not 100%), so you’d have to qualify this somehow.[2]
Thinking this carefully about the content of your messages takes a lot of time. Said doesn’t take this much time for his comments, which is how he can respond so quickly.
If you thought about the actual merits of the proposal, then you’d scrap the entire second half of the comment, which is only tangentially relevant to the actual crux. You would be far more likely to point out that a good chunk of the post relies on this sentence:
and to the extent anything that doesn’t consider itself a religion provides these, it’s because it’s imitating the package of things that makes something a religion.
… which is not justified in the post at all. This would be a vastly more useful critique!
So, you’re placing this extreme importance on the precise semantic meaning of Said’s comment, when the comment wasn’t that well thought-out in the first place. I’d be much more sympathetic to defending details of semantic meaning if those details had been carefully selected.
The thing that’s frustrating to me—not just about this particular point in this conversation but about the entire vibes debate—and which I should probably have pointed out much earlier, is that being more aware of vibes makes your messages less dependent on them, not more. Because noticing the influence allows you to adjust. If you realize a vibe is pushing you to write X, you can then be like, hold on, that’s stupid, let me instead re-assess how whatever I’m responding to right now actually impacts the reasons why I believe the thing I believe. And then you’ll probably notice that what you’re pushed to write doesn’t really hit the crux at all, and instead scrap it and write something else. (See the footnote[3] for examples in this category.)
To put it extremely bluntly, the thing that was actually causally upstream of the details in Said’s message was not a careful consideration of the factual details; it was that he thinks religion is dumb and bad, which influenced a parameter sent to the language-generation module that output the message, which made it choose language that sounded more harsh. This is why it says “perfect example” and not “example”, why the third paragraph sounds so dismissive, why the message contains no !s, why he said “very quickly you descend” rather than “you can descend”, and so on. The vibe isn’t an accidental by-product, it’s the optimization target! Which you can clearly observe in the changes I’ve pointed out here.
… and on a very high level, to just give a sense of my actual views on this, the whole thing just seems ridiculously backwards in the sense that it doesn’t engage with what our brains are actually doing. Like, I think it happens to be the case that not listening to vibes is often better (although this is a murky distinction, because a lot of good thought relies on what are essentially vibes as well—it’s ultimately a form of computation), but the broader point is that, whatever you want to improve, more awareness of what’s actually going on is going to be good. Knowledge is power and all that.
If you don’t think this, then that would be a crux, but also I’d be very surprised and not sure how I’d continue the conversation then, but for now I’m not thinking too much about this.
[2] This is absurdly nit-picky, but so are the changes you pointed out.

[3] Alright, for example: the first thing I wrote when responding to your comment was about you quoting me saying “These two messages convey exactly the same information”. I actually meant to refer only to the specific line I quoted, where this statement was more defensible. But I asked myself, “does this actually matter for the crux?”, and the answer was no, so I scrapped it. The same thing is true for me quoting Gordon’s response and pointing out that it fits better with my model than yours, and for a snide remark about how your () ascribes superhuman rationality powers to religious people in particular.
Now you may be like, well, those are good things, but that’s different from vibes. But it’s not really; it’s the same skill of, notice what your brain is actually doing, and if it’s dumb, interfere and make it do something else. More introspection is good.
I guess the other difference is that I’m changing how I react here rather than how someone else reacts. I guess some people may view one as super good and the other as super bad (e.g., gwern’s comment gave off that vibe to me). To me these are both good for the same reason. Deliberately inserting unhelpful vibes into your comment is like uploading a post with formatting that you know will break the editor and then being like “well, the editor only breaks because this part here is poorly programmed; if it were programmed better then it would do fine”. In any other context this would pattern-match to obviously foolish behavior. (“I don’t look before crossing the street, because cars should stop.”) It’s only taken seriously because people are deluded about the degree to which vibes matter in practice.
Anyway, I think you get the point. In retrospect I should have probably structured a lot of my writing about this differently, but can’t do that now.
What would be your honest probability assessment that a religious person reads this and actually goes that route
Sorry, phrasing it in terms of “someone focused on harm”/“a potential convert being warned” might have been bad writing on my part, because what matters is the logical structure of the claim, not whether some particular target audience will be persuaded.
Suppose I were to say, “Drug addiction is bad because it destroys the addict’s physical health and ability to function in Society.” I like that sentence and think it is true. But the reason it’s a good sentence isn’t because I’m a consequentialist agent whose only goal is to minimize drug addiction, and I’ve computed that that’s the optimal sentence to persuade people to not take drugs. I’m not, and it isn’t. (An addict isn’t going to magically summon the will to quit as a result of reading that sentence, and someone considering taking drugs has already heard it and might feel offended.) Rather, it’s a good sentence because it clearly explains why I think drug addiction is bad, and it would be dishonest to try to persuade some particular target audience with a line of reasoning other than the one that persuades me.
Deliberately inserting unhelpful vibes into your comment is like uploading a post with formatting that you know will break the editor and then being like “well, the editor only breaks because this part here is poorly programmed; if it were programmed better then it would do fine”. In any other context this would pattern-match to obviously foolish behavior. (“I don’t look before crossing the street, because cars should stop.”)
I don’t think those are good metaphors, because the function of a markup language or traffic laws is very different from the function of blog comments. We want documents to conform to the spec of the markup language so that our browsers know how to render them. We want cars and pedestrians to follow the traffic law in order to avoid dangerous accidents. In these cases, coordination is paramount: we want everyone to follow the same right-of-way convention, rather than just going into the road whenever they individually feel like it.
In contrast, if everyone writes the blog comment they individually feel like writing, that seems good, because then everyone gets to read what everyone else individually felt like writing, rather than having to read something else, which would probably be less informative. We don’t need to coordinate the vibes. (We probably do want to coordinate the language; it would be confusing if you wrote your comments in English, but I wrote all my replies in French.)
the thing that was actually causally upstream of the details in Said’s message [...] was that he thinks religion is dumb and bad, which influenced a parameter sent to the language-generation module that output the message, which made it choose language that sounded more harsh. [...] The vibe isn’t an accidental by-product
Right, exactly. He thinks religion is dumb and bad, and he wrote a comment that expresses what he thinks, which ends up having harsh vibes. If the comment were edited to make the vibes less harsh, then it would be less clear exactly how dumb and bad the author thinks religion is. But it would be bad to make comments less clearly express the author’s thoughts, because the function of a comment is to express the author’s thoughts.
whatever you want to improve, more awareness of what’s actually going on is going to be good
Absolutely. For example, if everyone around me is obfuscating their actual thoughts because they’re trying to coordinate vibes, that distortion is definitely something I want to be tracking.
to just give a sense of my actual views on this, the whole thing just seems ridiculously backwards

The feeling is mutual?!
what matters is the logical structure of the claim, not whether some particular target audience will be persuaded.
Right, exactly. He thinks religion is dumb and bad, and he wrote a comment that expresses what he thinks, which ends up having harsh vibes. If the comment were edited to make the vibes less harsh, then it would be less clear exactly how dumb and bad the author thinks religion is. But it would be bad to make comments less clearly express the author’s thoughts, because the function of a comment is to express the author’s thoughts.
Oh. Oh. So you agree with me that the details weren’t that well thought out (or at least you didn’t bother arguing against that), and ditto about the net effects, but you don’t think it matters (or at any rate, that it’s the important point), because you’re not trying to optimize for positive effects, but just for honest communication...?
This is not what I thought your position was, but I guess it makes sense if I try to retroactively fit it. This means most (all?) of my objections don’t apply anymore. Like, yeah, if you terminally value authentically representing the author’s emotional state of mind, then of course deliberately adjusting vibes is a net negative for your values.
I don’t think those are good metaphors, because the function of a markup language or traffic laws is very different from the function of blog comments. We want documents to conform to the spec of the markup language so that our browsers know how to render them. We want cars and pedestrians to follow the traffic law in order to avoid dangerous accidents. In these cases, coordination is paramount: we want everyone to follow the same right-of-way convention, rather than just going into the road whenever they individually feel like it.
In contrast, if everyone writes the blog comment they individually feel like writing, that seems good, because then everyone gets to read what everyone else individually felt like writing, rather than having to read something else, which would probably be less informative. We don’t need to coordinate the vibes. (We probably do want to coordinate the language; it would be confusing if you wrote your comments in English, but I wrote all my replies in French.)
(I think this completely misses the point I was trying to make, which is that “I will do X which I know will have bad effects, but I’ll do it anyway because the reason it has bad effects is that other people are making mistakes, so it’s not me who should change X, but other people who should change” is recognized as dumb for almost all values of X, especially on LW—but I also think this doesn’t matter anymore, either, because the argument is again about consequences, which you just demoted as the optimization target. If you agree that it doesn’t matter anymore, then no need to discuss this more.)
I guess now I have a few questions:
Why do you have this position (i.e., that comments aren’t about impact)? Is this supposed to be, like, the super obvious message that was clearly the main point of the Sequences, or something like that?
Is your default model of LWians that most of them have this position?
You said earlier that the repeated moderation blow-ups aren’t about bad vibes. I feel like what you’ve said since justifies why you think Said’s comments are good, but not that they aren’t about vibes—like, even with everything you said here, it still seems like the causal stream here is clearly bad vibes → people complain to habryka → Said gets in trouble? (This isn’t super important, but still felt worth asking.)
Why do you have this position (i.e., that comments aren’t about impact)?
Because naïvely optimizing for impact requires concealing or distorting information that people could have used to make better (more impactful) decisions in ways that can’t realistically be anticipated by writers naïvely optimizing for impact.
Here’s an example from Ben Hoffman’s “The Humility Argument for Honesty”. Suppose my neck hurts (coincidentally, after trying a new workout routine), and after some internet research, I decide I have neck cancer. The impact-oriented approach would call for me to do my best to convince my doctor I have neck cancer, to make sure that I get the chemotherapy I’m sure I need. The honesty-oriented approach would call for me to explain to my doctor the evidence and reasoning for why I think I have neck cancer.
Maybe there’s something to be said for the impact-oriented approach if my self-diagnoses are never wrong. But if there’s a chance I could be wrong, the honesty-oriented approach is much more robust. If I don’t really have neck cancer and describe my actual symptoms, the doctor has a chance to help me discover my mistake.
Is your default model of LWians that most of them have this position?
No. But that’s OK with me, because I don’t regard “other people who use one of the same websites as me” as a generic authority figure.
it still seems like the causal stream here is clearly bad vibes → people complain to habryka → Said gets in trouble?
Yes, that sounds right. As you’ve gathered, I want to delete the second arrow rather than altering the value of the “vibes” node.
No. But that’s OK with me, because I don’t regard “other people who use one of the same websites as me” as a generic authority figure.
Was definitely not going to make an argument from authority, just trying to understand your worldview.
Iirc, we’ve touched on four (increasingly strong) standards for truth:

1. Don’t lie.
2. (I won’t be the best at phrasing this) Something like “don’t try to make someone believe things for reasons that have nothing to do with why you believe it”.
3. Use only the arguments that convinced you (the one you mentioned here).
4. Make sure the comment accurately reflects your emotional state[1] about the situation.
For me, I endorse #1, and about 80% endorse #2 (you said in an earlier comment that #1 is too weak, and I agree). #3 seems pretty bad to me, because the arguments most convincing to me don’t have to be the most convincing to others (and indeed, they’re often not), and the argument that persuaded me initially especially doesn’t need to be good. And #4 seems extremely counter-productive, both because it’ll routinely make people angry and because so much of one’s state of mind at any point is determined by irrelevant variables. It seems only slightly less crazy than—and in fact very similar to—the radical honesty stuff. (Only the most radical interpretation of #4 is like that, but as I said in the footnote, the most radical interpretation is what you used when you applied it to Said’s commenting style, so that’s the one I’m using here.)
Here’s an example from Ben Hoffman’s “The Humility Argument for Honesty” [...]
This is not a useful example, though, because it doesn’t differentiate between any two points on this 1–4 scale. You don’t even need to agree with #1 to realize that trying to convince the doctor is a bad idea; all you need to do is realize that they’re more competent than you at understanding symptoms. A non-naïve, purely impact-based approach just describes the symptoms honestly in this situation.
My sense is that examples that favor something stronger than #2 will be hard to come up with. (Notably, your argument for why a higher standard is better was itself consequentialist.)
Idk, I mean, we’ve drifted pretty far off the original topic, and we don’t have to talk any more about this if you’re not interested (and also you’ve already been patient in describing your model). I’m just getting this feeling—vibe!—of “hmm, no, this doesn’t seem quite right, I don’t think Zack genuinely believed #1–#4 all this time and everything was upstream of that, this position is too extreme and doesn’t really align with the earliest comment about the moderation debate, I think there’s still some misunderstanding here somewhere”, so my instinct is to dig a little deeper to really get your position. Although I could be wrong, too. In any case, like I said, feel free to end the conversation here.
Re-reading this comment again, you said ‘thought’, which maybe I should have criticized because it’s not a thought. How annoyed you are by something isn’t an intellectual position, it’s a feeling. It’s influenced by beliefs about the thing, but also by unrelated things like how you’re feeling about the person you’re talking to (RE what I’ve demonstrated with Said).
Was definitely not going to make an argument from authority, just trying to understand your worldview.
Right. Sorry, I think I uncharitably interpreted “Do you think others agree?” as an implied “Who are you to disagree with others?”, but you’ve earned more charity than that. (Or if it’s odd to speak of “earning” charity, say that I unjustly misinterpreted it.)
the argument that persuaded me initially especially doesn’t need to be good

Right. I tried to cover this earlier when I said “(a cleaned-up refinement of) my thought process” (emphasis added). When I wrote about eschewing “line[s] of reasoning other than the one that persuades me”, it’s persuades in the present tense, because what matters is the justificatory structure of the belief, not the humdrum causal history.
you said ‘thought’, which maybe I should have criticized because it’s not a thought. How annoyed you are by something isn’t an intellectual position, it’s a feeling. It’s influenced by beliefs about the thing, but also by unrelated things
There’s probably a crux somewhere near here. Your formulation of #4 seems bad because, indeed, my emotions shouldn’t be directly relevant to an intellectual discussion of some topic. But I don’t think that gives you license to say, “Ah, if emotions aren’t relevant, therefore no harm is done by rewriting your comments to be nicer,” because, as I’ve said, I think the nicewashing does end up distorting the content. The feelings are downstream of the beliefs and can’t be changed arbitrarily.
It’s influenced by beliefs about the thing, but also by unrelated things like how you’re feeling about the person you’re talking to (RE what I’ve demonstrated with Said).
I want to note that I dispute that you demonstrated this.
At this point, my mental model of you responds with something like
You’re probably correct on the consequential analysis (i.e., the softened version would be more likely to be persuasive)[1], but I don’t think it follows that we as a community should therefore moderate vibes because [very eloquently argued case about censorship being bad that I won’t try to replicate here]
If you don’t think this, then that would be a crux, but also I’d be very surprised and not sure how I’d continue the conversation then, but for now I’m not thinking too much about this.
FWIW, I absolutely do not think that the “softened” version would be more likely to be persuasive. (I think that the “softened” version is much worse, even more so than Zack does.)
Thinking this carefully about the content of your messages takes a lot of time. Said doesn’t take this much time for his comments, which is how he can respond so quickly.
Wrong:

Consider a very short post (or comment), which—briefly, elegantly, with a minimum of words—expresses some transformative idea, or makes some stunningly incisive point. Forget, for now, the question of its quality, and consider instead: how much effort went into writing it? Do you tally up only the keystrokes? Or do you count also the years of thought and experience and work that allowed the writer to come up with this idea, and this sequence of words to express it? Do you count the knowledge of a lifetime?