What would be your honest probability assessment that a religious person reads this and actually goes that route?
Sorry, phrasing it in terms of “someone focused on harm”/“a potential convert being warned” might have been bad writing on my part, because what matters is the logical structure of the claim, not whether some particular target audience will be persuaded.
Suppose I were to say, “Drug addiction is bad because it destroys the addict’s physical health and ability to function in Society.” I like that sentence and think it is true. But the reason it’s a good sentence isn’t because I’m a consequentialist agent whose only goal is to minimize drug addiction, and I’ve computed that that’s the optimal sentence to persuade people to not take drugs. I’m not, and it isn’t. (An addict isn’t going to magically summon the will to quit as a result of reading that sentence, and someone considering taking drugs has already heard it and might feel offended.) Rather, it’s a good sentence because it clearly explains why I think drug addiction is bad, and it would be dishonest to try to persuade some particular target audience with a line of reasoning other than the one that persuades me.
Deliberately inserting unhelpful vibes into your comment is like uploading a post with formatting that you know will break the editor and then being like “well, the editor only breaks because this part here is poorly programmed; if it were programmed better then it would do fine”. In any other context this would pattern-match to obviously foolish behavior. (“I don’t look before crossing the street because cars should stop.”)
I don’t think those are good metaphors, because the function of a markup language or traffic laws is very different from the function of blog comments. We want documents to conform to the spec of the markup language so that our browsers know how to render them. We want cars and pedestrians to follow the traffic law in order to avoid dangerous accidents. In these cases, coordination is paramount: we want everyone to follow the same right-of-way convention, rather than just going into the road whenever they individually feel like it.
In contrast, if everyone writes the blog comment they individually feel like writing, that seems good, because then everyone gets to read what everyone else individually felt like writing, rather than having to read something else, which would probably be less informative. We don’t need to coordinate the vibes. (We probably do want to coordinate the language; it would be confusing if you wrote your comments in English, but I wrote all my replies in French.)
the thing that was actually causally upstream of the details in Said’s message [...] was that he thinks religion is dumb and bad, which influenced a parameter sent to the language-generation module that output the message, which made it choose language that sounded more harsh. [...] The vibe isn’t an accidental by-product
Right, exactly. He thinks religion is dumb and bad, and he wrote a comment that expresses what he thinks, which ends up having harsh vibes. If the comment were edited to make the vibes less harsh, then it would be less clear exactly how dumb and bad the author thinks religion is. But it would be bad to make comments less clearly express the author’s thoughts, because the function of a comment is to express the author’s thoughts.
whatever you want to improve, more awareness of what’s actually going on is going to be good
Absolutely. For example, if everyone around me is obfuscating their actual thoughts because they’re trying to coordinate vibes, that distortion is definitely something I want to be tracking.
to just give a sense of my actual views on this, the whole thing just seems ridiculously backwards
The feeling is mutual?!
what matters is the logical structure of the claim, not whether some particular target audience will be persuaded.
Right, exactly. He thinks religion is dumb and bad, and he wrote a comment that expresses what he thinks, which ends up having harsh vibes. If the comment were edited to make the vibes less harsh, then it would be less clear exactly how dumb and bad the author thinks religion is. But it would be bad to make comments less clearly express the author’s thoughts, because the function of a comment is to express the author’s thoughts.
Oh. Oh. So you agree with me that the details weren’t that well thought out (or at least you didn’t bother arguing against it), and ditto about the net effects, but you don’t think it matters (or at any rate, that it isn’t the important point), because you’re not trying to optimize for positive effects, but just for honest communication...?
This is not what I thought your position was, but I guess it makes sense if I try to retroactively fit it. This means most (all?) of my objections don’t apply anymore. Like, yeah, if you terminally value authentically representing the author’s emotional state of mind, then of course deliberately adjusting vibes is a net negative for your values.
I don’t think those are good metaphors, because the function of a markup language or traffic laws is very different from the function of blog comments. We want documents to conform to the spec of the markup language so that our browsers know how to render them. We want cars and pedestrians to follow the traffic law in order to avoid dangerous accidents. In these cases, coordination is paramount: we want everyone to follow the same right-of-way convention, rather than just going into the road whenever they individually feel like it.
In contrast, if everyone writes the blog comment they individually feel like writing, that seems good, because then everyone gets to read what everyone else individually felt like writing, rather than having to read something else, which would probably be less informative. We don’t need to coordinate the vibes. (We probably do want to coordinate the language; it would be confusing if you wrote your comments in English, but I wrote all my replies in French.)
(I think this completely misses the point I was trying to make, which is that “I will do X which I know will have bad effects, but I’ll do it anyway because the reason it has bad effects is that other people are making mistakes, so it’s not me who should change X, but other people who should change” is recognized as dumb for almost all values of X, especially on LW—but I also think this doesn’t matter anymore, either, because the argument is again about consequences, which you just demoted as the optimization target. If you agree that it doesn’t matter anymore, then no need to discuss this more.)
I guess now I have a few questions
Why do you have this position? (i.e., that comments aren’t about impact). Is this supposed to be, like, the super obvious message that was clearly the main point of the Sequences, or something like that?
Is your default model of LWians that most of them have this position?
You said earlier that the repeated moderation blow-ups aren’t about bad vibes. I feel like what you’ve said since justifies why you think Said’s comments are good, but not that they aren’t about vibes—like even with everything you said here, it still seems like the causal stream here is clearly bad vibes → people complain to habryka → Said gets in trouble? (This isn’t super important, but still felt worth asking.)
Why do you have this position? (i.e., that comments aren’t about impact).
Because naïvely optimizing for impact requires concealing or distorting information that people could have used to make better (more impactful) decisions in ways that can’t realistically be anticipated by writers naïvely optimizing for impact.
Here’s an example from Ben Hoffman’s “The Humility Argument for Honesty”. Suppose my neck hurts (coincidentally, after trying a new workout routine), and after some internet research, I decide I have neck cancer. The impact-oriented approach would call for me to do my best to convince my doctor I have neck cancer, to make sure that I get the chemotherapy I’m sure I need. The honesty-oriented approach would call for me to explain to my doctor the evidence and reasoning for why I think I have neck cancer.
Maybe there’s something to be said for the impact-oriented approach if my self-diagnoses are never wrong. But if there’s a chance I could be wrong, the honesty-oriented approach is much more robust. If I don’t really have neck cancer and describe my actual symptoms, the doctor has a chance to help me discover my mistake.
Is your default model of LWians that most of them have this position?
No. But that’s OK with me, because I don’t regard “other people who use one of the same websites as me” as a generic authority figure.
it still seems like the causal stream here is clearly bad vibes → people complain to habryka → Said gets in trouble?
Yes, that sounds right. As you’ve gathered, I want to delete the second arrow rather than altering the value of the “vibes” node.
No. But that’s OK with me, because I don’t regard “other people who use one of the same websites as me” as a generic authority figure.
Was definitely not going to make an argument from authority, just trying to understand your world view.
Iirc we’ve touched on four (increasingly strong) standards for truth:
1. Don’t lie.
2. (I won’t be the best at phrasing this) something like “don’t try to make someone believe things for reasons that have nothing to do with why you believe it”.
3. Use only the arguments that convinced you (the one you mentioned here).
4. Make sure the comment accurately reflects your emotional state[1] about the situation.
For me, I endorse #1, and about 80% endorse #2 (you said in an earlier comment that #1 is too weak, and I agree). #3 seems pretty bad to me because the most convincing arguments to me don’t have to be the most convincing arguments to others (and indeed, they’re often not), and the argument that persuaded me initially especially doesn’t need to be good. And #4 seems extremely counter-productive, both because it’ll routinely make people angry and because so much of one’s state of mind at any point is determined by irrelevant variables. It seems only slightly less crazy than—and in fact very similar to—the radical honesty stuff. (Only the most radical interpretation of #4 is like that, but as I said in the footnote, the most radical interpretation is what you used when you applied it to Said’s commenting style, so that’s the one I’m using here.)
Here’s an example from Ben Hoffman’s “The Humility Argument for Honesty” [...]
This is not a useful example, though, because it doesn’t differentiate between any two points on this 1-4 scale. You don’t even need to agree with #1 to realize that trying to convince the doctor is a bad idea; all you need to do is realize that they’re more competent than you at understanding symptoms. A non-naive, purely impact-based approach just describes symptoms honestly in this situation.
My sense is that examples favoring something stronger than #2 will be hard to come up with. (Notably, your argument for why a higher standard is better was itself consequentialist.)
Idk, I mean we’ve drifted pretty far off the original topic and we don’t have to talk any more about this if you’re not interested (and also you’ve already been patient in describing your model). I’m just getting this feeling—vibe!—of “hmm no, this doesn’t seem quite right, I don’t think Zack genuinely believed #1-#4 all this time and everything was upstream of that, this position is too extreme and doesn’t really align with the earliest comment about the moderation debate, I think there’s still some misunderstanding here somewhere”, so my instinct is to dig a little deeper to really get your position. Although I could be wrong, too. In any case, like I said, feel free to end the conversation here.
Re-reading this comment again, you said ‘thought’, which maybe I should have criticized because it’s not a thought. How annoyed you are by something isn’t an intellectual position, it’s a feeling. It’s influenced by beliefs about the thing, but also by unrelated things like how you’re feeling about the person you’re talking to (RE what I’ve demonstrated with Said).
Was definitely not going to make an argument from authority, just trying to understand your world view.
Right. Sorry, I think I uncharitably interpreted “Do you think others agree?” as an implied “Who are you to disagree with others?”, but you’ve earned more charity than that. (Or if it’s odd to speak of “earning” charity, say that I unjustly misinterpreted it.)
the argument that persuaded me initially especially doesn’t need to be good
Right. I tried to cover this earlier when I said “(a cleaned-up refinement of) my thought process” (emphasis added). When I wrote about eschewing “line[s] of reasoning other than the one that persuades me”, it’s “persuades” in the present tense because what matters is the justificatory structure of the belief, not the humdrum causal history.
you said ‘thought’, which maybe I should have criticized because it’s not a thought. How annoyed you are by something isn’t an intellectual position, it’s a feeling. It’s influenced by beliefs about the thing, but also by unrelated things
There’s probably a crux somewhere near here. Your formulation of #4 seems bad because, indeed, my emotions shouldn’t be directly relevant to an intellectual discussion of some topic. But I don’t think that gives you license to say, “Ah, if emotions aren’t relevant, therefore no harm is done by rewriting your comments to be nicer,” because, as I’ve said, I think the nicewashing does end up distorting the content. The feelings are downstream of the beliefs and can’t be changed arbitrarily.
It’s influenced by beliefs about the thing, but also by unrelated things like how you’re feeling about the person you’re talking to (RE what I’ve demonstrated with Said).
I want to note that I dispute that you demonstrated this.