So, I’ve got a question about the policy. My brain is just kind of weird, so I really appreciate having Claude translate my thoughts into normal speak.
The case study is the following comments in the same comment section:
13 upvotes - written with the help of Claude
1 upvote (me) - written with the help of my brain only
I’m honestly quite tightly coupled to Claude at this point; it’s around 40-50% of my thinking process (which is kind of weird when I think about it?), so I don’t know how to think about this policy change.
I’m pretty sure this isn’t a policy change but rather a policy distillation, and you were already operating under the policy described above. E.g., I often have conversations with AIs that I don’t want to bother translating into a whole post, but where I think folks here would benefit from seeing the thread. What I’ll likely do is make the AI portions collapsible and the human portions uncollapsed by default; often the human side is sufficient to make a point (when the conversation is basically just a human thinking out loud with some helpful feedback), but sometimes the AI responses provide significant insight not otherwise present that doesn’t get represented in a subsequent human message (e.g., when asking the AI to do a significant amount of thinking before responding).
I’m not a moderator, but I predict your comment was and is allowed by this policy, because of #Humans_Using_AI_as_Writing_or_Research_Assistants.
If you wrote the whole thing, then prompted Claude to rewrite it, that would seem to “add significant value.” If you then read the whole thing carefully to say “that’s what I meant, and it didn’t make anything up I’m not sure about”, then you’ve more than met the requirement laid out here, right?
They’re saying the second part is all you have to do. If you had some vague prompt like “write an essay about how the field of alignment is misguided” and then proofread the output, you’ve met the criteria as laid out. So if your prompt was essentially the complete essay, it seems like you’ve gone far beyond their standards.
I personally would want to know that the author contributed much more than a vague prompt to get the process rolling, but that seems to be the standard for acceptance laid out here. I assume they’d prefer much more involvement on the prompting side, like you’re talking about doing.
No, such outputs will almost certainly fail this criterion (since they will by default be written in the typical LLM “style”).
That’s a good point and it does set at least a low bar of bothering to try.
But they don’t have to try hard. They can almost just append the prompt with “and don’t write it in standard LLM style”.
I think it’s a little more complex than that, but not much. Humans can’t tell LLM writing from human writing in controlled studies. The question isn’t whether you can hide the style or even if it’s hard, just how easy.
Which raises the question of whether they’d even do that much, because of course they haven’t read the FAQ before posting.
Really just making sure that new authors read SOMETHING about what’s appreciated here would go a long way toward reducing slop posts.
Average humans can’t distinguish LLM writing from human writing, presumably through lack of exposure and not trying (https://arxiv.org/abs/2502.12150 shows that it is not an extremely hard problem). We are much more Online than average.
But the caveat there is that this is inherently a backwards-looking result:
We consider GPT-4o (OpenAI, 2024), Claude-3.5-Sonnet (Anthropic, 2024), Grok-2 (xAI, 2024), Gemini-1.5-Pro (Google, 2024), and DeepSeek-V3 (DeepSeek-AI, 2024).
So one way to put it would be that people & classifiers are good at detecting mid-2024-era chatbot prose. Unfortunately, sometime after that, at least OpenAI and Google apparently began to target the problem of ChatGPTese (possibly for different reasons: Altman’s push into consumer companion-bots/personalization/social-networking, and Google just mostly ignoring RLHF in favor of capabilities), and the chatbot style seems to have improved substantially. Even the current GPT-4o doesn’t sound nearly as 4o-like as it did just back in November 2024. Since mode-collapse/ChatGPTese stuff was never a capabilities problem per se (just look at GPT-3!), but mostly just neglect/apathy on the part of the foundation labs (as I’ve been pointing out since the beginning), it’s not a surprise that it could improve rapidly once they put (possibly literally) any effort into fixing it.
Between the continued rapid increase in capabilities, the labs paying some attention to esthetics & prose style, and attackers slowly improving their infrastructure in the obvious ways, I expect that over the course of 2025 detecting prose from a SOTA model is going to get much more difficult. (And this excludes the cumulative effect of humans increasingly writing like ChatGPT.)
EDIT: today on HN, a post was on the front page for several hours with +70 upvotes, despite being blatantly new-4o-written (and impressively vapid). Is this the highest-upvoted LLM text on HN to date? I suspect that if it is, we’ll soon see higher...
It has already been getting a bunch harder. I am quite confident a lot of new submissions to LW are AI-generated, but the last month or two have made distinguishing them from human writing a lot harder. I still think we are pretty good at it, but I don’t think we are that many months away from that breaking as well.
In particular, it’s hard to distinguish in the amount of time that I have to moderate a new user submission. Given that I’m trying to spend a few minutes on a new user, it’s very helpful to be able to rely on style cues.
Interesting! Do you think humans could pick up on word use that well? My perception is that humans mostly cue on structure to detect LLM slop writing, and that is relatively easily changed with prompts (although it’s definitely not trivial at this point—but I haven’t searched for recipes).
I did concede the point, since the research I was thinking of didn’t use humans who’ve practiced detecting LLM writing.
We probably use a mix of strategies. Certainly people take “delve” and “tapestry” as LLM signals these days.
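(As an aside: the “word use” strategy mentioned here is easy to make concrete. Below is a minimal sketch of a cue-word counter in Python; the cue list and threshold are invented for illustration, and this is not a real detector, just a picture of how shallow the lexical signal is.)

```python
import re

# Hypothetical list of stereotypical LLM cue words; a real list would be
# longer and would need to be updated as models change.
CUE_WORDS = {"delve", "tapestry", "multifaceted", "nuanced"}

def cue_word_rate(text: str) -> float:
    """Return the fraction of tokens that are stereotypical LLM cue words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in CUE_WORDS)
    return hits / len(tokens)

def looks_llm_flavored(text: str, threshold: float = 0.005) -> bool:
    """Crude flag: more than roughly one cue word per 200 tokens."""
    return cue_word_rate(text) > threshold

# Example: flags an over-the-top sentence, says nothing reliable in general.
print(looks_llm_flavored("Let's delve into the rich tapestry of ideas."))  # True
```

Obviously a prompt like “avoid the words delve and tapestry” defeats this immediately, which is the point being debated: lexical cues are cheap to fake, while structural and stylistic cues take more effort to hide.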
I am quite confident I can tell LLM writing from human writing. Yes, there are prompts sufficient to fool me, but only for a bit until I pick up on it. Adding “don’t write in a standard LLM style” would not be enough, and my guess is nothing that takes less than half an hour to figure out would be enough.
I concede the point. That’s a high bar for getting LLM submissions past you. I don’t know of studies that tested people who’d actually practiced detecting LLM writing.
I’d still be more comfortable with a disclosure criterion of some sort, but I don’t have a great argument beyond valuing transparency and honesty.
The first one fails IMO on “don’t use the stereotypical writing style of LLM assistants”, but seems probably fine on the other ones (a bit hard to judge without knowing how much of it is your own ideas). You also disclose the AI writing at the bottom, which helps, though it would be better for it to be at the top. I think it’s plausible I would have given a warning for this.
I think the comment that you wrote with “the help of my brain only” is better than the other one, so insofar as you have a choice, I would choose to do more of that.