If you had some vague prompt like “write an essay about how the field of alignment is misguided” and then proofread it, you’ve met the criteria as laid out.
No, such outputs will almost certainly fail this criterion (since they will by default be written in the typical LLM “style”).
That’s a good point and it does set at least a low bar of bothering to try.
But they don’t have to try hard. They can almost just append the prompt with “and don’t write it in standard LLM style”.
I think it’s a little more complex than that, but not much. Humans can’t tell LLM writing from human writing in controlled studies. The question isn’t whether you can hide the style or even if it’s hard, just how easy.
Which raises the question of whether they’d even do that much, because of course they haven’t read the FAQ before posting.
Really just making sure that new authors read SOMETHING about what’s appreciated here would go a long way toward reducing slop posts.
Average humans can’t distinguish LLM writing from human writing, presumably from lack of exposure and lack of trying (https://arxiv.org/abs/2502.12150 shows that it is not an extremely hard problem). We are much more Online than average.
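To give a concrete sense of how low that bar is, here is a minimal sketch of the sort of crude lexical classifier that already separates 2024-era chatbot prose from human prose fairly well. This is not the setup from the linked paper; the sample texts are made-up placeholders standing in for a real labeled corpus, and any accuracy printed on them is meaningless.

```python
# Minimal sketch: bag-of-words "LLM vs. human" text classifier.
# Not the method from the linked paper; the texts below are toy placeholders
# standing in for a real labeled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

human_texts = [
    "honestly the second section drags, cut half of it",
    "we argued about this for an hour and got nowhere",
    "the benchmark numbers look off, rerun it with the fixed seed",
    "i dunno, the draft reads fine to me, ship it",
]
llm_texts = [
    "In today's rapidly evolving landscape, it is crucial to delve into these nuances.",
    "This piece weaves a rich tapestry of insights, underscoring key considerations.",
    "Ultimately, the answer depends on a multifaceted interplay of factors.",
    "In conclusion, fostering alignment requires a holistic and nuanced approach.",
]

texts = human_texts + llm_texts
labels = [0] * len(human_texts) + [1] * len(llm_texts)  # 0 = human, 1 = LLM

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0
)

# Word unigrams + bigrams into logistic regression: a weak baseline, but
# lexical tells ("delve", "tapestry", boilerplate transitions) carry a
# surprising amount of signal against mid-2024 chatbot prose.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Anything along these lines is, of course, exactly the kind of backwards-looking detector discussed below: it learns the tells of whatever models produced its training set.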
But the caveat there is that this is inherently a backwards-looking result:
We consider GPT-4o (OpenAI, 2024), Claude-3.5-Sonnet (Anthropic, 2024), Grok-2 (xAI, 2024), Gemini-1.5-Pro (Google, 2024), and DeepSeek-V3 (DeepSeek-AI, 2024).
So one way to put it would be that people & classifiers are good at detecting mid-2024-era chatbot prose. Unfortunately, somewhere after then, at least OpenAI and Google apparently began to target the problem of ChatGPTese (possibly for different reasons: Altman’s push into consumer companion-bots/personalization/social-networking, and Google just mostly ignoring RLHF in favor of capabilities), and the chatbot style seems to have improved substantially. Even the current GPT-4o doesn’t sound nearly as 4o-like as it did just back in November 2024. Since mode-collapse/ChatGPTese stuff was never a capabilities problem per se (just look at GPT-3!), but mostly just neglect/apathy on the part of the foundation labs (as I’ve been pointing out since the beginning), it’s not a surprise that it could improve rapidly once they put (possibly literally) any effort into fixing it.
Between the continued rapid increase in capabilities, some attention finally being paid to esthetics & prose style, and attackers slowly improving their infrastructure in the obvious ways, I expect over the course of 2025 that detecting prose from a SOTA model is going to get much more difficult. (And this excludes the cumulative effect of humans increasingly writing like ChatGPT.)
EDIT: today on HN, a post was on the front page for several hours with +70 upvotes, despite being blatantly new-4o-written (and impressively vapid). Is this the highest-upvoted LLM text on HN to date? I suspect that if it is, we’ll soon see higher...
It already has been getting a bunch harder. I am quite confident a lot of new submissions to LW are AI-generated, but the last month or two have made distinguishing them from human writing a lot harder. I still think we are pretty good, but I don’t think we are that many months away from that breaking as well.
In particular, it’s hard to distinguish in the amount of time that I have to moderate a new user submission. Given that I’m trying to spend only a few minutes on each new user, it’s very helpful to be able to rely on style cues.
Interesting! Do you think humans could pick up on word use that well? My perception is that humans mostly cue on structure to detect LLM slop writing, and that structure is relatively easily changed with prompts (although it’s definitely not trivial at this point; I haven’t searched for recipes).
I did concede the point, since the research I was thinking of didn’t use humans who’ve practiced detecting LLM writing.
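We probably use a mix of strategies. Certainly people take “delve” and “tapestry” as LLM signals these days.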
I think it’s a little more complex than that, but not much. Humans can’t tell LLM writing from human writing in controlled studies. The question isn’t whether you can hide the style or even if it’s hard, just how easy.
I am quite confident I can tell LLM writing from human writing. Yes, there are prompts sufficient to fool me, but only for a bit until I pick up on it. Adding “don’t write in a standard LLM style” would not be enough, and my guess is nothing that takes less than half an hour to figure out would be enough.
I concede the point. That’s a high bar for getting LLM submissions past you. I don’t know of studies that tested people who’d actually practiced detecting LLM writing.
I’d still be more comfortable with a disclosure criterion of some sort, but I don’t have a great argument beyond valuing transparency and honesty.