If you had some vague prompt like “write an essay about how the field of alignment is misguided” and then proofread it, you’ve met the criteria as laid out.
No, such outputs will almost certainly fail this criterion (since they will by default be written in the typical LLM “style”).
That’s a good point and it does set at least a low bar of bothering to try.
But they don’t have to try hard. They can almost just append the prompt with “and don’t write it in standard LLM style”.
I think it’s a little more complex than that, but not much. Humans can’t tell LLM writing from human writing in controlled studies. The question isn’t whether you can hide the style or even if it’s hard, just how easy.
Which raises the question of whether they’d even do that much, because of course they haven’t read the FAQ before posting.
Really just making sure that new authors read SOMETHING about what’s appreciated here would go a long way toward reducing slop posts.
Average humans can’t distinguish LLM writing from human writing, presumably from lack of exposure and lack of trying (https://arxiv.org/abs/2502.12150 shows that it is not an extremely hard problem). We are much more Online than average.
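To give a concrete sense of how low that bar is, here is a minimal sketch of the sort of crude lexical classifier that already separates 2024-era chatbot prose from human prose fairly well. This is not the setup from the linked paper; the sample texts are made-up placeholders standing in for a real labeled corpus, and any accuracy printed on them is meaningless.

```python
# Minimal sketch: bag-of-words "LLM vs. human" text classifier.
# Not the method from the linked paper; the texts below are toy placeholders
# standing in for a real labeled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

human_texts = [
    "honestly the second section drags, cut half of it",
    "we argued about this for an hour and got nowhere",
    "the benchmark numbers look off, rerun it with the fixed seed",
    "i dunno, the draft reads fine to me, ship it",
]
llm_texts = [
    "In today's rapidly evolving landscape, it is crucial to delve into these nuances.",
    "This piece weaves a rich tapestry of insights, underscoring key considerations.",
    "Ultimately, the answer depends on a multifaceted interplay of factors.",
    "In conclusion, fostering alignment requires a holistic and nuanced approach.",
]

texts = human_texts + llm_texts
labels = [0] * len(human_texts) + [1] * len(llm_texts)  # 0 = human, 1 = LLM

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0
)

# Word unigrams + bigrams into logistic regression: a weak baseline, but
# lexical tells ("delve", "tapestry", boilerplate transitions) carry a
# surprising amount of signal against mid-2024 chatbot prose.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Anything along these lines is, of course, exactly the kind of backwards-looking detector discussed below: it learns the tells of whatever models produced its training set.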
But the caveat there is that this is inherently a backwards-looking result:
We consider GPT-4o (OpenAI, 2024), Claude-3.5-Sonnet (Anthropic, 2024), Grok-2 (xAI, 2024), Gemini-1.5-Pro (Google, 2024), and DeepSeek-V3 (DeepSeek-AI, 2024).
So one way to put it would be that people & classifiers are good at detecting mid-2024-era chatbot prose. Unfortunately, somewhere after then, at least OpenAI and Google apparently began to target the problem of ChatGPTese (possibly for different reasons: Altman’s push into consumer companion-bots/personalization/social-networking, and Google just mostly ignoring RLHF in favor of capabilities), and the chatbot style seems to have improved substantially. Even the current GPT-4o doesn’t sound nearly as 4o-like as it did just back in November 2024. Since mode-collapse/ChatGPTese stuff was never a capabilities problem per se (just look at GPT-3!), but mostly just neglect/apathy on the part of the foundation labs (as I’ve been pointing out since the beginning), it’s not a surprise that it could improve rapidly once they put (possibly literally) any effort into fixing it.
Between the continued rapid increase in capabilities, some attention finally being paid to esthetics & prose style, and attackers slowly improving their infrastructure in the obvious ways, I expect over the course of 2025 that detecting prose from a SOTA model is going to get much more difficult. (And this excludes the cumulative effect of humans increasingly writing like ChatGPT.)
EDIT: today on HN, a post was on the front page for several hours with +70 upvotes, despite being blatantly new-4o-written (and impressively vapid). Is this the highest-upvoted LLM text on HN to date? I suspect that if it is, we’ll soon see higher...
It already has been getting a bunch harder. I am quite confident a lot of new submissions to LW are AI-generated, but the last month or two have made distinguishing them from human writing a lot harder. I still think we are pretty good, but I don’t think we are that many months away from that breaking as well.
In particular, it’s hard to distinguish in the amount of time that I have to moderate a new user submission. Given that I’m trying to spend only a few minutes on each new user, it’s very helpful to be able to rely on style cues.
Interesting! Do you think humans could pick up on word use that well? My perception is that humans mostly cue on structure to detect LLM slop writing, and that structure is relatively easily changed with prompts (although it’s definitely not trivial at this point; I haven’t searched for recipes).
I did concede the point, since the research I was thinking of didn’t use humans who’ve practiced detecting LLM writing.
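We probably use a mix of strategies. Certainly people take “delve” and “tapestry” as LLM signals these days.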
I think it’s a little more complex than that, but not much. Humans can’t tell LLM writing from human writing in controlled studies. The question isn’t whether you can hide the style or even if it’s hard, just how easy.
I am quite confident I can tell LLM writing from human writing. Yes, there are prompts sufficient to fool me, but only for a bit until I pick up on it. Adding “don’t write in a standard LLM style” would not be enough, and my guess is nothing that takes less than half an hour to figure out would be enough.
I concede the point. That’s a high bar for getting LLM submissions past you. I don’t know of studies that tested people who’d actually practiced detecting LLM writing.
I’d still be more comfortable with a disclosure criterion of some sort, but I don’t have a great argument beyond valuing transparency and honesty.