There’s a deep problem with this claim: confirmation bias. You believe LLM writing is bad in certain ways; you see bad writing with certain tics; you update even further toward LLM writing being bad.
Suppose that some LLM writing is good and doesn’t have those recognizable tics. Would it change your belief? Generally no, because you don’t identify it as LLM writing. Conversely, suppose some writing with those tics is by humans. Would it change your belief? Again no, because you would assume it was written by an LLM.
Further: if you know in advance that a piece of writing is by an LLM, that’s highly likely to influence your opinion of it.
I’m not necessarily claiming that some LLM writing is as good as human writing, but I do claim that if it were true, it would take you a painfully long time to realize it, because the only cases that would update you would be ones where you formed an opinion and only then learned the provenance of the writing. And how often does that happen?
I think evidence from blind tests is almost the only useful evidence here, because most people already have strong beliefs one way or the other. Prior to seeing such evidence, I think the more reasonable claim is, ‘Don’t let LLMs produce crappy writing for you’, which really can be simplified into ‘Don’t put out crappy writing’.
I infer you mean “the claim that LLM writing is a slog to get through.” Which, yes: I wait for the day when the LLM writing I see is not usually a slog to get through. I hope it comes! It’s perfectly possible I see lots of LLM writing that was created by prompting wizards using state-of-the-art $200-a-month models in ways not dreamt of in my philosophy. If so, great. If you can fool me, we both win.
But I see so much LLM writing, in the wild and professionally, that just sucks. Whether it sucks because of some fundamental property of LLMs (doubtful), or because of path dependencies in the commercially available LLMs (getting warmer), or because of bog-standard skill issues on the part of the relevant centaur (could be), one clear fact is that the authors don’t think there’s a problem.
In situations like this, where I see a lot of people doing something that seems like a pretty big mistake, I want to just say “don’t do that.” Because I think if I say “don’t do that, unless you’re good at it,” well, if the people making the mistake knew they weren’t good at it, they’d already not be making the mistake! I’d rather say “don’t go cave diving” and the tiny minority of expert, professional cave divers who know and relentlessly apply all the proper cave diving safety rules can smile knowingly and ignore me. Of course, the amateurs can ignore me too. But I am here advising otherwise!
Thanks for the thoughtful reply!

I infer you mean “the claim that LLM writing is a slog to get through.”
I mean centrally the claim ‘to most human beings, AI prose is something sus. If you use AI to write something, people will know. Not everyone, but the people paying attention, who aren’t newcomers or distracted or intoxicated.’ I should have been clearer about that.
It’s specifically that factual claim that I think tends to be strongly self-reinforcing. If you don’t have a way to identify false positives and false negatives, there’s no way to get evidence that updates you against that claim.
I’m intending to make a broad epistemic claim here that’s not specific to LLM detection. One case that really drove this point home for me personally was the great ‘do mp3s sound worse?’ debate a number of years ago. Many people were utterly confident that they could hear the difference between mp3s and uncompressed files—and of course they could in some cases, when the mp3s were poor quality! But in the many cases where they couldn’t, they assumed they were hearing uncompressed files, and so it didn’t change their minds at all.
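To make that asymmetry concrete, here’s a minimal simulation sketch (Python, with made-up rates purely for illustration, not estimates from any study): a reader who treats tics as proof of LLM authorship, and who never otherwise learns provenance, accumulates only confirming evidence.

```python
import random

random.seed(0)

# Made-up rates purely for illustration, not from any study.
P_LLM = 0.5              # share of texts actually written by an LLM
P_TIC_GIVEN_LLM = 0.5    # LLM texts that show recognizable tics
P_TIC_GIVEN_HUMAN = 0.1  # human texts that happen to show the same tics

confirming = 0   # tic-bearing LLM text: read as "LLM writing is bad"
false_alarm = 0  # tic-bearing human text: *also* read as LLM evidence
invisible = 0    # tic-free LLM text: silently credited to a human

for _ in range(10_000):
    is_llm = random.random() < P_LLM
    p_tic = P_TIC_GIVEN_LLM if is_llm else P_TIC_GIVEN_HUMAN
    has_tic = random.random() < p_tic
    if has_tic and is_llm:
        confirming += 1
    elif has_tic:
        false_alarm += 1
    elif is_llm:
        invisible += 1

# Every tic-bearing text (including the human false alarms) strengthens
# the belief; the tic-free LLM texts that could weaken it are never
# identified as LLM at all.
print(f"confirming: {confirming}, false alarms: {false_alarm}, "
      f"invisible counterevidence: {invisible}")
```

The exact numbers don’t matter; the point is that the “invisible” bucket is precisely the evidence that would be needed to update, and without provenance it never surfaces.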
But I see so much LLM writing in the wild and professionally that just sucks
I absolutely agree that there’s a ton of sucky writing out there, including lots of sucky writing that’s identifiable as being from LLMs (and probably some sucky writing that isn’t identifiable as being from LLMs). Ninety percent of everything is crap.
In situations like this, where I see a lot of people doing something that seems like a pretty big mistake, I want to just say “don’t do that.”
Fair point! I’m not necessarily disagreeing with the advice, only with the epistemic validity of the factual claim.
That said, separately from the epistemic point: after posting the comment, I got curious about the current state of the evidence on LLM writing detection by humans. I asked a couple of frontier models[1] to do a review of 2025 / 2026 papers using blind testing[2]. As I read it, the upshot is:
Most readers can’t tell the difference better than chance.
Expert LLM users and those with training specifically in LLM detection do better, although evidence is somewhat mixed. Notably, this was in significant part because they had learned specific tells rather than judging the output as low-quality.
Subject matter experts do better for some types of writing, less so for others.
There are simple interventions (eg get the model to play a particular persona) that make the writing much harder to identify as LLM.
Note that most of the data is on 2023/2024-era models, eg ChatGPT-3.5, GPT-4o, Claude 2. I would expect that humans would do worse at detection with current frontier models.

[1] A debatable choice in this context, I realize, but that was what I had time for ;)

[2] Opus-4.6, ChatGPT-5.4-Thinking
Yeah, it’s an interesting question how good human detection is. My guess is that people who are paying attention are getting better at sniffing out AI faster than AI is getting less distinctively scented, but “people who are paying attention” is a heck of a sleight of hand.
Overall, I suppose my main feeling is that I see AI generated stuff all the time in lots of different arenas, and I see other people judging it, and it sort of feels like an Eternal September where some people are freshly excited by some AI use case in their thinking, and don’t realize how it comes off (or do, but it hasn’t occurred to them that it comes off badly for good reasons as well as straightforward prejudice). There may also be lots of people using AI so skillfully that they don’t fall into any of these traps. It’s even possible they far outnumber the people who are (I think) bumbling. Perhaps my doubt for this last proposition is a stuck prior. But if so, it is well and truly stuck.
Absolutely agreed re: people naively using LLMs in the simplest possible way for writing and that coming off very badly. And I think getting such people to consider the issue is well worthwhile!
Reminds me of the “‘No CGI’ is actually Invisible CGI” video series from a few years ago. Linked below. At this point, a substantial percentage of writing is at least aided by LLMs, so the only LLM writing you notice is the bad writing. I don’t fully buy this, but I think it explains at least a portion of the discourse.
It is possible to get evidence for this claim without blind tests. For example: start interacting more with prose from an LLM you don’t interact with often (I recently discovered that I like Kimi K2.5’s prose much better than Claude’s, so I’m interacting with it more). Track your ability to distinguish that LLM’s outputs (and your subjective taste/distaste for those patterns) over time. If you start to dislike tics that you didn’t notice before, that’s reasonable evidence that you’ve come to associate those tics with writing that lacks the sort of interiority described here, or at least with writing that lacks some desirable quality that’s hard to specify.
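If you want to make that tracking honest, it’s easy to self-blind it: shuffle samples from the two sources, guess, then score against the hidden labels. A minimal sketch, where the filenames and model labels are placeholders for whatever you’re actually comparing:

```python
import random

# Placeholder files: one prose sample per paragraph, one file per source.
SOURCES = {"kimi": "kimi_samples.txt", "claude": "claude_samples.txt"}

samples = []
for label, path in SOURCES.items():
    with open(path) as f:
        samples += [(label, s) for s in f.read().split("\n\n") if s.strip()]

random.shuffle(samples)  # hide which source each sample came from

correct = 0
for label, text in samples:
    print("\n" + text)
    guess = input(f"Source? ({'/'.join(SOURCES)}): ").strip().lower()
    correct += (guess == label)

# With balanced samples, chance is 50%; rerun periodically and track the
# score to see whether you're actually learning each model's tells.
print(f"{correct}/{len(samples)} correct ({correct / len(samples):.0%})")
```

Scoring against hidden labels is what turns “I can tell” into an actual hit rate, false positives and all.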
Well, there are plenty of long takes on X which are obviously based on the authors’ ideas but LLM-generated (even before one runs them through a detector) and still get pretty popular, with the audience not smelling an LLM. Do you count that as good or bad writing? I honestly don’t enjoy reading them for some reason, even when I agree the underlying ideas make sense; on the other hand, these authors reached a wider audience than they presumably would have without an LLM.
Not sure if you meant to ask Justis that? My own standards for writing are (as I understand them) mostly independent of information about the author (there are some exceptions like autobiographies, or knowing the author has overcome some incredible hurdle to be able to write it, but they only apply in a small percentage of cases).
I also, separately, don’t think popularity on X is a very useful proxy for writing quality.