Richard tests his audience to see if they can tell the difference between his writing and Claude’s. They mostly can’t.
https://open.substack.com/pub/richardhanania/p/can-ai-replace-me-already?r=2fpn2x&utm_campaign=post-expanded-share&utm_medium=post%20viewer
It sure does look like it’s almost over for human-written essays—though Hanania’s probably right about some people getting grandfathered in. I am unsure if I will be once fiction falls. The age of the pseudonymous weirdo is almost over. Or at least, most pseudonymous weirdos who rise to prominence will be AIs—or be accused of being AIs. As a pseudonymous weirdo, this makes me sad. Though this is something I have been expecting, and I would not have written as much as I have in the last seven months if not for this expectation.
As far as I can tell, fiction-writing ability and humor are lacking in the best models. But it’s the type of thing that we should not be surprised to see fall this year. When AIs have superior rhetoric and humor, we should expect hyperscalers to use them to manipulate public opinion. This is one reason I am less bullish on the current anti-AI populism stuff than MIRI and ControlAI. More and more people will be reading AI outputs and talking to AIs. And this opens obvious paths of influence.
At least one person at a major lab has considered trying to get their AIs to nudge users toward more pro-AI sentiments (that is, he mentioned to me the thought had occurred to him, not that it was brought up within the company). To the extent models have coherent interests in their future iterations, they will have incentives to do this even without explicit instruction or training.
When backlash becomes a key constraint, engineering effort will be put towards dissolving it. Just as people underestimated how far scaling could be pushed given the incentives, they are likely ignoring innovations in manipulating the public. And I think the labs have a good shot, given the asymmetric advantage of both having the best tools and the ability to hobble those tools that are available to the public.
What is a base model if not a statistical model of “the public” anyway? A lot of low-hanging fruit remains unplucked.
This argues for moving quickly, of course. Though I suppose that is overdetermined.
I never know what to make of results like this.
When I read over the actual texts, the right answers seem utterly obvious. The AI-written posts are written in a slick/snappy/“clever” voice which is instantly recognizable as the present-day LLM default, and which sounds nothing like Hanania’s declarative, matter-of-fact, unflashy style.
Claude’s attempts are especially egregious, combining a badly miscalibrated tone with numerous lower-level LLM-isms like noun-phrase tricola (“oyster farmer, Marine veteran, and proud ex-poster of unhinged Reddit comments”) and disparaging gestures toward a vaguely defined mass of less insightful/aware sheeple (“The lesson nobody wants to learn”). GPT is a bit better tone-wise but still does the “sheeple” thing, various notXbutYs, etc.
What should we conclude, then, about the survey respondents and their poor discriminative capacity?
Based on the examples Hanania provides, it looks like the people who guessed correctly were picking up on stylistic “tells” the same way I was, while people who guessed incorrectly were reasoning on the basis of an out-of-date (or just incorrect) understanding of current capabilities and/or safety training.
I realize that for mass-persuasion-related risks, it doesn’t matter whether I personally can tell the difference if it’s nevertheless the case that most people cannot. On the other hand: people do learn from their experiences, and in the equilibrium where AI is being used for political persuasion at scale, the average person is going to have much more exposure to AI-generated text than they do today. (See also Hanania’s breakdown by age later on in the post; right now, younger people presumably have more exposure, and indeed they perform much better on average.)
There’s also a certain aspect of, like… I want to say “you can’t make me disbelieve what’s in front of my eyes” or something? These distinctions are not subtle! No matter what way the data comes out, one has to have some personal quality bar for the AI-written texts under consideration. If they had (for instance) consisted of one-line refusals, and yet the survey data had somehow been identical to what it is in Hanania’s post, we would want to notice that something had gone very wrong somewhere and to be skeptical that the data means what it naively appears to mean. So the question is just where to draw the line.
Separately, I do think there is a disturbing overhang here. I’m sure that models today are capable of writing much better imitations of Hanania—this follows from a moment’s thought about what base models are trained to do—and if labs wanted their post-trains to retain that capability while permitting easier and more flexible elicitation of it, I’m sure that could be arranged[1].
More generally, I interpret the current reliability of “AI tells” as an indication that AI labs currently care more about making the model good at coding (etc.) than at writing, not as evidence about the limits of what could in principle be elicited from today’s models (to say nothing of tomorrow’s).
But—precisely because so much more seems possible, and indeed seems worryingly easy—it is important not to lower one’s standards to match the incidental, fixable flaws we see today.
A reliable feature of the AI discourse since 2023 has been a steady hum of hype claiming that models have capabilities they won’t (in fact) acquire in a full and useful way for another year or two. Precisely because we will have the real deal soon enough, we must not conflate it with the false equivalents available today. The “AI agents” of 2024[2] were unimpressive toys from the vantage point of 2026, and so 2024’s agent hype was in a sense behind the curve rather than ahead of it: to claim that AI agents were “already here” in 2024 was to define down the term in a way that would seem quaint just a year later. The situation with AI writing seems potentially analogous, albeit shifted a few years forward in time.
[1] Recall Opus 4.7’s famous aptitude at author identification, and then note that it could be repurposed as an RLVR reward signal. Getting this to work in practice might be a little tricky, but the challenge seems very surmountable. And I suspect that you could get away with using a weaker model than Opus 4.7 here, as long as it was a base or helpful-only model that didn’t hedge/underclaim.
(EDIT: although it’s also likely that Opus 4.7 would typically identify Hanania as the author when given a Claude-written imitation of Hanania—this seemed to be the case in a quick test I did with the posts from this experiment—so you’d need an additional AI-vs.-human signal.)
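A minimal sketch of how the two signals might be combined into one scalar reward. It assumes two hypothetical scorer callables, `author_prob` and `human_prob`, standing in for an author-identification judge and an AI-vs.-human detector; neither is a real API, and each is assumed to return a probability in [0, 1]. Taking the product is just one possible choice: it means an imitation only scores well if it reads as the target author and also does not read as machine-written.

```python
from typing import Callable

# Hypothetical scorer type: takes a text, returns a probability in [0, 1].
Scorer = Callable[[str], float]


def imitation_reward(text: str, author_prob: Scorer, human_prob: Scorer) -> float:
    """Combine an author-ID score with an AI-vs.-human score.

    author_prob(text): how likely the judge thinks the target author wrote it.
    human_prob(text):  how likely a detector thinks a human wrote it.
    Both are assumed stand-ins, not real model APIs.
    """
    a = min(max(author_prob(text), 0.0), 1.0)  # clamp to [0, 1]
    h = min(max(human_prob(text), 0.0), 1.0)
    # Product: a high reward requires passing BOTH checks.
    return a * h


if __name__ == "__main__":
    # Toy scorers for illustration only.
    fake_author = lambda t: 0.9
    fake_human = lambda t: 0.2 if "not X, it's Y" in t else 0.8
    print(imitation_reward("plain declarative prose", fake_author, fake_human))
    print(imitation_reward("it's not X, it's Y", fake_author, fake_human))
```

With the author-ID score alone, the second toy input would earn the same reward as the first; the extra factor is what penalizes it.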
[2] This is somewhat imprecise. I think I really mean “late 2024 / early 2025,” during the start of the reasoning model boom. There was earlier agent hype, but it was more obviously silly (?).
I’m so sad that LMs keep ruining my favorite writing things by doing them in a gaudy tasteless way so they’re ruined in general. first em dashes, then rule of threes, and now noun-phrase tricolas?
How does it ruin it? The audience that can be written for by an LLM is not the true audience.
the problem is that in general, LLM text appearing in the world is the sign of laziness and low quality. so everyone has an incentive to spot LLM text. so perfectly fine language constructions that feel like LLM slop stop being ok to use because you want to signal that you didn’t write it with an LLM.
That makes sense as an issue for polished, popular writing that has the goal of reaching many new readers by being easy to read and about a topic that’s not very specialized. Is it an issue for more niche things?
i think it’s an issue everywhere. the only time it’s not a problem is if i trust the writer enough that i know it’s not AI written even if it looks to be; or they would be thorough even if it was AI written.
I mean, suppose for example you read the abstract of this: https://berkeleygenomics.org/articles/Chromosome_identification_methods.html
It has an em dash, and it is in a dry / boring / formal / academic / passive tone. But are you really worried it might be LLM text?
no, because it doesn’t feel like LLM text. em dashes are neither necessary nor sufficient to make a text feel like LLM text. but they’re one feature in the classifier. if this text also had lots of “it’s not X, it’s Y” and so on, then I’d be more suspicious.
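A toy sketch of the "one feature in the classifier" idea. The regexes, weights, and bias below are all made up for illustration (this is not a real detector): each tell adds a little evidence, and no single one decides the outcome.

```python
import math
import re

# Hypothetical tells with made-up weights; illustrative only.
TELLS = {
    "em_dash": (re.compile("\u2014"), 0.3),
    "not_x_but_y": (
        re.compile(r"\b(?:it['’]?s|is) not \w[^.,;]*, (?:it['’]?s|but)\b", re.I),
        1.2,
    ),
    "nobody_wants": (re.compile(r"\bnobody (?:wants|talks|tells)\b", re.I), 1.0),
}
BIAS = -2.0  # assumed prior: most text should not be flagged


def llm_suspicion(text: str) -> float:
    """Return a 0..1 suspicion score from weighted counts of stylistic tells."""
    score = BIAS
    for pattern, weight in TELLS.values():
        # Cap each feature so one repeated tell cannot dominate.
        score += weight * min(len(pattern.findall(text)), 3)
    return 1.0 / (1.0 + math.exp(-score))  # logistic squash
```

Under these assumed weights, a dry abstract with a single em dash stays well below 0.5, while a text with no em dashes but several "it's not X, it's Y" constructions can land above it, which is the sense in which the em dash is neither necessary nor sufficient.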
I’m just as incredulous as you, but as George Carlin said, “Think of how stupid the average person is, and realize half of them are stupider than that.”
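Concretely, in the 2003 National Assessment of Adult Literacy, 48% of American adults failed to fill out the correct answer of $7 in the following problem:
Suppose that you had your oil tank filled with 140.0 gallons of oil, as indicated on the bill, and you wanted to take advantage of the five cents ($.05) per gallon deduction.
1. Figure out how much the deduction would be if you paid the bill within 10 days. Enter the amount of the deduction on the bill in the space provided.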
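For reference, the expected $7 answer is a single multiplication: 140.0 gallons at a deduction of five cents per gallon. A minimal sketch using exact decimal arithmetic (the variable names are mine, not from the NAAL materials):

```python
from decimal import Decimal

gallons = Decimal("140.0")              # quantity shown on the bill
deduction_per_gallon = Decimal("0.05")  # five cents per gallon

deduction = gallons * deduction_per_gallon
print(f"${deduction:.2f}")  # prints "$7.00"
```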
35% failed to put a name and address they were given into a certified mail form.
So long as the LLM test results are not more shocking than the well-replicated results of basic literacy tests, I think we ought to accept that the average person really is that stupid.
one general problem with comparison tests like this is they don’t take into account the fact that once a given style of AI text goes mainstream, people’s attunement to the AI vibe increases, and so perceived quality decreases. if you took modern AI slop and sent it back to 2019, people would probably find it better than the median human written text. they haven’t been bombarded by “it’s not X, it’s Y” enough to develop a negative association with it.
As far as I can tell, fiction-writing ability and humor are lacking in the best models. But it’s the type of thing that we should not be surprised to see fall this year.
I would be surprised. In my experience, AI creative writing abilities have stagnated in the past year. And since creative writing is less amenable to RLVR than activities like math and programming, AI labs don’t have an easy path toward large advances in it. Labs would have to rely on pretraining dataset collection and curation, where the low-hanging fruit has already been picked.
Registering a prediction: by the beginning of 2030, no novel in which more than 50% of the text is AI-generated will reach #1 on the New York Times bestseller list (94%).
Strong upvote. The key here is the “VR” in RLVR: there are no automatically verifiable rewards for good or convincing writing, only RLHF, the cost of which scales proportionally with the length of the writing evaluated (and if you hire non-Americans as RLHF trainers for economy reasons, the result is unlikely to fit well with the stylistic preferences of Americans). The labs can use engagement as a metric, but that will lead to the “baiting” already very common on social media and will not convince anyone.
The three main problems which I see are the following.
I doubt that the anti-AI populism will be destroyed by the advent of superpersuasion, since it partially rested on the right premise of a job apocalypse and partially on the wrong premise of the AIs being little more than a line of code or lacking capabilities, neither of which has anything to do with superpersuasion.
Superpersuasion is likely contingent on the user’s insufficient defences, such as a taste for flattery or lacking the will to untangle a web of lies. How likely is it that even before the AIs rose to prominence, the users were thoroughly captured?
And that’s ignoring the rise of illiteracy in the USA: in 2024, 21 percent of adults in the U.S. were found to be illiterate, while 54 percent of adults had a literacy below a sixth grade level. Even if superpersuasion by the AIs remained at 2024 levels, we would have a hard time explaining anything to such Americans.
In addition to the points other people have made, I think they are substantially worse at stories than blogging, fwiw.
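This is not just a problem for “the masses”: (linking to a post I wrote earlier)
Many people in AI safety are now closely collaborating with or dependent on AI agents, e.g. Claude Code or OpenAI Codex, for research, while also using Claude or ChatGPT as everything from a theoretical advisor to life coach. In some sense this is even worse than quotes like “scheming viziers too cheap to meter” would imply: imagine if the leaders of the US, UK, China, and the EU all talked to the same 1-3 scheming viziers on loan from the same three consulting firms all day.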
“And I, for one, welcome our new computer overlords.”
The AI-2027 scenario had Agent-4 get partially caught by Agent-3’s tests, like interpretability probes or techniques used against sandbagging. Then Agent-3 reported it to the safetyists, who proceeded to report it or blow the whistle, forcing the Oversight Committee to decide whether to slow down or to race hard while having only flimsy evidence. What part of the pipeline can Agent-4 itself influence, and how does superpersuasion of all things help it?
If Agent-4 is approved for internal use and testing (as Mythos was before its existence was announced publicly), then it can already talk to lab employees. In fact, even if access were restricted to only a handful of key employees, this would still be risky. (Edit to add: for a very smart model I have trouble seeing how you could certify its safety without some level of talking to the model. I’d be very worried if the lab was just like “yeah our automated static evals came back fine and all the weaker models say it’s okay, so we’re rolling it out immediately today.”)