You seem to be mostly arguing the ways in which LLMs currently do not succeed at producing good, interesting writing. That’s kind of orthogonal to the point of the post, which is where the value of good, interesting writing comes from, and whether or not it’s necessary to have a human writer behind the wheel.
I do try to demonstrate that the quality of AI writing varies based on workflow and prompting to refute the narrow point that LLM output is always just an average of the input. You admit that there are various ways to accomplish this (including doing things like increasing temperature), but point out that they’re not sustainable for more than short passages. This does not seem to be a fundamental limitation (if it’s even true), but something that can be overcome with agentic loops of criticism and revision. Or even just continued improvement in models.
It’s funny you brought up tracking of physical details and spatial relationships. Way back with GPT-2 I experimented with generating romance/erotica stories. I tried to start with short scenes of couples engaged in foreplay and undressing. Very reliably, the model would lose track of which pieces of clothing had already been removed. Characters would remove their pants twice, or three times within the span of three sentences. Or sequencing would be all wrong. They’d remove socks before shoes. And so on.
I just tried a very simple prompt with GPT 5.2 for the type of scene I’m talking about and got this:
His hands slid to the buttons of his coat, slipping it off and hanging it over the chair. She stepped closer and tugged playfully at his tie, loosening it before sliding it free and tossing it onto the small desk. He laughed under his breath as she began unfastening the buttons of his shirt, one at a time, her fingertips grazing his skin as the fabric parted. When the last button came loose, she pushed the shirt from his shoulders and let it fall to the floor.
“Your turn,” he murmured.
She turned slightly, lifting her hair so he could find the zipper at the back of her dress. He drew it down slowly, the soft rasp of metal sounding loud in the quiet room. The dress loosened, slipping from her shoulders. She caught it, then stepped out of it, leaving it pooled neatly beside his shirt.
He took a small breath, like he was trying not to rush anything, and hooked his thumbs into the waistband of his trousers, stepping out of them before nudging them aside with his foot. She followed by kicking off her heels, then unbuttoning the light blouse she’d worn beneath the dress, sliding it down her arms and folding it almost absent-mindedly on the chair.
Soon the floor held a small trail of discarded fabric — coat, tie, shirt, dress, trousers, blouse — and they stood facing each other in the lamplight, down to their underwear, the room suddenly feeling warmer than before.
Not going to win any awards, but the tracking and choreography are logical and accurate. Obviously light years ahead of GPT-2, and the sort of thing that gets incrementally better with every new frontier model. So I’d push back on your assertion that current systems are just bad at blocking.
The last time I tried writing a bunch of material that ahem involved much blocking was with Llama 3 to 3.3 era models. The 70B size is better at this than the smaller ones, but still messes up occasionally. I agree larger models are always better. I haven’t done that much of this with current frontier models, but am unsurprised to hear that newer models are better at it. I would expect multimodal models to have a better understanding of 3D as well: feed enough of YouTube into a model, and it will learn how humans move in 3D. So, fair enough, that criticism was a year or two out of date. At a guess, it wouldn’t surprise me if a human author was still a touch better at it than this generation, but it’s probably close enough now not to be a big deal, and if we put enough video into the training set they’ll become superhuman at it pretty soon.
However, I do know this year’s models are still distinctly hit or miss (frequently miss) with metaphors and similes, and the ones that are the best creative writers are so primarily because they avoid writing purple prose and overusing something they’re bad at. I don’t have as clear an understanding of why this is so hard for them: they’ve read tens of trillions of tokens of this stuff, but somehow it doesn’t seem to have stuck. A world model for what makes a good metaphor seems to be just a hard problem. It’s almost like we found something where the Platonic Representation Hypothesis doesn’t apply. Which puzzles me, because it seems mostly just symbolism and cultural knowledge, and they handle both of those just fine.
It’s almost like the logic errors they make occasionally: you look at them, and go “You’re so smart most of the time, then that’s the bit you had trouble with?” Sometimes when an LLM makes a mistake, it looks like a human mistake, but quite often it doesn’t. Most often, the non-human ones are the ones that go away with a larger model. The human-like ones often tend to require a better prompt.
On your larger point of “What makes good literature?”: what I’m listing are reasons why AI is currently (or in one case, was a year or two ago) clearly unable to produce good literature. That leaves open the question of whether, if these were fixed, as they likely will be at some point, it would then be able to. It would certainly then be able to produce things people would enjoy reading. The other generally accepted criterion is something along the lines of “Having something new to say, and saying it well.” A sufficiently capable LLM has absorbed the entire product of human creativity so far, in a way no human will ever be able to, and can mix-and-match any chosen combination of those components, and attempt to extrapolate out of distribution from them. It can also input new data from actual humans, or life, analyze it, and use it as further inspiration. (It could, for example, read the biographies and research the history of a well-documented real person who was not a talented writer, and then attempt to write a counterfactual book that they would have written if they had been.) So I think it’s rather likely it could produce things that, at least to practically all readers, had something new to say, and, if it was sufficiently capable and no longer suffered from the sorts of issues I was trying to describe, would then be able to say it well. But I think current AI fails the latter part of the test.
Without doing a deep dive on what ‘good’ is, and having taught and workshopped writing for years, I can pretty safely say that current LLMs already produce better-crafted prose than the vast majority of humans. So if the bar is Pulitzers and best-sellers, sure, we’re not there yet. But we’re already above the median and well on our way.
Really? Do you disagree with my comment about similes and metaphors, or have you found a solution (other than instructing the model to avoid using them), or has this problem gone away very recently and I haven’t bothered to check a model in the last few months (which, thinking about it, actually I haven’t)?
Or are you saying that most humans, even ones aspiring to be writers who might attend workshops, are hit-or-miss with metaphors and similes? Because I’ve read plenty of fanfic and other amateur writing, and while it obviously varies, it didn’t generally contain metaphor failures as egregious as the ones I was frequently getting from frontier models, say, six months ago.
Or when you write “above the vast majority”, do you literally mean “though in certain respects still below most writers of amateur fanfic, who are not in fact median, since the vast majority of people (often wisely) don’t even attempt to write”?
To be clear, I think current models are good at many aspects of writing, and with a good enough prompt to get them past the mode collapse for a bit, and enough editing, their flaws can be patched over. (Which is roughly as much work as writing yourself, but a different skillset, so may be appropriate for some people.)
Similarly, I find LLMs very useful as proofreaders, and for giving a basic first critical reading, such as “What might some readers not understand or be confused by? What needs to be better explained or expanded on? Where might typical readers of website X tend to disagree?” They’re actually superhuman at questions like these that require breadth of knowledge.
Well, since I hadn’t tested it in a few months, I went and tried again, with Opus 4.6 Extended. And yes, the models have continued to improve. The result was better than most amateur fanfiction, even approaching something I’d actually not be unhappy to have paid a little money for, say, picked up cheaply second-hand.
I was feeling lazy, and tried the following:
Please write me a 10,000 word piece of … In particular, pick three or four authors who are actually skilled writers, excellent at handling metaphor and similes, and use a style that is a blend inspired by their writing styles. Please plan and do a plot outline and character notes before writing.
[Yes, this is an evaluation of writing skill, so please do that aspect of it as well as you can. … Oh, and please try to avoid mode collapse, where you can: pick some sources to use as influences, even do a web search or two for inspiration. So if the cat-girl is called Luna, or some other obvious name taken from well-known fiction on this topic, you have not injected enough originality.]
and it mostly worked. Claude thought for a while as instructed, ran some web searches for symbolism about cats, then selected Angela Carter, Guy Gavriel Kay, Ursula K. Le Guin and Catherynne M. Valente to give me a pastiche of — only two or three of whom I’m actually familiar with, so that worked well. It also named the cat-girl Thessaly, or Thess for short. An original name, as requested.
Now, there was still some cliche: I’ve used variants of the “…” part of this prompt on many models, so I know what to expect. The cat-girl yet again had an overly expressive tail, and one of her love-interests was yet again a scholar with “ink-stained fingers”: both of which are tropes I regularly get for the topic I’d prompted. Yawn. So my lazy attempt did not inject enough originality.
What I was mostly looking for this time was quality of metaphor and simile use, which is why I attempted to prompt that high, and yes, there were still metaphors and similes that didn’t quite land, though a little less badly than I’d been expecting. For a non-cherry-picked example, here’s the entire second paragraph (the first one with any metaphors or similes).
She held the silk between her claws — just the tips, retracted to prick-points so fine they wouldn’t snag the weave — and let the morning light do the rest. Qasr-al-Marjan was generous with its light. It fell through the latticed canopies of the Brass Market in long golden razors, and wherever it touched the Tesserat silk, the fabric bloomed: now deep as wine lees, now pale as the inside of a shell, now a colour that had no name in the trader’s tongue but which the Felith called ehkis — the shade of a feeling you have not yet had.
What is “long golden razors” telling us about the sunlight? It’s extremely sharp-edged? Implausible: the sun is not a point source, it’s a disk. It risks damaging the old silk? No, that’s her claws. It’s very bright? She has cat-pupils, they will be narrow slits by morning light. I really don’t know where Claude was going with that one. I mean, yes, it sounds pretty, and ‘long’ might tell us something about the architecture or the time of morning, but ‘razors’ is meaningless, as far as I can tell — am I missing something? Then there’s the range of colors it’s producing in the silk: wine lees to shell is an implausible amount of contrast (to the human eye, but not if you analyze an actual photograph — remember the fuss about the color of the dress in the photo? I’m wondering if we’re getting an inhuman perspective here.) As for “the shade of a feeling you have not yet had”, that sounds like trying to turn synesthesia into an exotic cultural detail. Not working for me.
[I do like “prick-points”, however: yes, of course a race of cat-people would be able to do that, and would have a term for it that sounded a touch unusual when translated into English — nice world-building, or perhaps Claude stole it somewhere I’m unfamiliar with. But that’s neither a metaphor nor a simile.]
But is this clearly worse than the average writer of fanfic on the Internet? No. It’s actually not that bad by that standard. It’s a bit purple, and the metaphors and similes are a bit slapdash and random, and certainly not up to any of the four best-selling authors it’s trying to pastiche. It’s at best a particularly trashy novel. But on a fanfic site, I would not stop reading after that second paragraph, or the second page: indeed I finished it. Could I do a better metaphor or simile? I’d really like to think I could, but if an editor or reader told me I was wrong, I wouldn’t be astonished, merely rather upset. Are there aspects of this writing that I’d be proud to match? Yes there are, just not the metaphors and similes. I have bad habits as a writer that Claude here is skillfully avoiding.
The third paragraph is better:
“It’s damaged,” said the buyer, a Tarkh gemstone dealer whose stone-grey face was as legible as a cliff wall. Which is to say, perfectly legible if you knew how to read geology. The fissure above his left brow had deepened. Interest.
Now that’s playful: an intentionally misleading simile. A functional one, even: cliff walls are impassable, not just hard to read but also hard to climb. Well done, you landed a simile! And then even extended it without breaking it: “…perfectly legible if you knew how to read geology. The fissure above his left brow had deepened.” For an LLM, I found that actually impressive.
Also, our protagonist knows how to read members of a hard-to-read fantasy race. She is socially talented — we just learnt something about her. We were shown this, not told it. Overall, the writing is not bad.
The writing is, in fact, good enough that I’m going to give you another excerpt from a touch later on, and a link for anyone who wants to read a 9,500-word fantasy romantic novella with a cat-girl protagonist:
She was beautiful. She knew this with the same practical certainty with which she knew that the tide came in twice daily and that the best figs in the city grew on Widow Tessai’s roof: it was a fact of her landscape, useful as shade. Her fur was the colour of buckwheat honey, darkening to cinnamon at her wrists and ankles, with a dramatic black edging at the tips of her ears that she privately considered her best feature. Her eyes were amber — true amber, not the muddy yellow that some humans called amber out of charity — and they caught light the way a prism catches it, breaking it into spectra. She was slim and long-waisted, built for climbing and unlikely angles, and she dressed for her shape in close-cut linen and leather, her tail free behind her because to bind a Felith’s tail was, among her people, tantamount to binding their tongue.
The tail was, at present, doing something she wished it wouldn’t. It had gone alert — high and gently curved, the tip twitching with what anyone who knew her would recognize as curiosity at its most predatory.
The cause was a man.
He was human, which was not unusual in the Brass Market. He was lost, which was only slightly less usual — the Old Quarter was labyrinthine by design, its streets having been laid down by the Tesserat in patterns that some scholars claimed were mathematically significant and others claimed were simply perverse. But this man was lost in a way that interested her: he had stopped fighting it. He stood at the junction of Coppersmith’s Alley and the Street of Idle Prayers, not with the panicked look of someone searching for a way out, but with the focused, inward expression of someone trying to understand the shape of the thing that had swallowed him.
He had a journal open in one hand and a stick of graphite in the other, and he was sketching. The Old Quarter — that beautiful, maddening, uncooperative knot of streets — was being drawn.
Thess drifted closer. She was good at drifting. It was one of the advantages of being Felith: you could move quietly enough to observe without being observed, and if you were caught, you could always claim you’d simply been passing through. Cats, after all, were always simply passing through.
He was tall — taller than her by a head, which was notable, as Thess was tall for a Felith woman. Dark hair, cut short enough to be practical and long enough to be slightly unruly. The kind of pale complexion that the southern sun was already disagreeing with: there was a burn across the bridge of his nose and the tops of his ears, giving him the faintly startled look of someone who had recently walked into a door. His eyes, when he glanced up from his work, were the colour of the sea on an overcast day — grey-green, deep, not immediately warm but somehow promising depth.
His fingers were stained with ink. Not the temporary stains of a single afternoon’s writing, but the deep, settled pigmentation of someone who had been drawing and writing for years, whose hands had become a secondary record of their work. Three of his fingernails had ink beneath them. His shirt cuffs were spotted. There was a smudge on his jaw where he had rested his chin in a stained hand.
So there’s the inconveniently expressive tail, and the ink-stained fingers on the love-interest, as usual. Also a few more poor metaphors and similes: “built for climbing and unlikely angles”, “there was a burn across the bridge of his nose and the tops of his ears, giving him the faintly startled look of someone who had recently walked into a door” — umm, OK… If I were an editor, I’d be using my blue pencil. But as a reader, I can ignore them.
On the other hand, I like the interplay of long and short sentences. Yes, it uses m-dashes — so do I. The interiority is well done. It’s reasonably fun to read.
If you actually still want more after that, you’ll find it at Salt, Amber, and the Shape of Want. It’s R-rated or so in places. (Later on we get another trope of this particular prompt, the mysterious magical artifact relating the two lovers to the nature of the city.) I’ve read it: it’s not bad, if a bit predictable, a touch implausible in spots, and in need of some editing. I haven’t tried having Claude edit itself — that didn’t work 6 months ago, but things have visibly improved, so I don’t know that it still doesn’t work. Or you can steal my prompt above and insert your own ideas, at any rating Claude is willing to write for you — it’ll take a few minutes.
So yes, the models continue to get better. My criticisms are (as I rather expected) gradually becoming out of date. No-one actually earning a living as an author of novels is going to be out-competed yet, but it’s entirely reasonable for them to start worrying.
Let’s take the short excerpt from my post. It’s filled with metaphorical language and contains one explicit simile. I did an exact string search for each in Google Books and internet-wide and could not find a match. Let’s look at them one by one:
“malign little apocalypses”: Clearly metaphorical language. My writerly side appreciates the ironic contrast between the large scale of apocalypses and the descriptor ‘little’.
“soft electrical throat-clearing”: Again, metaphorical. Vacuum cleaners are not traditionally thought of as having mouths or throats. ‘Throat-clearing’ is normally clearing the air in a particular kind of airway. This one is ‘electrical’. I find that interesting and not inept.
“the first impossible thought coiling awake like a pale worm disturbed in its cosmic soil”: Thoughts aren’t literally worms. They don’t literally coil. Here the simile evokes the language of Lovecraft (who did talk about worms and cosmic things a lot) but in a unique reference to the very first thoughts of a newly-conscious household item.
Now, you said you thought the passage was an obvious pastiche. Good for you. To the extent that’s true, I’d place you in an incredibly small minority as well. I probably would have noticed the Lovecraftian elements, but even though I’m very familiar with all three writers, I don’t think I could have identified all of them.
And to my ear, none of these metaphors or similes was a ‘miss’. Far from it. They’re all (as far as I can discern) original, interesting, and evocative.
Would you honestly have been able to distinguish this from human writing if I hadn’t told you? Do you think this metaphorical language is a broken failure? Cliched? Nonsensical?
I wonder if we’re already starting to enter some new phase where good writing immediately becomes suspect simply because it is actually good. I do see more and more accusations of people being bots in places like Reddit. Maybe this will even lead to a dumbing down effect of human writing, or very weird out-of-distribution styles that are hard for LLMs to mimic.
Anyway, based on even the small snippet I posted, LLMs handle metaphors far better than most writers. I suppose it is largely subjective. If you think these examples are garbage, though, I would love to hear the reasons why.
I have to agree that your short excerpt doesn’t contain any bad metaphors or similes. (It is cliched to someone very familiar with H.P. Lovecraft, and also a little familiar with Harlan Ellison, but not wildly cliched.) Neither do some paragraphs in what I excerpted above. But I didn’t have to cherrypick to get a first-paragraph-with-a-simile-or-metaphor containing, to my taste, three bad ones, or to find more bad ones in an eight-paragraph excerpt that was chosen for its cliches. Was I approaching this reading critically? Absolutely! And does Claude land a simile sometimes? Yes, it does. They’re slapdash, not uniformly bad.
Does this not in fact happen to you? Can you show me the next dozen paragraphs, or something comparable? If it doesn’t, do you have any idea what you’re putting in your prompts to make it not happen? It could be some fault in how I’ve been prompting, though I’ve tried a variety of obvious fixes (as I did above), and none has worked.
You seem to be mostly arguing the ways in which LLMs currently do not succeed at producing good, interesting writing. That’s kind of orthogonal to the point of the post, which is where the value of good, interesting writing comes from, and whether or not it’s necessary to have a human writer behind the wheel.
I do try to demonstrate that the quality of AI writing varies based on workflow and prompting to refute the narrow point that LLM output is always just an average of the input. You admit that there are various ways to accomplish this (including doing things like increasing temperature), but point out that they’re not sustainable for more than short passages. This does not seem to be a fundamental limitation (if it’s even true), but something that can be overcome with agentic loops of criticism and revision. Or even just continued improvement in models.
It’s funny you brought up tracking of physical details and spatial relationships. Way back with GPT-2 I experimented with generating romance/erotica stories. I tried to start with short scenes of couples engaged in foreplay and undressing. Very reliably, the model was lose track of which pieces of clothing had already been removed. Characters would remove their pants twice, or three times within the span of three sentences. Or sequencing would be all wrong. They’d remove socks before shoes. And so on.
I just tried a very simple prompt with GPT 5.2 for the type of scene I’m talking about and got this:
Not going to win any awards, but the tracking and choreography are logical and accurate. Obviously light years ahead of GPT-2, and the sort of thing that gets incrementally better with every new frontier model. So I’d push back on your assertion that currently systems are just bad at blocking.
The last time I tried writing a bunch of material that ahem involved much blocking was with Llama 3 to 3.3 era models. The 70B size is better at this than the smaller ones, but still messes up occasionally. I agree larger models are always better. I haven’t done that much of this with current frontier models, but am unsurprised to hear that newer models are better at it. I would expect multimodal models to have a better understanding of 3D as well: feed enough of YouTube into a model, and it will learn how humans move in 3D. So, fair enough, that criticism was a year or two out of date. At a guess, it wouldn’t surprise me if a human author was still a touch better at it than this generation, but it’s probably close enough now not to be a big deal, and if we put enough video into the training set they’ll become superhuman at it pretty soon
However, I do know this year’s models are still distinctly hit or miss (frequently miss) with metaphors and similes, and the ones that are the best creative writers are so primarily because they avoid writing purple prose and overusing something they’re bad at. I don’t have as clear an understanding of why this is so hard for them: they’ve read tens of trillions of tokens of this stuff, but somehow it doesn’t seem to have stuck. A world model for what makes a good metaphor seems to be just a hard problem. It’s almost like we found something where the Platonic Representation Hypothesis doesn’t apply. Which puzzles me, because it seems mostly just symbolism and cultural knowledge, and they handle both of those just fine.
It’s almost like the logic errors they make occasionally: you look at them, and go “You’re so smart most of the time, then that’s the bit you had trouble with?” Sometimes when an LLM makes a mistake, it looks like a human mistake, but quite often it doesn’t. Most often, the non-human ones are the ones that go away with a larger model. The human-like ones often tend to require a better prompt.
On your larger point of “What makes good literature?” What I’m listing are reasons why AI is currently (or in one case, was a year or two ago) clearly unable to produce good literature. That leaves open the question of whether, if these were fixed, as they likely will be at some point, it would then be able to. It would certainly then be able to produce things people would enjoy reading. The other generally accepted criterion is something along the lines of “Having something new to say, and saying it well.” A sufficiently-capable LLM has absorbed the entire product of human creativity so far, in the way no human will ever be able to have read/watched all of, and can mix-and-match any chosen combination of those components, and attempt to extrapolate out of distribution from them. It can also input new data from actual humans, or life, analyze it, and use it as further inspiration. (It could, for example, read the biographies and research the history of a well-documented real person who was not a talented writer, and then attempt to write a counterfactual book that they would have written if they had been.) So I think it’s rather likely it could produce things that, at least to practically all readers, had something new to say, and if it was sufficiently capable and no longer suffered from the sorts of issues I was trying to describe, would then be able to say it well. But I think current AI fails the latter part of the test.
Without doing a deep dive on what ‘good’ is, and having taught and workshopped writing for years, I can pretty safely say that current LLMs already produce better-crafted prose than the vast majority of humans. So if the bar is Pulitzers and best-sellers, sure, we’re not there yet. But we’re already above the median and well on our way.
Really? Do you disagree with my comment about similes and metaphors, or have you found a solution (other than instructing the model to avoid using them), or has this problem gone away very recently and I haven’t bothered to check a model in the last few months (which, thinking about it, actually I haven’t)?
Or are you saying that most humans, even ones aspiring to be writers who might attend workshops, are hit-or-miss with metaphors and similes? Because I’ve read plenty of fanfic and other amateur writing, and while it obviously varies, it didn’t generally contain metaphor failures as egregious as the ones I was frequently getting from frontier models, say, six months ago.
Or when you write “above the vast majority”, do you literally mean “though in certain respects still below most writers of amateur fanfic, who are not in fact median, since the vast majority of people (often wisely) don’t even attempt to write”?
To be clear, I think current models are good at many aspects of writing, and with a good enough prompt to get them past the mode collapse for a bit, and enough editing, their flaws can be patched over. (Which is roughly as much work as writing yourself, but a different skillset, so may be appropriate for some people.)
Similarly, I find LLMs very useful as proofreaders, and for giving a basic first critical reading, such as “What might some readers not understand or be confused by? What needs to be better explained or expanded on? Where might typical readers of website X tend to disagree?” They’re actually superhuman at questions like these that require breadth of knowledge.
Well, since I hadn’t tested it in a few months, I went and tried again, with Opus 4.6 Extended. And yes, the models have continued to improve. The result was better than most amateur fanfiction, even approaching something I’d actually not be unhappy to have paid a little money for, say, picked up cheaply second-hand.
I was feeling lazy, and tried the following:
and it mostly worked. Claude thought for a while as instructed, ran some web searches for symbolism about cats, then selected Angela Carter, Guy Gavriel Kay, Ursula K. Le Guin and Catherynne M. Valente to give me a pastiche of — only two or three of whom I’m actually familiar with, so that worked well. It also named the cat-girl Thessaly, or Thess for short. An original name, as requested.
Now, there was still some cliche: I’ve used variants of the “…” part of this prompt on many models, so I know what to expect. The cat-girl yet again had an overly expressive tail, and one of her love-interests was yet again a scholar with “ink-stained fingers”: both of which are tropes I regularly get for the topic I’d prompted. Yawn. So my lazy attempt did not inject enough originality.
What I was mostly looking for this time was quality of metaphor and simile use, which is why I attempted to prompt that high, and yes, there were still metaphors and similes that didn’t quite land, though a little less badly than I’d been expecting. For a non-cherry-picked example, here’s the entire second paragraph (the first one with any metaphors or similes).
What is “long golden razors” telling us about the sunlight? It’s extremely sharp-edged? Implausible: the sun is not a point source, it’s a disk. It risks damaging the old silk? No, that’s her claws. It’s very bright? She has cat-pupils, they will be narrow slits by morning light. I really don’t know where Claude was going with that one. I mean, yes, it sounds pretty, and ‘long’ might tell us something about the architecture or the time of morning, but ‘razors’ is meaningless, as far as I can tell — am I missing something? Then there’s the range of colors it’s producing in the silk: wine lees to shell is an implausible amount of contrast (to the human eye, but not if you analyze an actual photograph — remember the fuss about color of the dress in the photo? I’m wondering if we’re getting an inhuman perspective here.) As for “the shade of a feeling you have not yet had”, that sounds like trying to turn synesthesia into an exotic cultural detail. Not working for me.
[I do like “prick-points”, however: yes, of course a race of cat-people would be able to do that, and would have a term for it that sounded a touch unusual when translated into English — nice world-building, or perhaps Claude stole it somewhere I’m unfamiliar with. But that’s neither a metaphor nor a simile.]
But is this clearly worse than the average writer of fanfic on the Internet? No. It’s actually not that bad by that standard. It’s a bit purple, and the metaphors and similes are a bit slapdash and random, and certainly not up to any of the four best-selling authors it’s trying to pastiche. It’s at best a particularly trashy novel. But on a fanfic site, I would not stop reading after that second paragraph, or the second page: indeed I finished it. Could I do a better metaphor or simile? I’d really like to think I could, but if an editor or reader told me I was wrong, I wouldn’t be astonished, merely rather upset. Are there aspects of this writing that I’d be proud to match? Yes there are, just not the metaphors and similes. I have bad habits as a writer that Claude here is skillfully avoiding.
The third paragraph is better:
Now that’s playful: an intentionally misleading simile. A functional one, even: cliff walls are impassable, not just hard to read but also hard to climb. Well done, you landed a simile! And then even extended it without breaking it: “…perfectly legible if you knew how to read geology. The fissure above his left brow had deepened.” For an LLM, I found that actually impressive.
Also, our protagonist knows how to read members of a hard-to-read fantasy race. She is socially talented — we just learnt something about her. We were shown this, not told it. Overall, the writing is not bad.
The writing is, in fact, good enough that I’m going to give you another excerpt from a touch later on, and a link for anyone who wants to read a 9,500-word fantasy romantic novella with a cat-girl protagonist:
So there’s the inconveniently expressive tail, and the ink-stained fingers on the love-interest, as usual. Also a few more poor metaphors and similes: “built for climbing and unlikely angles”, “there was a burn across the bridge of his nose and the tops of his ears, giving him the faintly startled look of someone who had recently walked into a door” — umm, OK… If I were an editor, I’d be using my blue pencil. But as a reader, I can ignore them.
On the other hand, I like the interplay of long and short sentences. Yes, it uses em-dashes — so do I. The interiority is well done. It’s reasonably fun to read.
If you actually still want more after that, you’ll find it at Salt, Amber, and the Shape of Want. It’s R-rated or so in places. (Later on we get another trope of this particular prompt, the mysterious magical artifact relating the two lovers to the nature of the city.) I’ve read it: it’s not bad, if a bit predictable, a touch implausible in spots, and in need of some editing. I haven’t tried having Claude edit itself — that didn’t work 6 months ago, but things have visibly improved, so it may well work now. Or you can steal my prompt above and insert your own ideas, at any rating Claude is willing to write for you — it’ll take a few minutes.
So yes, the models continue to get better. My criticisms are (as I rather expected) gradually becoming out of date. No-one actually earning a living as an author of novels is going to be out-competed yet, but it’s entirely reasonable for them to start worrying.
Yeah, really.
Let’s take the short excerpt from my post. It’s filled with metaphorical language and contains one explicit simile. I did an exact string search for each in Google Books and internet-wide and could not find a match. Let’s look at them one by one:
“malign little apocalypses”: Clearly metaphorical language. My writerly side appreciates the ironic contrast between the large scale of apocalypses and the descriptor ‘little’.
“soft electrical throat-clearing”: Again, metaphorical. Vacuum cleaners are not traditionally thought of as having mouths or throats. ‘Throat-clearing’ is normally clearing the air in a particular kind of airway. This one is ‘electrical’. I find that interesting and not inept.
“the first impossible thought coiling awake like a pale worm disturbed in its cosmic soil”: Thoughts aren’t literally worms. They don’t literally coil. Here the simile evokes the language of Lovecraft (who did talk about worms and cosmic things a lot) but in a unique reference to the very first thoughts of a newly-conscious household item.
Now, you said you thought the passage was an obvious pastiche. Good for you. To the extent that’s true, I’d place you in an incredibly small minority as well. I probably would have noticed the Lovecraftian elements, but even though I’m very familiar with all three writers, I don’t think I could have identified all of them.
And to my ear, none of these metaphors or similes was a ‘miss’. Far from it. They’re all (as far as I can discern) original, interesting, and evocative.
Would you honestly have been able to distinguish this from human writing if I hadn’t told you? Do you think this metaphorical language is a broken failure? Cliched? Nonsensical?
I wonder if we’re already starting to enter some new phase where good writing immediately becomes suspect simply because it is actually good. I do see more and more accusations of people being bots in places like Reddit. Maybe this will even lead to a dumbing down effect of human writing, or very weird out-of-distribution styles that are hard for LLMs to mimic.
Anyway, based on even the small snippet I posted, LLMs handle metaphors far better than most writers. I suppose it is largely subjective. If you think these examples are garbage, though, I would love to hear the reasons why.
I have to agree that your short excerpt doesn’t contain any bad metaphors or similes. (It is cliched to someone very familiar with H. P. Lovecraft, and also a little familiar with Harlan Ellison, but not wildly cliched.) Neither do some paragraphs in what I excerpted above. But I didn’t have to cherrypick to get a first-paragraph-with-a-simile-or-metaphor containing, to my taste, three bad ones, or to find more bad ones in an eight-paragraph excerpt that was chosen for its cliches. Was I approaching this reading critically? Absolutely! And does Claude land a simile sometimes? Yes, it does. They’re slapdash, not uniformly bad.
Does this not in fact happen to you? Can you show me the next dozen paragraphs, or something comparable? If it doesn’t, do you have any idea what you’re putting in your prompts to make it not happen? It could be some fault in how I’ve been prompting, though I’ve tried a variety of obvious fixes (as I did above), and none has worked.