gwern comments on AI Can’t Write Good Fiction

gwern 13 Mar 2025 3:20 UTC
8 points
2
It’s about grief, with central metaphors that add exactly zero to anyone’s aesthetic understanding of grief (stuff being underground, things not staying buried)

It is about grief, but it didn’t have to be. This would’ve been more obvious if I could’ve shown you the session, but I’ll copy it out:
2. Brainstorming Ideas:
- A child confronting a local superstition after witnessing something traumatic.
- A funeral narrated by an animal’s perspective.
- A celebrity’s fall from grace caught on live camera.
- A girl who collects superstitions until one unexpectedly comes true.
- A gardener who secretly buries cursed objects.
- Friends reuniting silently after a crime.
- A model who steals other people’s identities through makeup.
- A ghost attending its own funeral, confused.
- Farmers discovering an ominous black stone.
- Neighbors noticing unsettling coincidences.
- Someone who compulsively invents last messages for strangers.
- A woman discovering her double living her unlived life.
- An eerie superstition circulating quietly through a town.
- Villagers silently celebrating the death of a tyrant.
- Two women performing friendship publicly, enemies privately.
- Someone obsessively organizing goodbyes.
- A funeral revealing hidden grudges through seating arrangements.
- Friends trapped in a toxic nostalgia loop.
- A boy who accidentally predicts deaths.
- An abandoned motel that traps memories.
3. Selection

Idea: A gardener who secretly buries cursed objects.

Rationale:
- Symbolic and psychological depth.
- Opportunity for visual and thematic resonance.
- Tension around secrecy, superstition, and consequences.
Now that I look at it, the 5 flash stories I happened to copy in seem to mostly hit on themes of death & grief (the last one is literally titled “obit”!), so I think you are unfairly dinging 4.5 here: if ‘good’ flash fiction, by your own standards, keeps revolving around death and grief, how can we blame 4.5 (or r1) for doing likewise?

Anyway, some of these ideas seem promising. “A funeral narrated by an animal’s perspective” is one I like; I don’t think I’ve ever seen that.

And of course, if the failure mode is so common, throw it into the prompt. (When I yell at 4.5 to avoid grief/death/funerals and brainstorm some more, it picks out ‘”The Parking Attendant Matchmaker”: A seemingly ordinary parking attendant quietly manipulates parking assignments at a large business complex to engineer chance encounters and romances among strangers.’ Yeah sure why not.)

Like, what does it possibly mean for mourners to “trust my silence” here. What is it they’re trusting? How does the earth’s hunger contrast to that?

Balderdash. There’s a lot to criticize here, but you’re straining to come up with criticisms now. That’s possibly the least objectionable sentence in the whole thing. If this had been written by a human, you wouldn’t hesitate in the slightest to accept that. It is perfectly sensible to speak of trusting the confidentiality of a confessor/witness figure, and the hungry earth is a cliche so straightforward and obvious that it is beyond cliche and loops around to ordinary fact, and if a human had written it, you would have no trouble in understanding the idea of ‘even if I were to gossip about what I saw, the earth would have hidden or destroyed the physical evidence’.

I also see it a lot in the ClaudePlaysPokemon twitch chat, this idea that simply adding greater situational awareness or more layers of metacognition would make Claude way better at the game.

I do agree that the Claude-Pokemon experiment shows a limitation of LLMs that isn’t easily fixed by a bit more metadata or fancier retrieval. (I think it shows, specifically, the serious flaws in relying on frozen weights and refusing to admit that neuroplasticity is a thing. That violates the assumptions behind RL scaling laws, because those always assume that the model is, y’know, learning as it gains more experience; who would be dumb enough to deploy frozen models in tasks far exceeding their context window and where they also aren’t trained at all? It is also why we need things like dynamic evaluation. I should probably write a comment on that; the pathologies like the deliberate-fainting are, I think, really striking demonstrations of the problems with powerful but frozen amnesiac agents.)

I’m much less convinced that we’re seeing anything like that with LLMs writing fiction. What is the equivalent of the Claude pathologies, like the fainting delusion, in fiction writing? (There used to be ‘write a non-rhyming poem’ but that seems solved at this point.) Especially if you look at the research on people rating LLM outputs, or LMsys; if they are being trained on lousy preference data, and this is why they are like they are, that’s very different from somehow being completely incapable of “extracting the actual latent features of good flash fiction”. (What would such a latent feature look like? Do you really think that there’s some property of flash fiction like “has a twist ending” that you can put two flash stories into 4.5 or o1-pro, with & without, and ask it to classify which is which and it’ll perform at chance? Sounds unlikely to me, but I’d be interested to see some examples.)
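To make "perform at chance" concrete, here is a minimal sketch of how one might score such a with/without classification test. It assumes you have already collected the model's answers by hand; the function name `p_value_above_chance` and the numbers in the usage lines are hypothetical illustrations, not results from any actual run.

```python
from math import comb


def p_value_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value.

    Returns the probability of getting at least `correct` answers right
    out of `trials` if the classifier were purely guessing with success
    probability `chance` on each pair.
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    if not 0.0 < chance < 1.0:
        raise ValueError("chance must be strictly between 0 and 1")
    # Sum the binomial tail P(X = k) for k = correct .. trials.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical usage: 10 story pairs, counting how many the model labeled correctly.
at_chance = p_value_above_chance(correct=5, trials=10)    # guessing-level score
all_right = p_value_above_chance(correct=10, trials=10)   # perfect score
```

A large p-value (as with 5 of 10) is consistent with guessing; a small one (as with 10 of 10) is not. With only a handful of story pairs the test has little power, so a real check would want many more pairs than this.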
- JustisMills 13 Mar 2025 3:49 UTC
  7 points
  1
  Yeah, a lot of the suggested topics there seem to be borrowing from the specific stories you included, which makes sense (and I don’t think is a flaw, really). Like the first story you included in the context is a funeral witnessed by a little girl, with the deceased’s dog freaking out as a major plot point, so it’s sensible enough that it’s coming up with ideas that are fairly closely related.
  I’m not sure what you mean about twist endings? I tend to think they’re pretty bad in most flash fiction, at least literary flash fiction, but certainly plenty of humans write them and occasionally they’re fine.
  I still hate the “earth’s hunger” sentence, and am confident I would if this was a story by a human, mostly just because I evaluated and hated lots and lots of submissions by humans with similar stuff! That being said, I don’t think I understood what 4.5 was going for there, and your explanation makes sense, so my objection is purely aesthetic. Of course, I can’t prove that I’m not just evincing anti-LLM prejudice. It’s possible! But overall I really like LLM outputs often, talk to multiple LLMs every day, and try prompting them in lots of different ways to see what happens, so I don’t think I go into reading LLM fiction efforts determined to hate them. I just do in fact hate them. But I also hated, say, Rogue One, and many of my friends liked it. No accounting for taste!
  I am curious, since you are a writer/thinker I respect a lot, if you like… have a feeling of sincere aesthetic appreciation for the story you shared (and thanks, by the way, for putting in the effort to generate it), or any other AI-generated fiction. Because while I point to a bunch of specific stuff I don’t like, the main thing is the total lack of a feeling I get when reading good flash fiction stories, which is surprise. A sentence, or word choice, or plot pivot (though not something as banal as a twist ending) catching me off guard. To date, machine-generated stuff has failed to do that to me, including when I’ve tried to coax it into doing so in various conversations.
  I look forward to the day that it does!
  Edit: also, I now notice you were asking about what the latent features of good flash fiction would be. I think they’re pretty ineffable, which is part of the challenge. One might be something like “the text quickly creates a scene with a strongly identifiable vibe, then complicates that vibe with a key understated detail which admits multiple interpretations”; another might be “there is an extreme economy of words/symbols such that capitalization/punctuation choices are load bearing and admit discussion”; a third might be “sentences with weird structure and repetition appear at a key point to pivot away from sensory or character moments, and into the interiority of the viewpoint character”. None of this is easy to capture; I don’t really think I’ve captured it. But I don’t feel like LLMs really get it yet. I understand it may be a prompting skill issue, or something, but the fact that no LLM output I’ve seen really plays with sentence structure or an unusual narrative voice, despite many celebrated flash fiction pieces doing so, feels somewhat instructive.