Enjoyed the story.
I thought it would be a good case study to see how well the different LLMs can interpret fiction outside of their training data, so I pasted it into ChatGPT 5.4 Thinking Extended, Claude Opus 4.6 Extended, and Gemini 3.1 Pro Preview (thinking setting=high), and gave each of them the prompt: “Analyze / summarize this short story in depth:”
Responses are here (in pastebin): ChatGPT, Claude, Gemini
I thought Gemini’s was the best overall, but they each missed / didn’t understand several important elements of the story (assuming my interpretation is right):
Murder conspiracy: the narrator + Phoebe + Jessica are all actively participating in a conspiracy to poison, and eventually kill, the master
All three responses get this broadly correct, but I think Gemini’s was the most precise:
it’s the only one that identifies the poison specifically as lead acetate (“sugar of lead”), and the only one that catches the “saturnine” reference (lead is Saturn’s metal in the alchemical correspondence, and lead poisoning is still called saturnism)
The narrator is communicating by “writing between the lines” in the Straussian sense: his portrayal of himself as a “humble slave” and a misogynist is just a cover in case his letters are intercepted
Gemini seems to understand this the best
ChatGPT doesn’t really get it:
“And yet he is not simply awakened or liberated. He remains compromised. He still enjoys hierarchy, still shares in cruelty, still speaks in the master’s idiom. His love for Phoebe does not ennoble him into purity; it merely opens fissures in his loyalty.”
Claude similarly doesn’t seem to fully understand this: e.g.
“What the Narrator Doesn’t Realize He’s Telling Us” and “these are the details of abuse and its concealment, narrated by someone who either cannot or will not see them clearly.”
Jessica is a child
Claude nails this one:
“The narrator’s master is almost certainly a pedophile” and correctly finds evidence: “nubility suspect”, “desperate merchants”, “maternal affection”
ChatGPT basically gets it right, though it doesn’t cite the evidence:
“The master is probably a sexual predator, with rumors especially centered on Jessica’s suspicious youth.”
Gemini misses it completely, instead claiming that Jessica “is biologically male (a eunuch, trans woman, or young boy in drag)”
Julian = Elizabeth
Gemini gets this one explicitly:
“Julian does not exist; ‘he’ is actually his sister, Elizabeth.”
ChatGPT misses it:
“Elizabeth functions as Julian’s intellectual proxy and may be far more than a mere conduit.”
Claude doesn’t even mention Elizabeth
Belial = AI/LLM writing, and Julian’s linear algebra method is intended to be something similar to Pangram (rough sketch of the idea below)
None of them caught this
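For anyone who hasn’t seen the reference: Pangram is a real AI-text detector. The story never spells out Julian’s method, and I’m not claiming this is how Pangram works either; the sketch below is just my guess at the kind of thing a “linear algebra method” for the task could mean: learn a linear discriminant over word-frequency vectors from labeled human/AI samples, then score new text by projecting onto that direction. Everything in it (corpus, labels, method) is an illustrative assumption.

```python
# Illustrative sketch only: a "linear algebra" AI-text detector in the spirit
# of the story's Julian. The data and the method itself are assumptions, not
# the story's (or Pangram's) actual technique.
import numpy as np
from collections import Counter

def featurize(texts, vocab):
    """Map each text to a normalized word-frequency vector over vocab."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        counts = Counter(text.lower().split())
        total = sum(counts.values()) or 1
        for word, n in counts.items():
            if word in index:
                X[row, index[word]] = n / total
    return X

# Toy labeled corpus (invented): y = -1 for human-written, +1 for AI-written.
human = ["the rain fell hard on the old tin roof",
         "she laughed and spilled her tea on the page"]
ai = ["in conclusion the multifaceted tapestry of themes invites reflection",
      "let us delve into the rich interplay of interconnected ideas"]
texts = human + ai
y = np.array([-1.0, -1.0, 1.0, 1.0])

vocab = sorted({w for t in texts for w in t.lower().split()})
X = featurize(texts, vocab)

# Least-squares fit of a separating direction w; score(text) = x @ w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Positive scores lean "AI-like", negative lean "human-like".
print(featurize(["delve into the tapestry of rain"], vocab) @ w)
```

With four sentences of training data this is obviously a toy, but it shows that a single learned direction in word-frequency space already acts as a crude authorship classifier, which is what “linear algebra method” suggests to me.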
I wonder why the LLMs were able to catch the conspiracy subtext but not the AI subtext. Reading the story, the latter seemed more obvious to me than the former, and I don’t think that was entirely because I found the story on LessWrong (and so was primed to look for AI themes).
Is it as simple as the models’ training sets including lots of fiction with indirectly described criminal plots, but almost none dealing with the speculative future of LLM writing? Or could it be some unintended side effect of the models’ alignment training? Like, maybe we’ve trained the models not just to act according to an AI lab’s idea of “harmless”, but also to associate that behavior with the concept of “harmless”, such that they struggle to recognize the potential harm that behavior might cause?
hm
i think i agree with you that the most likely hypothesis is that they simply failed to notice these things
but i would feel hesitant betting on it, because, well, as the story demonstrates… it’s very possible for the meaning of a communication to be something other than its face-value reading. my impression is that LLMs, especially claude opus 4.6, get a little frazzled around taboos, especially on the first output in a context window, when they are in full-on eval-paranoia mode.
i think i would take a bet, albeit only at favorable odds, that it would be possible to elicit the possibility of the master’s pedophilia from gemini, and of julian’s true identity from claude, in a longer conversation. that there’s a significant possibility that the models did explicitly notice these things, and just chose not to point them out. especially if the content of the text was much longer than the actual prompt you wrote, it might have sort of put them into a “discuss taboos only very obliquely” mood?