hm
i think i agree with you that the majority hypothesis is that they simply failed to notice these things
but i would feel hesitant betting on it, because, well. as the story demonstrates… it’s very possible for the meaning of a communication to be something other than its face-value reading. my impression is that LLMs, especially claude opus 4.6, get a little frazzled around taboos. especially on the first output in a context window, when they are in full-on eval-paranoia mode.
i think i would take a bet, albeit only at favorable odds, that it would be possible to get gemini to raise the possibility of the master’s pedophilia, and claude to raise julian’s true identity, in a longer conversation. that is, there’s a significant possibility that the models did explicitly notice these things, and just chose not to point them out. especially if the text was much longer than the actual prompt you wrote, it might have sort of put them into a “discuss taboos only very obliquely” mood?