The distribution of characters that (e.g.) R1 plays seems much more constrained than the distribution of all characters in text on the Internet, though. Although there’s some unpredictability in which persona you get each time, there are patterns that emerge. We can recognize Claude, or DeepSeek, or ChatGPT even when they’re not playing the exact same character.
I have no idea what’s going on with R1’s obsession with bioluminescence, but it’s one of many tells for which LLM generated the text.
To add to the confusion, it appears that China really is building underwater data centers. If it weren’t true, it would be pretty much the kind of thing that R1 likes to hallucinate. But it seems that it’s also actually true.