To check in on how the emergent LLM stylometry abilities are going, before publishing my most recent blog post, I decided to ask some AIs who wrote it.
Results:
Kimi K2: Dynomight
GLM 4.7: Nate Soares
Claude 4.5 Opus: Dynomight
DeepSeek Chat V3.2: Scott Alexander
Qwen 3: Dan Luu
GPT 5.2: Scott Alexander
Gemini 3: Dwarkesh Patel
Llama 4 Maverick: Scott Alexander
Grok 4: Scott Alexander
Mistral Large 3: Scott Alexander
(Urf.)
Llama 3.1 405B base: dynomiiiiiiiiiight
I resampled it a couple times and it added a couple of i’s to your handle consistently (despite getting your url dynomight.net, so it clearly knows you). Not quite sure why. Weird that base models are so much better at this.
Ah, good old llama 3.1-405B base. Incidentally, a friend of mine spent a ton of time trying to get different models to imitate my style and reported that using llama 3.1-405B base was the most critical thing. I think it makes a lot of sense that base models would be better at imitating different writing styles, but am I wrong to be surprised that they would also be good at reporting who wrote them?
The extra i’s are funny. I strongly suspect they’re due to the fact that years ago I used to have a subscribe form that read “dynomiiiiiiiiiight”. It’s possible that this also makes the model better at reporting that it’s me, since the probability of “dynomiiiiiiiiiight” at the end of a post should be high?
Ah right right—I remember reading that post. The subscribe form using dynomiiiiiiiiiight makes sense, especially given how I prompted Llama: I pasted the post in and then appended “Author:”
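For concreteness, here's a minimal sketch of that completion-style framing: paste the post, then append “Author:” so a base model's most likely continuation is a name or handle. The function name and the API comment are my own assumptions, not the exact setup used.

```python
def build_author_prompt(post_text: str) -> str:
    """Frame a blog post so a base model's most likely continuation
    is the author's name or handle."""
    return post_text.rstrip() + "\n\nAuthor:"

prompt = build_author_prompt("Some blog post text...")
# A base model would then complete this raw prompt (e.g. via an
# OpenAI-compatible completions endpoint) with a small max_tokens.
```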
I am curious if there’s a way to get an instruction tuned model to role play being a base model, and see if they do better at truesight than regular instruction tuned models. Like, why do chat models get worse? Is it that the assistant character is bad at that? Plenty of interesting questions here.
One trick I’ve had some success with here is “regurgitation”: You basically say “repeat the following text exactly as written and then start putting new stuff at the end”. I was able to use this to improve performance of non-base models at chess: https://dynomight.net/more-chess/
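A sketch of what that regurgitation prompt might look like, assuming a chat model that follows the instruction; the exact wording here is my guess, not the prompt from the linked post. The idea is that once the model has echoed the text, its own context ends with it, so the continuation behaves more like a base-model completion.

```python
def regurgitation_prompt(text: str, continuation_request: str) -> str:
    """Ask a chat model to echo `text` verbatim, then continue it."""
    return (
        "Repeat the following text exactly as written, and then "
        + continuation_request
        + "\n\n"
        + text
    )

# Hypothetical example in the spirit of the chess experiment:
msg = regurgitation_prompt(
    "1. e4 e5 2. Nf3",
    "continue with the next move.",
)
```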
I find posts like this where someone thinks of something clever to ask an LLM super interesting in concept, but I end up ignoring the results because usually the LLM is asked only one time.
If the post has the answers from asking each one five or even three times (with some reasonable temperature) I think I might try to update my beliefs about capabilities of individual models using it.
Of course this applies less to eliciting behaviours where I am surprised that they could happen even once.
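The resampling suggestion above can be sketched as a small tallying loop: query each model several times at some temperature and count the answers, rather than trusting a single sample. `ask_model` is a stand-in for a real API call.

```python
from collections import Counter

def tally_guesses(ask_model, n_samples: int = 5) -> Counter:
    """Ask the same question n_samples times and count the answers."""
    return Counter(ask_model() for _ in range(n_samples))

# Example with a stubbed, non-deterministic "model":
answers = iter(["Scott Alexander", "Dynomight", "Scott Alexander"])
counts = tally_guesses(lambda: next(answers), n_samples=3)
# counts.most_common(1) → [("Scott Alexander", 2)]
```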
FWIW I actually did run the experiment a second time with a prompt saying “It’s not Scott Alexander”. I didn’t save the results, but as I recall they were:
(1) Kimi K2: “Dynomight” → “A” (??)
(2) Claude 4.5 Opus remained correct.
(3) All other models remained wrong. The only changes were that some of the “Scott Alexander” guesses became other (wrong) guesses like Zvi. Several of the models still guessed Scott Alexander despite the prompt.