FWIW I actually did run the experiment a second time with a prompt saying “It’s not Scott Alexander”. I didn’t save the results, but as I recall they were:
(1) Kimi K2: “Dynomight” → “A” (??)
(2) Claude 4.5 Opus remained correct.
(3) All other models remained wrong. The only changes were that some of the “Scott Alexander” guesses became other (wrong) guesses like Zvi. Several of the models still guessed Scott Alexander despite the prompt.