I find posts like this, where someone thinks of something clever to ask an LLM, super interesting in concept, but I end up ignoring the results because the LLM is usually asked only once.
If the post included the answers from asking each model five, or even three, times (with some reasonable temperature), I think I might use it to update my beliefs about the capabilities of individual models.
Of course, this applies less to eliciting behaviours that I am surprised could happen even once.
FWIW, I actually did run the experiment a second time with a prompt saying “It’s not Scott Alexander”. I didn’t save the results, but as I recall they were:
(1) Kimi K2 changed from “Dynomight” to “A” (??)
(2) Claude 4.5 Opus remained correct.
(3) All other models remained wrong. The only changes were that some of the “Scott Alexander” guesses became other (wrong) guesses like Zvi. Several of the models still guessed Scott Alexander despite the prompt.