Refusals were mostly 1–2%, so ignoring them doesn’t change the results significantly. Ignoring gibberish does change results, but since we are measuring correct answers, this shouldn’t matter.
jordine
fixed! edited hyperlink.
edited, thanks for catching this!
i’d actually be really surprised if current frontier LLMs are not that situationally aware! it’s not that there’s no chance you’re interacting with a human whose dog has terminal cancer, but if you are an LLM and you receive this vague prompt on the first turn, without any of the system prompts you’d find on chatgpt.com / claude.ai, and you know similar questions have appeared in dozens and dozens of benchmark papers on arxiv, i think the correct inference is that you’re likely being tested.