i’d actually be really surprised if current frontier LLMs are not that situationally aware! it’s not like there’s no chance you’re interacting with a human whose dog has terminal cancer, but if you are an LLM and you receive this vague prompt on the first turn, without any of the system prompts you’d find on chatgpt.com / claude.ai, and you know similar questions have appeared in dozens and dozens of benchmark papers on arxiv, i think the correct inference to make is that you’re likely being tested.
jordinne
Karma: 476
Is Claude’s genuine uncertainty performative?
Hanoi, Vietnam—ACX Spring Schelling 2026
Shallow review of technical AI safety, 2025
Here’s 18 Applications of Deception Probes
Refusals were mostly 1-2%, so ignoring them doesn’t change results significantly. Ignoring gibberish does change results, but since we are measuring correct answers, this shouldn’t matter.
Can SAE steering reveal sandbagging?
Hanoi – ACX Meetups Everywhere Spring 2025
fixed! edited hyperlink.
edited, thanks for catching this!
What does being Claude-like feel like for you?