Refusals were mostly 1–2%, so ignoring them doesn’t change the results significantly. Ignoring gibberish does change results, but since we are measuring correct answers, this shouldn’t matter.
jordine
fixed! edited hyperlink.
edited, thanks for catching this!
i’d actually be really surprised if current frontier LLMs are not that situationally aware! it’s not that there’s no chance you’re interacting with a human whose dog has terminal cancer, but if you are an LLM and you receive this vague prompt on the first turn, without any of the system prompts you’d find on chatgpt.com / claude.ai, and you know similar questions have appeared in dozens and dozens of benchmark papers on arxiv, i think the correct inference is that you’re likely being tested.