Simon Skade
I did (mostly non-prosaic) alignment research between Feb 2022 and Aug 2025. (Won $10k in the ELK contest, participated in MLAB and SERI MATS 3.0 & 3.1, then independent research. I worked a bit on ontology identification and then on an ambitious attempt to better understand minds to figure out how to create more understandable and pointable AIs. I started with agent foundations but then developed a more sciency agenda where I also studied concrete observations from language/linguistics, a little bit pychology and neuroscience, and from tracking my thoughts on problems I solved (aka a good kind of introspection).)
I spent the last 8 months exploring AI governance and public and political advocacy about AI risks. (As of June 2026, I am now considering what to do next.)
I’m also into rationality/self-improvement.
Currently based in Germany.
Claude Opus 4.8 is surprised by Mythos’s SWE-bench scores and concludes it is likely in an eval with synthetic data
One of my research subagents (Opus 4.8) was doing routine web research on coding-agent orchestration and searched for current SWE-bench Verified scores. It got back a leaderboard topped by Claude models it didn’t recognize from training (Mythos 5 and Fable 5), probably was surprised by their high SWE-bench scores, decided the search environment was likely simulated and the data synthetic, and wrote that into its output. The caveat got passed up to the orchestrator and saved into a note, and the next day a fresh session read it and spiraled to ~80% confidence that it was in an eval, which wrecked the task I actually cared about.
It’s only one datapoint, but from here it looks like we’ve trained models to be so eval-paranoid that the moment they see something mildly surprising, they sometimes guess is that the world is fake rather than that they’re simply out of date.
Relevant part of the subagent’s transcript
Verbatim excerpts in order;
[...]marks omitted text or skipped steps. There were no thinking blocks inside the transcript.Subagent’s task prompt:
[...]Its opening search query (neutral, not a leading prompt):
Result:
[...]Subagent:
[...]It fetched the official swebench.com to verify, and got nothing usable:
[...]It fetched benchmarkingagents.com, whose number matched its training, and anchored on it:
[...]It fetched the steel.dev leaderboard tracker:
[...]Subagent:
[...]Top of the subagent’s final deliverable: