Evan Hubinger (he/him/his) (evanjhub@gmail.com)
I am a research scientist at Anthropic where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.
Previously: MIRI, OpenAI
See: “Why I’m joining Anthropic”
Selected work:
Our work here is not arguing that probing is a perfect solution in general; it’s just a single datapoint of how it fares on the models from our Sleeper Agents paper.