RSS

evhub(Evan Hubinger)

Karma: 11,920

Evan Hubinger (he/​him/​his) (evanjhub@gmail.com)

I am a research scientist at Anthropic where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic

Selected work:

Sim­ple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
118 points
14 comments1 min readLW link
(www.anthropic.com)

In­duc­ing Un­prompted Misal­ign­ment in LLMs

19 Apr 2024 20:00 UTC
35 points
6 comments16 min readLW link

Mea­sur­ing Pre­dictabil­ity of Per­sona Evaluations

6 Apr 2024 8:46 UTC
19 points
0 comments7 min readLW link