evhub

Karma: 15,973

Evan Hubinger (he/him/his) (evanjhub@gmail.com)

Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

Towards training-time mitigations for alignment faking in RL

16 Dec 2025 21:01 UTC
39 points
1 comment · 5 min read · LW link
(alignment.anthropic.com)

Alignment remains a hard, unsolved problem

evhub · 27 Nov 2025 8:45 UTC
383 points
98 comments · 14 min read · LW link