evhub (Evan Hubinger)

I (Evan Hubinger) am a safety researcher at Anthropic. Broadly, I work on inner alignment for prosaic machine learning.

See: “Why I’m joining Anthropic”

Pronouns: he/him/his


Selected work:

Why I’m joining Anthropic

evhub, 5 Jan 2023 1:12 UTC
Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
In defense of probably wrong mechanistic models

evhub, 6 Dec 2022 23:24 UTC
Engineering Monosemanticity in Toy Models

18 Nov 2022 1:43 UTC
