evhub (Evan Hubinger)

Karma: 6,761

I (Evan Hubinger) am a safety researcher at Anthropic. Broadly, I work on inner alignment for prosaic machine learning.

See: “Why I’m joining Anthropic”

Pronouns: he/him/his


Selected work:

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
120 points
4 comments · 1 min read · LW link

Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
72 points
28 comments · 1 min read · LW link

In defense of probably wrong mechanistic models

evhub · 6 Dec 2022 23:24 UTC
41 points
10 comments · 2 min read · LW link

Engineering Monosemanticity in Toy Models

18 Nov 2022 1:43 UTC
73 points
6 comments · 3 min read · LW link