evhub

Karma: 14,199

Evan Hubinger (he/him/his) (evanjhub@gmail.com)

Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

Sticky goals: a concrete experiment for understanding deceptive alignment

evhub · Sep 2, 2022, 9:57 PM
39 points
13 comments · 3 min read · LW link

AI coordination needs clear wins

evhub · Sep 1, 2022, 11:41 PM
147 points
16 comments · 2 min read · LW link · 1 review

Strategy For Conditioning Generative Models

Sep 1, 2022, 4:34 AM
31 points
4 comments · 18 min read · LW link

How likely is deceptive alignment?

evhub · Aug 30, 2022, 7:34 PM
105 points
28 comments · 60 min read · LW link

Precursor checking for deceptive alignment

evhub · Aug 3, 2022, 10:56 PM
24 points
0 comments · 14 min read · LW link

Acceptability Verification: A Research Agenda

Jul 12, 2022, 8:11 PM
50 points
0 comments · 1 min read · LW link
(docs.google.com)

A transparency and interpretability tech tree

evhub · Jun 16, 2022, 11:44 PM
163 points
11 comments · 18 min read · LW link · 1 review

evhub’s Shortform

evhub · Jun 11, 2022, 12:43 AM
9 points
159 comments · 1 min read · LW link

Learning the smooth prior

Apr 29, 2022, 9:10 PM
35 points
0 comments · 12 min read · LW link

Towards a better circuit prior: Improving on ELK state-of-the-art

Mar 29, 2022, 1:56 AM
23 points
0 comments · 15 min read · LW link

Musings on the Speed Prior

evhub · Mar 2, 2022, 4:04 AM
33 points
4 comments · 10 min read · LW link

Transformer Circuits

evhub · Dec 22, 2021, 9:09 PM
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

ML Alignment Theory Program under Evan Hubinger

Dec 6, 2021, 12:03 AM
82 points
3 comments · 2 min read · LW link

A positive case for how we might succeed at prosaic AI alignment

evhub · Nov 16, 2021, 1:49 AM
81 points
46 comments · 6 min read · LW link

How do we become confident in the safety of a machine learning system?

evhub · Nov 8, 2021, 10:49 PM
134 points
5 comments · 31 min read · LW link

You can talk to EA Funds before applying

evhub · 28 Sep 2021 20:39 UTC
71 points
2 comments · 1 min read · LW link

Automating Auditing: An ambitious concrete technical research proposal

evhub · 11 Aug 2021 20:32 UTC
89 points
13 comments · 14 min read · LW link · 1 review

LCDT, A Myopic Decision Theory

3 Aug 2021 22:41 UTC
57 points
50 comments · 15 min read · LW link

Answering questions honestly instead of predicting human answers: lots of problems and some solutions

evhub · 13 Jul 2021 18:49 UTC
62 points
24 comments · 31 min read · LW link

Knowledge Neurons in Pretrained Transformers

evhub · 17 May 2021 22:54 UTC
100 points
7 comments · 2 min read · LW link
(arxiv.org)