Alignment Research Center

The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Its researchers are currently Paul Christiano, Mark Xu, and Beth Barnes; Kyle Scott handles operations.

ARC paper: Formalizing the presumption of independence

Erik Jenner · 20 Nov 2022 1:22 UTC
91 points
2 comments · 2 min read · LW link

[Question] How is ARC planning to use ELK?

jacquesthibs · 15 Dec 2022 20:11 UTC
24 points
5 comments · 1 min read · LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King · 15 Mar 2023 0:29 UTC
117 points
22 comments · 2 min read · LW link

ELK prize results

9 Mar 2022 0:01 UTC
132 points
50 comments · 21 min read · LW link

Experimentally evaluating whether honesty generalizes

paulfchristiano · 1 Jul 2021 17:47 UTC
103 points
24 comments · 9 min read · LW link · 1 review

ARC is hiring!

14 Dec 2021 20:09 UTC
63 points
2 comments · 1 min read · LW link

Counterexamples to some ELK proposals

paulfchristiano · 31 Dec 2021 17:05 UTC
50 points
10 comments · 7 min read · LW link

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes · 9 Sep 2022 22:46 UTC
98 points
7 comments · 10 min read · LW link

ARC’s first technical report: Eliciting Latent Knowledge

14 Dec 2021 20:09 UTC
223 points
92 comments · 1 min read · LW link · 3 reviews

The Alignment Problems

Martín Soto · 12 Jan 2023 22:29 UTC
19 points
0 comments · 4 min read · LW link

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
62 points
7 comments · 47 min read · LW link
