
Alignment Research Center

Last edit: 1 May 2023 20:39 UTC by Mark Xu

The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Its researchers are currently Paul Christiano, Mark Xu, Beth Barnes, and Jacob Hilton, with Kyle Scott handling operations.

Experimentally evaluating whether honesty generalizes

paulfchristiano, 1 Jul 2021 17:47 UTC
103 points
24 comments, 9 min read, LW link, 1 review

ARC is hiring!

14 Dec 2021 20:09 UTC
63 points
2 comments, 1 min read, LW link

ARC’s first technical report: Eliciting Latent Knowledge

14 Dec 2021 20:09 UTC
225 points
90 comments, 1 min read, LW link, 3 reviews
(docs.google.com)

Counterexamples to some ELK proposals

paulfchristiano, 31 Dec 2021 17:05 UTC
50 points
10 comments, 7 min read, LW link

ELK prize results

9 Mar 2022 0:01 UTC
135 points
50 comments, 21 min read, LW link

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes, 9 Sep 2022 22:46 UTC
99 points
7 comments, 10 min read, LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner, 20 Nov 2022 1:22 UTC
97 points
2 comments, 2 min read, LW link
(arxiv.org)

[Question] How is ARC planning to use ELK?

jacquesthibs, 15 Dec 2022 20:11 UTC
24 points
5 comments, 1 min read, LW link

The Alignment Problems

Martín Soto, 12 Jan 2023 22:29 UTC
19 points
0 comments, 4 min read, LW link

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
60 points
7 comments, 47 min read, LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King, 15 Mar 2023 0:29 UTC
116 points
22 comments, 2 min read, LW link

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes, 19 Mar 2023 0:25 UTC
233 points
54 comments, 8 min read, LW link
(evals.alignment.org)

Prizes for matrix completion problems

paulfchristiano, 3 May 2023 23:30 UTC
163 points
51 comments, 1 min read, LW link
(www.alignment.org)

The Goal Misgeneralization Problem

Myspy, 18 May 2023 23:40 UTC
1 point
0 comments, 1 min read, LW link
(drive.google.com)

ARC is hiring theoretical researchers

12 Jun 2023 18:50 UTC
126 points
12 comments, 4 min read, LW link
(www.alignment.org)

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan, 27 Jul 2023 1:50 UTC
22 points
0 comments, 72 min read, LW link

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes, 1 Aug 2023 18:30 UTC
153 points
12 comments, 5 min read, LW link
(evals.alignment.org)

Paul Christiano on Dwarkesh Podcast

ESRogs, 3 Nov 2023 22:13 UTC
17 points
0 comments, 1 min read, LW link
(www.dwarkeshpatel.com)

[Question] Why is there an alignment problem?

InfiniteLight, 22 Dec 2023 6:19 UTC
1 point
0 comments, 1 min read, LW link