
Alignment Research Center

Last edit: 1 May 2023 20:39 UTC by Mark Xu

The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Its researchers are currently Paul Christiano, Mark Xu, Beth Barnes, and Jacob Hilton, with Kyle Scott handling operations.

Experimentally evaluating whether honesty generalizes

paulfchristiano, 1 Jul 2021 17:47 UTC
103 points
24 comments, 9 min read, LW link, 1 review

ARC is hiring!

14 Dec 2021 20:09 UTC
63 points
2 comments, 1 min read, LW link

ARC’s first technical report: Eliciting Latent Knowledge

14 Dec 2021 20:09 UTC
225 points
90 comments, 1 min read, LW link, 3 reviews
(docs.google.com)

Counterexamples to some ELK proposals

paulfchristiano, 31 Dec 2021 17:05 UTC
50 points
10 comments, 7 min read, LW link

ELK prize results

9 Mar 2022 0:01 UTC
135 points
50 comments, 21 min read, LW link

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes, 9 Sep 2022 22:46 UTC
99 points
7 comments, 10 min read, LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner, 20 Nov 2022 1:22 UTC
97 points
2 comments, 2 min read, LW link
(arxiv.org)

[Question] How is ARC planning to use ELK?

jacquesthibs, 15 Dec 2022 20:11 UTC
24 points
5 comments, 1 min read, LW link

The Alignment Problems

Martín Soto, 12 Jan 2023 22:29 UTC
19 points
0 comments, 4 min read, LW link

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
60 points
7 comments, 47 min read, LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King, 15 Mar 2023 0:29 UTC
116 points
22 comments, 2 min read, LW link

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes, 19 Mar 2023 0:25 UTC
233 points
54 comments, 8 min read, LW link
(evals.alignment.org)

Prizes for matrix completion problems

paulfchristiano, 3 May 2023 23:30 UTC
163 points
51 comments, 1 min read, LW link
(www.alignment.org)

The Goal Misgeneralization Problem

Myspy, 18 May 2023 23:40 UTC
1 point
0 comments, 1 min read, LW link
(drive.google.com)

ARC is hiring theoretical researchers

12 Jun 2023 18:50 UTC
126 points
12 comments, 4 min read, LW link
(www.alignment.org)

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan, 27 Jul 2023 1:50 UTC
22 points
0 comments, 72 min read, LW link

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes, 1 Aug 2023 18:30 UTC
153 points
12 comments, 5 min read, LW link
(evals.alignment.org)

Paul Christiano on Dwarkesh Podcast

ESRogs, 3 Nov 2023 22:13 UTC
17 points
0 comments, 1 min read, LW link
(www.dwarkeshpatel.com)

[Question] Why is there an alignment problem?

InfiniteLight, 22 Dec 2023 6:19 UTC
1 point
0 comments, 1 min read, LW link