Alignment Research Center (ARC)

Tag. Last edit: 30 Dec 2024 9:23 UTC by Dakara

The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Currently, Paul Christiano, Mark Xu, and Jacob Hilton are researchers, and Kyle Scott handles operations.

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes 1 Aug 2023 18:30 UTC

153 points

12 comments 5 min read LW link

(evals.alignment.org)

Steelmanning heuristic arguments

Dmitry Vaintrob 13 Apr 2025 1:09 UTC

77 points

1 comment 17 min read LW link

Obstacles in ARC’s agenda: Low Probability Estimation

David Matolcsi 2 May 2025 19:38 UTC

44 points

0 comments 6 min read LW link

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi 30 Apr 2025 23:03 UTC

128 points

10 comments 17 min read LW link

[Question] How is ARC planning to use ELK?

jacquesthibs 15 Dec 2022 20:11 UTC

24 points

5 comments 1 min read LW link

Obstacles in ARC’s agenda: Mechanistic Anomaly Detection

David Matolcsi 1 May 2025 20:51 UTC

43 points

1 comment 11 min read LW link

Paul Christiano on Dwarkesh Podcast

ESRogs 3 Nov 2023 22:13 UTC

19 points

0 comments 1 min read LW link

(www.dwarkeshpatel.com)

ARC is hiring theoretical researchers

paulfchristiano, Jacob_Hilton and Mark Xu

12 Jun 2023 18:50 UTC

126 points

12 comments 4 min read LW link

(www.alignment.org)

Low Probability Estimation in Language Models

Gabriel Wu 18 Oct 2024 15:50 UTC

50 points

0 comments 10 min read LW link

(www.alignment.org)

Prizes for matrix completion problems

paulfchristiano 3 May 2023 23:30 UTC

164 points

52 comments 1 min read LW link

(www.alignment.org)

ARC’s first technical report: Eliciting Latent Knowledge

paulfchristiano, Mark Xu and Ajeya Cotra

14 Dec 2021 20:09 UTC

231 points

90 comments 1 min read LW link 3 reviews

(docs.google.com)

A bird’s eye view of ARC’s research

Jacob_Hilton 23 Oct 2024 15:50 UTC

121 points

12 comments 7 min read LW link

(www.alignment.org)

Estimating Tail Risk in Neural Networks

Mark Xu 13 Sep 2024 20:00 UTC

68 points

9 comments 23 min read LW link

(www.alignment.org)

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan 27 Jul 2023 1:50 UTC

22 points

0 comments 72 min read LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner 20 Nov 2022 1:22 UTC

97 points

2 comments 2 min read LW link

(arxiv.org)

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King 15 Mar 2023 0:29 UTC

116 points

22 comments 2 min read LW link

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes 19 Mar 2023 0:25 UTC

233 points

54 comments 8 min read LW link

(evals.alignment.org)

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

Andrea_Miotti, paulfchristiano, Gabriel Alfour and Olive Branch

24 Feb 2023 23:03 UTC

61 points

7 comments 47 min read LW link

Counterexamples to some ELK proposals

paulfchristiano 31 Dec 2021 17:05 UTC

53 points

10 comments 7 min read LW link

ARC is hiring!

paulfchristiano and Mark Xu

14 Dec 2021 20:09 UTC

64 points

2 comments 1 min read LW link

Concrete Methods for Heuristic Estimation on Neural Networks

Oliver Daniels 14 Nov 2024 5:07 UTC

35 points

0 comments 27 min read LW link

ONTOLOGICAL ALIGNMENT AS THE MISSING LAYER

fiduciarysentinel 16 Jan 2026 3:09 UTC

1 point

0 comments 3 min read LW link

ARC progress update: Competing with sampling

Eric Neyman, Victor Lecomte, Wilson Wu, Mikewins, Jacob_Hilton and George Robinson

18 Nov 2025 17:22 UTC

132 points

11 comments 21 min read LW link

Exploring a Vision for AI as Compassionate, Emotionally Intelligent Partners — Seeking Collaboration and Insights

theophilos 14 Jul 2025 23:22 UTC

1 point

0 comments 1 min read LW link

Experimentally evaluating whether honesty generalizes

paulfchristiano 1 Jul 2021 17:47 UTC

103 points

24 comments 9 min read LW link 1 review

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes 9 Sep 2022 22:46 UTC

99 points

7 comments 10 min read LW link

The Alignment Problems

Martín Soto 12 Jan 2023 22:29 UTC

20 points

0 comments 4 min read LW link

1.75 ASR HARMBENCH & 0% HARMFUL RESPONSES FOR MISALIGNMENT.

jfdom 10 Nov 2025 20:43 UTC

1 point

0 comments 1 min read LW link

Empirical Proof of Systemic Incoherence in LLMs (Gemini Case Study)

arayun 6 Nov 2025 14:23 UTC

1 point

0 comments 1 min read LW link

AlgZoo: uninterpreted models with fewer than 1,500 parameters

Jacob_Hilton 26 Jan 2026 17:30 UTC

181 points

7 comments 10 min read LW link

(www.alignment.org)

[Question] Why is there an alignment problem?

InfiniteLight 22 Dec 2023 6:19 UTC

1 point

0 comments 1 min read LW link

Purpose-Internalisation Architecture (PIA) as a Complement to Constraint-Based Alignment: A Thermodynamic Argument

Gerhard Diedericks 10 Feb 2026 12:37 UTC

1 point

0 comments 12 min read LW link

The Goal Misgeneralization Problem

Myspy 18 May 2023 23:40 UTC

1 point

0 comments 1 min read LW link

(drive.google.com)

ELK prize results

paulfchristiano and Mark Xu

9 Mar 2022 0:01 UTC

139 points

50 comments 21 min read LW link
