
Iterated Amplification

Last edit: 17 Jul 2020 6:41 UTC by Ben Pace

Iterated Amplification is an approach to AI alignment, spearheaded by Paul Christiano. In this setup, we build powerful, aligned ML systems by starting with weak but aligned AIs and recursively using each new AI to build a slightly smarter AI that remains aligned.

See also: Factored cognition.
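As a rough illustration of the amplify-then-distill loop described above, here is a minimal sketch in Python. It is not taken from any of the posts below; the `decompose`, `overseer_combine`, and `distill` functions are hypothetical placeholders standing in for question decomposition, the human overseer's judgment, and supervised training, respectively.

```python
# A minimal, illustrative sketch of the amplify-then-distill loop.
# decompose, overseer_combine, and distill are hypothetical placeholders,
# not part of any published implementation.

def amplify(agent, question, decompose, overseer_combine):
    """Amplification: an overseer answers a hard question by splitting it
    into subquestions and delegating each one to a copy of the current agent."""
    subquestions = decompose(question)
    subanswers = [agent(q) for q in subquestions]
    return overseer_combine(question, subanswers)


def iterated_amplification(agent, questions, decompose, overseer_combine,
                           distill, rounds=5):
    """Each round: amplify the current agent to get better answers, then
    distill (train) a new, fast agent to imitate the amplified behaviour."""
    for _ in range(rounds):
        targets = {q: amplify(agent, q, decompose, overseer_combine)
                   for q in questions}
        agent = distill(targets)  # supervised learning on amplified answers
    return agent
```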

Paul’s research agenda FAQ

zhukeepa · 1 Jul 2018 6:25 UTC
115 points
69 comments · 19 min read · LW link · 2 nominations · 1 review

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky · 19 May 2018 18:18 UTC
180 points
53 comments · 23 min read · LW link · 3 nominations · 1 review

Iterated Distillation and Amplification

Ajeya Cotra · 30 Nov 2018 4:47 UTC
44 points
12 comments · 6 min read · LW link

A guide to Iterated Amplification & Debate

Rafael Harth · 15 Nov 2020 17:14 UTC
55 points
5 comments · 15 min read · LW link

AlphaGo Zero and capability amplification

paulfchristiano · 9 Jan 2019 0:40 UTC
30 points
23 comments · 2 min read · LW link

My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda

Chi Nguyen · 15 Aug 2020 20:02 UTC
107 points
20 comments · 39 min read · LW link

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

AlexMennen · 11 Jul 2018 1:03 UTC
39 points
1 comment · 1 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · 29 May 2020 20:38 UTC
154 points
30 comments · 38 min read · LW link

Writeup: Progress on AI Safety via Debate

5 Feb 2020 21:04 UTC
95 points
17 comments · 33 min read · LW link

My confusions with Paul’s Agenda

Vaniver · 20 Apr 2018 17:24 UTC
91 points
1 comment · 6 min read · LW link

Explanation of Paul’s AI-Alignment agenda by Ajeya Cotra

habryka · 5 Mar 2018 3:10 UTC
55 points
0 comments · 1 min read · LW link
(ai-alignment.com)

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S · 17 Apr 2018 22:36 UTC
73 points
30 comments · 9 min read · LW link

Preface to the sequence on iterated amplification

paulfchristiano · 10 Nov 2018 13:24 UTC
42 points
5 comments · 3 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework

William_S · 9 Feb 2019 0:56 UTC
26 points
12 comments · 4 min read · LW link

[Question] How does iterated amplification exceed human abilities?

riceissa · 2 May 2020 23:44 UTC
21 points
9 comments · 2 min read · LW link

[Question] Does iterated amplification tackle the inner alignment problem?

JanBrauner · 15 Feb 2020 12:58 UTC
7 points
4 comments · 1 min read · LW link

Prize for probable problems

paulfchristiano · 8 Mar 2018 16:58 UTC
144 points
63 comments · 4 min read · LW link

Approval-directed agents

paulfchristiano · 22 Nov 2018 21:15 UTC
30 points
11 comments · 15 min read · LW link

Approval-directed bootstrapping

paulfchristiano · 25 Nov 2018 23:18 UTC
19 points
0 comments · 1 min read · LW link

Humans Consulting HCH

paulfchristiano · 25 Nov 2018 23:18 UTC
22 points
10 comments · 1 min read · LW link

Corrigibility

paulfchristiano · 27 Nov 2018 21:50 UTC
42 points
4 comments · 6 min read · LW link

Benign model-free RL

paulfchristiano · 2 Dec 2018 4:10 UTC
14 points
1 comment · 7 min read · LW link

Factored Cognition

stuhlmueller · 5 Dec 2018 1:01 UTC
40 points
6 comments · 17 min read · LW link

Supervising strong learners by amplifying weak experts

paulfchristiano · 6 Jan 2019 7:00 UTC
29 points
0 comments · 1 min read · LW link
(arxiv.org)

Problems with Amplification/Distillation

Stuart_Armstrong · 27 Mar 2018 11:12 UTC
67 points
7 comments · 10 min read · LW link

Relaxed adversarial training for inner alignment

evhub · 10 Sep 2019 23:03 UTC
51 points
10 comments · 27 min read · LW link

Synthesizing amplification and debate

evhub · 5 Feb 2020 22:53 UTC
39 points
10 comments · 4 min read · LW link

Capability amplification

paulfchristiano · 20 Jan 2019 7:03 UTC
24 points
8 comments · 13 min read · LW link

RAISE is launching their MVP

toonalfrink · 26 Feb 2019 11:45 UTC
85 points
1 comment · 1 min read · LW link

Amplification Discussion Notes

William_S · 1 Jun 2018 19:03 UTC
43 points
3 comments · 3 min read · LW link

Machine Learning Projects on IDA

24 Jun 2019 18:38 UTC
51 points
3 comments · 2 min read · LW link

A general model of safety-oriented AI development

Wei_Dai · 11 Jun 2018 21:00 UTC
71 points
8 comments · 1 min read · LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei_Dai · 20 Mar 2019 9:11 UTC
39 points
3 comments · 1 min read · LW link

Disagreement with Paul: alignment induction

Stuart_Armstrong · 10 Sep 2018 13:54 UTC
33 points
6 comments · 1 min read · LW link

Thoughts on reward engineering

paulfchristiano · 24 Jan 2019 20:15 UTC
31 points
30 comments · 11 min read · LW link

The reward engineering problem

paulfchristiano · 16 Jan 2019 18:47 UTC
24 points
3 comments · 7 min read · LW link

[Question] What are the differences between all the iterative/recursive approaches to AI alignment?

riceissa · 21 Sep 2019 2:09 UTC
30 points
14 comments · 2 min read · LW link

Directions and desiderata for AI alignment

paulfchristiano · 13 Jan 2019 7:47 UTC
30 points
1 comment · 14 min read · LW link

Model splintering: moving from one imperfect model to another

Stuart_Armstrong · 27 Aug 2020 11:53 UTC
34 points
3 comments · 33 min read · LW link

Techniques for optimizing worst-case performance

paulfchristiano · 28 Jan 2019 21:29 UTC
24 points
12 comments · 8 min read · LW link

Reliability amplification

paulfchristiano · 31 Jan 2019 21:12 UTC
24 points
3 comments · 7 min read · LW link

Security amplification

paulfchristiano · 6 Feb 2019 17:28 UTC
22 points
0 comments · 13 min read · LW link

Meta-execution

paulfchristiano · 1 Nov 2018 22:18 UTC
17 points
1 comment · 5 min read · LW link