Iterated Amplification

29 Oct 2018 13:26 UTC

This is a sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification.

Preface to the sequence on iterated amplification

paulfchristiano, 10 Nov 2018 13:24 UTC

44 points

0. Problem statement

The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.

The Steering Problem

paulfchristiano, 13 Nov 2018 17:14 UTC

44 points

12 comments, 7 min read, LW link

Clarifying “AI Alignment”

paulfchristiano, 15 Nov 2018 14:41 UTC

67 points

84 comments, 3 min read, LW link, 2 reviews

An unaligned benchmark

paulfchristiano, 17 Nov 2018 15:51 UTC

31 points

0 comments, 9 min read, LW link

Prosaic AI alignment

paulfchristiano, 20 Nov 2018 13:56 UTC

48 points

10 comments, 8 min read, LW link

1. Basic intuition

The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Approval-directed agents

paulfchristiano, 22 Nov 2018 21:15 UTC

31 points

10 comments, 15 min read, LW link

Approval-directed bootstrapping

paulfchristiano, 25 Nov 2018 23:18 UTC

24 points

0 comments, 1 min read, LW link

Humans Consulting HCH

paulfchristiano, 25 Nov 2018 23:18 UTC

39 points

9 comments, 1 min read, LW link

Corrigibility

paulfchristiano, 27 Nov 2018 21:50 UTC

57 points

8 comments, 6 min read, LW link

2. The scheme

The core of the sequence is the third section. "Benign model-free RL" describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives; if you find that one of those descriptions is clearest for you, then I recommend focusing on that one and skimming the others.

Iterated Distillation and Amplification

Ajeya Cotra, 30 Nov 2018 4:47 UTC

48 points

14 comments, 6 min read, LW link

Benign model-free RL

paulfchristiano, 2 Dec 2018 4:10 UTC

15 points

1 comment, 7 min read, LW link

Factored Cognition

stuhlmueller, 5 Dec 2018 1:01 UTC

45 points

6 comments, 17 min read, LW link

Supervising strong learners by amplifying weak experts

paulfchristiano, 6 Jan 2019 7:00 UTC

29 points

1 comment, 1 min read, LW link

(arxiv.org)

AlphaGo Zero and capability amplification

paulfchristiano, 9 Jan 2019 0:40 UTC

33 points

23 comments, 2 min read, LW link

3. What needs doing

The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.