
Iterated Amplification

Last edit: 17 Jul 2020 6:41 UTC by Ben Pace

Iterated Amplification is an approach to AI alignment, spearheaded by Paul Christiano. In this setup, we build powerful, aligned ML systems by starting with weak but aligned AIs and recursively using each new AI to build a slightly smarter AI that remains aligned.

See also: Factored cognition.
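As a rough illustration of the amplify-then-distill loop described above, here is a minimal sketch in Python. It is not taken from any of the posts below; the `decompose`, `overseer_combine`, and `distill` functions are hypothetical placeholders standing in for question decomposition, the human overseer's judgment, and supervised training, respectively.

```python
# A minimal, illustrative sketch of the amplify-then-distill loop.
# decompose, overseer_combine, and distill are hypothetical placeholders,
# not part of any published implementation.

def amplify(agent, question, decompose, overseer_combine):
    """Amplification: an overseer answers a hard question by splitting it
    into subquestions and delegating each one to a copy of the current agent."""
    subquestions = decompose(question)
    subanswers = [agent(q) for q in subquestions]
    return overseer_combine(question, subanswers)


def iterated_amplification(agent, questions, decompose, overseer_combine,
                           distill, rounds=5):
    """Each round: amplify the current agent to get better answers, then
    distill (train) a new, fast agent to imitate the amplified behaviour."""
    for _ in range(rounds):
        targets = {q: amplify(agent, q, decompose, overseer_combine)
                   for q in questions}
        agent = distill(targets)  # supervised learning on amplified answers
    return agent
```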

Paul’s research agenda FAQ

zhukeepa · 1 Jul 2018 6:25 UTC
115 points
69 comments · 19 min read · LW link · 2 nominations · 1 review

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky · 19 May 2018 18:18 UTC
180 points
53 comments · 23 min read · LW link · 3 nominations · 1 review

Iterated Distillation and Amplification

Ajeya Cotra · 30 Nov 2018 4:47 UTC
44 points
12 comments · 6 min read · LW link

A guide to Iterated Amplification & Debate

Rafael Harth · 15 Nov 2020 17:14 UTC
55 points
5 comments · 15 min read · LW link

AlphaGo Zero and capability amplification

paulfchristiano · 9 Jan 2019 0:40 UTC
30 points
23 comments · 2 min read · LW link

My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda

Chi Nguyen · 15 Aug 2020 20:02 UTC
107 points
20 comments · 39 min read · LW link

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

AlexMennen · 11 Jul 2018 1:03 UTC
39 points
1 comment · 1 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · 29 May 2020 20:38 UTC
154 points
30 comments · 38 min read · LW link

Writeup: Progress on AI Safety via Debate

5 Feb 2020 21:04 UTC
95 points
17 comments · 33 min read · LW link

My confusions with Paul’s Agenda

Vaniver · 20 Apr 2018 17:24 UTC
91 points
1 comment · 6 min read · LW link

Explanation of Paul’s AI-Alignment agenda by Ajeya Cotra

habryka · 5 Mar 2018 3:10 UTC
55 points
0 comments · 1 min read · LW link
(ai-alignment.com)

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S · 17 Apr 2018 22:36 UTC
73 points
30 comments · 9 min read · LW link

Preface to the sequence on iterated amplification

paulfchristiano · 10 Nov 2018 13:24 UTC
42 points
5 comments · 3 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework

William_S · 9 Feb 2019 0:56 UTC
26 points
12 comments · 4 min read · LW link

[Question] How does iterated amplification exceed human abilities?

riceissa · 2 May 2020 23:44 UTC
21 points
9 comments · 2 min read · LW link

[Question] Does iterated amplification tackle the inner alignment problem?

JanBrauner · 15 Feb 2020 12:58 UTC
7 points
4 comments · 1 min read · LW link

Prize for probable problems

paulfchristiano · 8 Mar 2018 16:58 UTC
144 points
63 comments · 4 min read · LW link

Approval-directed agents

paulfchristiano · 22 Nov 2018 21:15 UTC
30 points
11 comments · 15 min read · LW link

Approval-directed bootstrapping

paulfchristiano · 25 Nov 2018 23:18 UTC
19 points
0 comments · 1 min read · LW link

Humans Consulting HCH

paulfchristiano · 25 Nov 2018 23:18 UTC
22 points
10 comments · 1 min read · LW link

Corrigibility

paulfchristiano · 27 Nov 2018 21:50 UTC
42 points
4 comments · 6 min read · LW link

Benign model-free RL

paulfchristiano · 2 Dec 2018 4:10 UTC
14 points
1 comment · 7 min read · LW link

Factored Cognition

stuhlmueller · 5 Dec 2018 1:01 UTC
40 points
6 comments · 17 min read · LW link

Supervising strong learners by amplifying weak experts

paulfchristiano · 6 Jan 2019 7:00 UTC
29 points
0 comments · 1 min read · LW link
(arxiv.org)

Problems with Amplification/Distillation

Stuart_Armstrong · 27 Mar 2018 11:12 UTC
67 points
7 comments · 10 min read · LW link

Relaxed adversarial training for inner alignment

evhub · 10 Sep 2019 23:03 UTC
51 points
10 comments · 27 min read · LW link

Synthesizing amplification and debate

evhub · 5 Feb 2020 22:53 UTC
39 points
10 comments · 4 min read · LW link

Capability amplification

paulfchristiano · 20 Jan 2019 7:03 UTC
24 points
8 comments · 13 min read · LW link

RAISE is launching their MVP

toonalfrink · 26 Feb 2019 11:45 UTC
85 points
1 comment · 1 min read · LW link

Amplification Discussion Notes

William_S · 1 Jun 2018 19:03 UTC
43 points
3 comments · 3 min read · LW link

Machine Learning Projects on IDA

24 Jun 2019 18:38 UTC
51 points
3 comments · 2 min read · LW link

A general model of safety-oriented AI development

Wei_Dai · 11 Jun 2018 21:00 UTC
71 points
8 comments · 1 min read · LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei_Dai · 20 Mar 2019 9:11 UTC
39 points
3 comments · 1 min read · LW link

Disagreement with Paul: alignment induction

Stuart_Armstrong · 10 Sep 2018 13:54 UTC
33 points
6 comments · 1 min read · LW link

Thoughts on reward engineering

paulfchristiano · 24 Jan 2019 20:15 UTC
31 points
30 comments · 11 min read · LW link

The reward engineering problem

paulfchristiano · 16 Jan 2019 18:47 UTC
24 points
3 comments · 7 min read · LW link

[Question] What are the differences between all the iterative/recursive approaches to AI alignment?

riceissa · 21 Sep 2019 2:09 UTC
30 points
14 comments · 2 min read · LW link

Directions and desiderata for AI alignment

paulfchristiano · 13 Jan 2019 7:47 UTC
30 points
1 comment · 14 min read · LW link

Model splintering: moving from one imperfect model to another

Stuart_Armstrong · 27 Aug 2020 11:53 UTC
34 points
3 comments · 33 min read · LW link

Techniques for optimizing worst-case performance

paulfchristiano · 28 Jan 2019 21:29 UTC
24 points
12 comments · 8 min read · LW link

Reliability amplification

paulfchristiano · 31 Jan 2019 21:12 UTC
24 points
3 comments · 7 min read · LW link

Security amplification

paulfchristiano · 6 Feb 2019 17:28 UTC
22 points
0 comments · 13 min read · LW link

Meta-execution

paulfchristiano · 1 Nov 2018 22:18 UTC
17 points
1 comment · 5 min read · LW link