
Iterated Amplification

Last edit: 14 Jul 2020 23:03 UTC by jacobjacob

Iterated Amplification is an approach to AI alignment proposed by Paul Christiano, often discussed together with distillation as Iterated Distillation and Amplification (IDA). The idea is to alternate two steps: amplification, in which a human or weak trusted overseer is made more capable by decomposing a problem and delegating subproblems to copies of the current AI model, and distillation, in which a new model is trained to reproduce the behaviour of that slower, amplified system. Iterating the loop is intended to scale capability while preserving alignment at each step.
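To make the amplify/distill loop concrete, here is a minimal toy sketch in Python. It is purely illustrative and not taken from any of the posts below: the list-summing task, the memo-table "model", and the names amplify, distill, and iterated_amplification are assumptions made for this example.

```python
# A minimal, runnable toy of the amplify/distill loop. Illustrative only:
# the summing task, the memo-table "model", and all names here are
# hypothetical stand-ins, not drawn from the posts on this page.

def amplify(model, xs):
    """Overseer step: decompose the problem, delegate the halves to the
    current model, and combine the sub-answers."""
    if len(xs) <= 1:
        return xs[0] if xs else 0
    mid = len(xs) // 2
    return model(xs[:mid]) + model(xs[mid:])

def distill(amplified, training_inputs):
    """Distillation step: train a fast model to imitate the slow amplified
    system (here, simply by memoising its answers)."""
    memo = {tuple(xs): amplified(xs) for xs in training_inputs}
    # Unseen inputs fall back to 0, a crude stand-in for imperfect generalisation.
    return lambda xs: memo.get(tuple(xs), 0)

def iterated_amplification(initial_model, training_inputs, rounds):
    """Alternate amplification and distillation for a fixed number of rounds."""
    model = initial_model
    for _ in range(rounds):
        current = model  # bind this round's model for use during amplification
        model = distill(lambda xs, m=current: amplify(m, xs), training_inputs)
    return model

if __name__ == "__main__":
    weak_model = lambda xs: 0  # the initial model can answer nothing
    inputs = [[1], [2], [3], [4], [1, 2], [3, 4], [1, 2, 3, 4]]
    strong_model = iterated_amplification(weak_model, inputs, rounds=3)
    print(strong_model([1, 2, 3, 4]))  # -> 10
```

Each round, answers that earlier distilled models learned for small subproblems let the amplified overseer solve larger ones, which the next distilled model then answers directly.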

See also: Factored cognition.

Explanation of Paul’s AI-Alignment agenda by Ajeya Cotra

habryka, 5 Mar 2018 3:10 UTC
20 points
0 comments, 1 min read, LW link
(ai-alignment.com)

Prize for probable problems

paulfchristiano, 8 Mar 2018 16:58 UTC
60 points
63 comments, 4 min read, LW link

Problems with Amplification/Distillation

Stuart_Armstrong, 27 Mar 2018 11:12 UTC
29 points
7 comments, 10 min read, LW link

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S, 17 Apr 2018 22:36 UTC
34 points
30 comments, 9 min read, LW link

My confusions with Paul’s Agenda

Vaniver, 20 Apr 2018 17:24 UTC
37 points
1 comment, 6 min read, LW link

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky, 19 May 2018 18:18 UTC
124 points
54 comments, 23 min read, LW link, 1 review

Amplification Discussion Notes

William_S, 1 Jun 2018 19:03 UTC
17 points
3 comments, 3 min read, LW link

A general model of safety-oriented AI development

Wei Dai, 11 Jun 2018 21:00 UTC
65 points
8 comments, 1 min read, LW link

Paul’s research agenda FAQ

zhukeepa, 1 Jul 2018 6:25 UTC
126 points
74 comments, 19 min read, LW link, 1 review

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

AlexMennen, 11 Jul 2018 1:03 UTC
40 points
1 comment, 1 min read, LW link

Disagreement with Paul: alignment induction

Stuart_Armstrong, 10 Sep 2018 13:54 UTC
31 points
6 comments, 1 min read, LW link

Meta-execution

paulfchristiano, 1 Nov 2018 22:18 UTC
20 points
1 comment, 5 min read, LW link

Preface to the sequence on iterated amplification

paulfchristiano, 10 Nov 2018 13:24 UTC
43 points
8 comments, 3 min read, LW link

Approval-directed agents

paulfchristiano, 22 Nov 2018 21:15 UTC
31 points
10 comments, 15 min read, LW link

Approval-directed bootstrapping

paulfchristiano, 25 Nov 2018 23:18 UTC
22 points
0 comments, 1 min read, LW link

Humans Consulting HCH

paulfchristiano, 25 Nov 2018 23:18 UTC
33 points
9 comments, 1 min read, LW link

Corrigibility

paulfchristiano, 27 Nov 2018 21:50 UTC
57 points
8 comments, 6 min read, LW link

Iterated Distillation and Amplification

Ajeya Cotra, 30 Nov 2018 4:47 UTC
47 points
14 comments, 6 min read, LW link

Benign model-free RL

paulfchristiano, 2 Dec 2018 4:10 UTC
15 points
1 comment, 7 min read, LW link

Factored Cognition

stuhlmueller, 5 Dec 2018 1:01 UTC
45 points
6 comments, 17 min read, LW link

Three AI Safety Related Ideas

Wei Dai, 13 Dec 2018 21:32 UTC
68 points
38 comments, 2 min read, LW link

Supervising strong learners by amplifying weak experts

paulfchristiano, 6 Jan 2019 7:00 UTC
29 points
1 comment, 1 min read, LW link
(arxiv.org)

AlphaGo Zero and capability amplification

paulfchristiano, 9 Jan 2019 0:40 UTC
33 points
23 comments, 2 min read, LW link

Directions and desiderata for AI alignment

paulfchristiano, 13 Jan 2019 7:47 UTC
47 points
1 comment, 14 min read, LW link

The reward engineering problem

paulfchristiano, 16 Jan 2019 18:47 UTC
26 points
3 comments, 7 min read, LW link

Capability amplification

paulfchristiano, 20 Jan 2019 7:03 UTC
24 points
8 comments, 13 min read, LW link

Thoughts on reward engineering

paulfchristiano, 24 Jan 2019 20:15 UTC
30 points
30 comments, 11 min read, LW link

Techniques for optimizing worst-case performance

paulfchristiano, 28 Jan 2019 21:29 UTC
23 points
12 comments, 8 min read, LW link

Reliability amplification

paulfchristiano, 31 Jan 2019 21:12 UTC
24 points
3 comments, 7 min read, LW link

Security amplification

paulfchristiano, 6 Feb 2019 17:28 UTC
21 points
0 comments, 13 min read, LW link

Reinforcement Learning in the Iterated Amplification Framework

William_S, 9 Feb 2019 0:56 UTC
25 points
12 comments, 4 min read, LW link

RAISE is launching their MVP

26 Feb 2019 11:45 UTC
67 points
1 comment, 1 min read, LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei Dai, 20 Mar 2019 9:11 UTC
35 points
3 comments, 1 min read, LW link

Machine Learning Projects on IDA

24 Jun 2019 18:38 UTC
49 points
3 comments, 2 min read, LW link

[Question] What are the differences between all the iterative/recursive approaches to AI alignment?

riceissa, 21 Sep 2019 2:09 UTC
30 points
14 comments, 2 min read, LW link

Writeup: Progress on AI Safety via Debate

5 Feb 2020 21:04 UTC
100 points
18 comments, 33 min read, LW link

Synthesizing amplification and debate

evhub, 5 Feb 2020 22:53 UTC
33 points
10 comments, 4 min read, LW link

[Question] Does iterated amplification tackle the inner alignment problem?

JanB, 15 Feb 2020 12:58 UTC
7 points
4 comments, 1 min read, LW link

[Question] How does iterated amplification exceed human abilities?

riceissa, 2 May 2020 23:44 UTC
19 points
9 comments, 2 min read, LW link

An overview of 11 proposals for building safe advanced AI

evhub, 29 May 2020 20:38 UTC
205 points
36 comments, 38 min read, LW link, 2 reviews

My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda

Chi Nguyen, 15 Aug 2020 20:02 UTC
120 points
20 comments, 39 min read, LW link

Model splintering: moving from one imperfect model to another

Stuart_Armstrong, 27 Aug 2020 11:53 UTC
79 points
10 comments, 33 min read, LW link

A guide to Iterated Amplification & Debate

Rafael Harth, 15 Nov 2020 17:14 UTC
75 points
12 comments, 15 min read, LW link

Debate update: Obfuscated arguments problem

Beth Barnes, 23 Dec 2020 3:24 UTC
135 points
24 comments, 16 min read, LW link

Imitative Generalisation (AKA ‘Learning the Prior’)

Beth Barnes, 10 Jan 2021 0:30 UTC
107 points
15 comments, 11 min read, LW link, 1 review

Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr, 12 Feb 2021 7:55 UTC
15 points
0 comments, 26 min read, LW link

Thoughts on Iterated Distillation and Amplification

Waddington, 11 May 2021 21:32 UTC
9 points
2 comments, 20 min read, LW link

[Question] Is iterated amplification really more powerful than imitation?

Chantiel, 2 Aug 2021 23:20 UTC
5 points
0 comments, 2 min read, LW link

Garrabrant and Shah on human modeling in AGI

Rob Bensinger, 4 Aug 2021 4:35 UTC
60 points
10 comments, 47 min read, LW link

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda, 15 Dec 2021 23:44 UTC
127 points
9 comments, 15 min read, LW link

HCH and Adversarial Questions

David Udell, 19 Feb 2022 0:52 UTC
15 points
7 comments, 26 min read, LW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy, 12 May 2022 20:01 UTC
53 points
0 comments, 59 min read, LW link

Iterated Distillation-Amplification, Gato, and Proto-AGI [Re-Explained]

Gabriel Mukobi, 27 May 2022 5:42 UTC
21 points
4 comments, 6 min read, LW link

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy, 4 Aug 2022 2:12 UTC
18 points
0 comments, 5 min read, LW link

Ought will host a factored cognition “Lab Meeting”

9 Sep 2022 23:46 UTC
35 points
1 comment, 1 min read, LW link

Can you force a neural network to keep generalizing?

Q Home, 12 Sep 2022 10:14 UTC
2 points
10 comments, 5 min read, LW link

Notes on OpenAI’s alignment plan

Alex Flint, 8 Dec 2022 19:13 UTC
40 points
5 comments, 7 min read, LW link

Is there a ML agent that abandons it’s utility function out-of-distribution without losing capabilities?

Christopher King, 22 Feb 2023 16:49 UTC
1 point
7 comments, 1 min read, LW link

[Question] Should AutoGPT update us towards researching IDA?

Michaël Trazzi, 12 Apr 2023 16:41 UTC
15 points
5 comments, 1 min read, LW link

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël, 25 Jul 2023 13:34 UTC
27 points
0 comments, 19 min read, LW link
(docs.google.com)