Iterated Amplification
Tag
Last edit: 14 Jul 2020 23:03 UTC by jacobjacob

Iterated Amplification is an approach to AI alignment. See also: Factored cognition.
AIS 101: Task decomposition for scalable oversight (Charbel-Raphaël, 25 Jul 2023 13:34 UTC; 27 points, 0 comments, 19 min read) (docs.google.com)
[Question] Should AutoGPT update us towards researching IDA? (Michaël Trazzi, 12 Apr 2023 16:41 UTC; 15 points, 5 comments, 1 min read)
Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities? (Christopher King, 22 Feb 2023 16:49 UTC; 1 point, 7 comments, 1 min read)
Notes on OpenAI’s alignment plan (Alex Flint, 8 Dec 2022 19:13 UTC; 40 points, 5 comments, 7 min read)
Can you force a neural network to keep generalizing? (Q Home, 12 Sep 2022 10:14 UTC; 2 points, 10 comments, 5 min read)
Ought will host a factored cognition “Lab Meeting” (jungofthewon and stuhlmueller, 9 Sep 2022 23:46 UTC; 35 points, 1 comment, 1 min read)
Surprised by ELK report’s counterexample to Debate, IDA (Evan R. Murphy, 4 Aug 2022 2:12 UTC; 18 points, 0 comments, 5 min read)
Iterated Distillation-Amplification, Gato, and Proto-AGI [Re-Explained] (Gabriel Mukobi, 27 May 2022 5:42 UTC; 21 points, 4 comments, 6 min read)
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios (Evan R. Murphy, 12 May 2022 20:01 UTC; 53 points, 0 comments, 59 min read)
HCH and Adversarial Questions (David Udell, 19 Feb 2022 0:52 UTC; 15 points, 7 comments, 26 min read)
My Overview of the AI Alignment Landscape: A Bird’s Eye View (Neel Nanda, 15 Dec 2021 23:44 UTC; 127 points, 9 comments, 15 min read)
Garrabrant and Shah on human modeling in AGI (Rob Bensinger, 4 Aug 2021 4:35 UTC; 60 points, 10 comments, 47 min read)
[Question] Is iterated amplification really more powerful than imitation? (Chantiel, 2 Aug 2021 23:20 UTC; 5 points, 0 comments, 2 min read)
Thoughts on Iterated Distillation and Amplification (Waddington, 11 May 2021 21:32 UTC; 9 points, 2 comments, 20 min read)
Mapping the Conceptual Territory in AI Existential Safety and Alignment (jbkjr, 12 Feb 2021 7:55 UTC; 15 points, 0 comments, 26 min read)
Imitative Generalisation (AKA ‘Learning the Prior’) (Beth Barnes, 10 Jan 2021 0:30 UTC; 107 points, 15 comments, 11 min read; 1 review)
Debate update: Obfuscated arguments problem (Beth Barnes, 23 Dec 2020 3:24 UTC; 135 points, 24 comments, 16 min read)
A guide to Iterated Amplification & Debate (Rafael Harth, 15 Nov 2020 17:14 UTC; 75 points, 12 comments, 15 min read)
Model splintering: moving from one imperfect model to another (Stuart_Armstrong, 27 Aug 2020 11:53 UTC; 79 points, 10 comments, 33 min read)
My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda (Chi Nguyen, 15 Aug 2020 20:02 UTC; 120 points, 20 comments, 39 min read)
An overview of 11 proposals for building safe advanced AI (evhub, 29 May 2020 20:38 UTC; 205 points, 36 comments, 38 min read; 2 reviews)
[Question] How does iterated amplification exceed human abilities? (riceissa, 2 May 2020 23:44 UTC; 19 points, 9 comments, 2 min read)
[Question] Does iterated amplification tackle the inner alignment problem? (JanB, 15 Feb 2020 12:58 UTC; 7 points, 4 comments, 1 min read)
Synthesizing amplification and debate (evhub, 5 Feb 2020 22:53 UTC; 33 points, 10 comments, 4 min read)
Writeup: Progress on AI Safety via Debate (Beth Barnes and paulfchristiano, 5 Feb 2020 21:04 UTC; 100 points, 18 comments, 33 min read)
[Question] What are the differences between all the iterative/recursive approaches to AI alignment? (riceissa, 21 Sep 2019 2:09 UTC; 30 points, 14 comments, 2 min read)
Machine Learning Projects on IDA (Owain_Evans, William_S and stuhlmueller, 24 Jun 2019 18:38 UTC; 49 points, 3 comments, 2 min read)
[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA? (Wei Dai, 20 Mar 2019 9:11 UTC; 35 points, 3 comments, 1 min read)
RAISE is launching their MVP (null, 26 Feb 2019 11:45 UTC; 67 points, 1 comment, 1 min read)
Reinforcement Learning in the Iterated Amplification Framework (William_S, 9 Feb 2019 0:56 UTC; 25 points, 12 comments, 4 min read)
Security amplification (paulfchristiano, 6 Feb 2019 17:28 UTC; 21 points, 2 comments, 13 min read)
Reliability amplification (paulfchristiano, 31 Jan 2019 21:12 UTC; 24 points, 3 comments, 7 min read)
Techniques for optimizing worst-case performance (paulfchristiano, 28 Jan 2019 21:29 UTC; 23 points, 12 comments, 8 min read)
Thoughts on reward engineering (paulfchristiano, 24 Jan 2019 20:15 UTC; 30 points, 30 comments, 11 min read)
Capability amplification (paulfchristiano, 20 Jan 2019 7:03 UTC; 24 points, 8 comments, 13 min read)
The reward engineering problem (paulfchristiano, 16 Jan 2019 18:47 UTC; 26 points, 3 comments, 7 min read)
Directions and desiderata for AI alignment (paulfchristiano, 13 Jan 2019 7:47 UTC; 47 points, 1 comment, 14 min read)
AlphaGo Zero and capability amplification (paulfchristiano, 9 Jan 2019 0:40 UTC; 33 points, 23 comments, 2 min read)
Supervising strong learners by amplifying weak experts (paulfchristiano, 6 Jan 2019 7:00 UTC; 29 points, 1 comment, 1 min read) (arxiv.org)
Three AI Safety Related Ideas (Wei Dai, 13 Dec 2018 21:32 UTC; 68 points, 38 comments, 2 min read)
Factored Cognition (stuhlmueller, 5 Dec 2018 1:01 UTC; 45 points, 6 comments, 17 min read)
Benign model-free RL (paulfchristiano, 2 Dec 2018 4:10 UTC; 15 points, 1 comment, 7 min read)
Iterated Distillation and Amplification (Ajeya Cotra, 30 Nov 2018 4:47 UTC; 47 points, 14 comments, 6 min read)
Corrigibility (paulfchristiano, 27 Nov 2018 21:50 UTC; 57 points, 8 comments, 6 min read)
Humans Consulting HCH (paulfchristiano, 25 Nov 2018 23:18 UTC; 33 points, 9 comments, 1 min read)
Approval-directed bootstrapping (paulfchristiano, 25 Nov 2018 23:18 UTC; 22 points, 0 comments, 1 min read)
Approval-directed agents (paulfchristiano, 22 Nov 2018 21:15 UTC; 31 points, 10 comments, 15 min read)
link
Preface to the sequence on iterated amplification
paulfchristiano
10 Nov 2018 13:24 UTC
43
points
8
comments
3
min read
LW
link
Meta-execution
paulfchristiano
1 Nov 2018 22:18 UTC
20
points
1
comment
5
min read
LW
link
Disagreement with Paul: alignment induction
Stuart_Armstrong
10 Sep 2018 13:54 UTC
31
points
6
comments
1
min read
LW
link
A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment
AlexMennen
11 Jul 2018 1:03 UTC
40
points
1
comment
1
min read
LW
link
Paul’s research agenda FAQ
zhukeepa
1 Jul 2018 6:25 UTC
126
points
74
comments
19
min read
LW
link
1
review
A general model of safety-oriented AI development (Wei Dai, 11 Jun 2018 21:00 UTC; 65 points, 8 comments, 1 min read)
Amplification Discussion Notes (William_S, 1 Jun 2018 19:03 UTC; 17 points, 3 comments, 3 min read)
Challenges to Christiano’s capability amplification proposal (Eliezer Yudkowsky, 19 May 2018 18:18 UTC; 124 points, 54 comments, 23 min read; 1 review)
My confusions with Paul’s Agenda (Vaniver, 20 Apr 2018 17:24 UTC; 37 points, 1 comment, 6 min read)
Understanding Iterated Distillation and Amplification: Claims and Oversight (William_S, 17 Apr 2018 22:36 UTC; 34 points, 30 comments, 9 min read)
Problems with Amplification/Distillation (Stuart_Armstrong, 27 Mar 2018 11:12 UTC; 29 points, 7 comments, 10 min read)
Prize for probable problems (paulfchristiano, 8 Mar 2018 16:58 UTC; 60 points, 63 comments, 4 min read)
Explanation of Paul’s AI-Alignment agenda by Ajeya Cotra (habryka, 5 Mar 2018 3:10 UTC; 20 points, 0 comments, 1 min read) (ai-alignment.com)