Debate (AI safety technique)

Tag. Last edit: 14 Jul 2020 23:24 UTC by jacobjacob

Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluators are not themselves experts and cannot fully verify the answers [1]. The technique was suggested as part of an approach to building advanced AI systems that are aligned with human values, and to safely applying machine learning techniques to problems that are high-stakes but not well-defined (such as advancing science or increasing a company’s revenue) [2, 3].

Writeup: Progress on AI Safety via Debate

5 Feb 2020 21:04 UTC
88 points
17 comments, 33 min read, LW link

A guide to Iterated Amplification & Debate

Rafael Harth, 15 Nov 2020 17:14 UTC
58 points
8 comments, 15 min read, LW link

AI Safety via Debate

ESRogs, 5 May 2018 2:11 UTC
26 points
12 comments, 1 min read, LW link

Thoughts on AI Safety via Debate

Vaniver, 9 May 2018 19:46 UTC
35 points
12 comments, 6 min read, LW link

[Question] How should AI debate be judged?

abramdemski, 15 Jul 2020 22:20 UTC
48 points
27 comments, 6 min read, LW link

Debate update: Obfuscated arguments problem

Beth Barnes, 23 Dec 2020 3:24 UTC
105 points
20 comments, 16 min read, LW link

Optimal play in human-judged Debate usually won’t answer your question

Joe_Collman, 27 Jan 2021 7:34 UTC
32 points
8 comments, 12 min read, LW link

Splitting Debate up into Two Subsystems

Nandi, 3 Jul 2020 20:11 UTC
13 points
5 comments, 4 min read, LW link

Thoughts on “AI safety via debate”

G Gordon Worley III, 10 May 2018 0:44 UTC
12 points
4 comments, 5 min read, LW link

Comparing AI Alignment Approaches to Minimize False Positive Risk

G Gordon Worley III, 30 Jun 2020 19:34 UTC
5 points
0 comments, 9 min read, LW link

An overview of 11 proposals for building safe advanced AI

evhub, 29 May 2020 20:38 UTC
147 points
30 comments, 38 min read, LW link

Three mental images from thinking about AGI debate & corrigibility

Steven Byrnes, 3 Aug 2020 14:29 UTC
50 points
35 comments, 4 min read, LW link

Synthesizing amplification and debate

evhub, 5 Feb 2020 22:53 UTC
32 points
10 comments, 4 min read, LW link

Looking for adversarial collaborators to test our Debate protocol

Beth Barnes, 19 Aug 2020 3:15 UTC
52 points
5 comments, 1 min read, LW link

Clarifying Factored Cognition

Rafael Harth, 13 Dec 2020 20:02 UTC
23 points
2 comments, 3 min read, LW link

Idealized Factored Cognition

Rafael Harth, 30 Nov 2020 18:49 UTC
33 points
6 comments, 11 min read, LW link

Traversing a Cognition Space

Rafael Harth, 7 Dec 2020 18:32 UTC
16 points
5 comments, 12 min read, LW link

Why I’m excited about Debate

Richard_Ngo, 15 Jan 2021 23:37 UTC
66 points
12 comments, 7 min read, LW link

FC final: Can Factored Cognition schemes scale?

Rafael Harth, 24 Jan 2021 22:18 UTC
14 points
0 comments, 17 min read, LW link

Parallels Between AI Safety by Debate and Evidence Law

Cullen_OKeefe, 20 Jul 2020 22:52 UTC
10 points
1 comment, 2 min read, LW link

AI Safety Debate and Its Applications

VojtaKovarik, 23 Jul 2019 22:31 UTC
36 points
5 comments, 12 min read, LW link

New paper: (When) is Truth-telling Favored in AI debate?

VojtaKovarik, 26 Dec 2019 19:59 UTC
32 points
7 comments, 5 min read, LW link

Problems with AI debate

Stuart_Armstrong, 26 Aug 2019 19:21 UTC
21 points
3 comments, 5 min read, LW link

Debate Minus Factored Cognition

abramdemski, 29 Dec 2020 22:59 UTC
37 points
42 comments, 11 min read, LW link

Can there be an indescribable hellworld?

Stuart_Armstrong, 29 Jan 2019 15:00 UTC
33 points
19 comments, 2 min read, LW link

Imitative Generalisation (AKA ‘Learning the Prior’)

Beth Barnes, 10 Jan 2021 0:30 UTC
74 points
12 comments, 12 min read, LW link

Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr, 12 Feb 2021 7:55 UTC
15 points
0 comments, 26 min read, LW link