
Corrigibility

Last edit: 17 Jul 2020 6:33 UTC by Ben Pace

A corrigible agent is one that does not interfere with what we would intuitively see as attempts to ‘correct’ it, or to ‘correct’ our mistakes in building it, and that permits these corrections despite the instrumentally convergent incentive to resist them: an agent pursuing almost any fixed goal does better, by its own lights, if it cannot be shut down or modified.
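To make the definition concrete, here is a minimal toy sketch (hypothetical code, not drawn from any of the posts below; all class and method names are invented for illustration) contrasting an agent that permits a shutdown command with one that resists it:

```python
# Toy model: both agents earn 1 reward per step they remain active.
# At step 3, a human operator issues a shutdown command.

class Environment:
    def __init__(self, steps=10):
        self.steps = steps

    def run(self, agent):
        total = 0
        for t in range(self.steps):
            if t == 3:                       # the human attempts a 'correction'
                agent.receive_shutdown_command()
            if agent.shut_down:
                break
            total += 1                       # reward for acting this step
        return total

class CorrigibleAgent:
    """Permits the correction, even though complying lowers its reward."""
    def __init__(self):
        self.shut_down = False

    def receive_shutdown_command(self):
        self.shut_down = True

class IncorrigibleAgent:
    """Resists the correction: being shut down forfeits future reward,
    so the instrumentally convergent move is to ignore the command."""
    def __init__(self):
        self.shut_down = False

    def receive_shutdown_command(self):
        pass                                 # command ignored

env = Environment()
print("corrigible:  ", env.run(CorrigibleAgent()))    # 3  (stops when told)
print("incorrigible:", env.run(IncorrigibleAgent()))  # 10 (keeps going)
```

The incorrigible agent scores higher precisely because it ignores the correction, which is the instrumentally convergent pressure the definition above asks a corrigible design to overcome.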

Corrigibility

paulfchristiano · 27 Nov 2018 21:50 UTC · 40 points · 4 comments · 6 min read · LW link

Corrigibility as outside view

TurnTrout · 8 May 2020 21:56 UTC · 36 points · 11 comments · 4 min read · LW link

Can corrigibility be learned safely?

Wei_Dai · 1 Apr 2018 23:07 UTC · 35 points · 115 comments · 4 min read · LW link

Thoughts on implementing corrigible robust alignment

Steven Byrnes · 26 Nov 2019 14:06 UTC · 26 points · 2 comments · 6 min read · LW link

An Idea For Corrigible, Recursively Improving Math Oracles

jimrandomh · 20 Jul 2015 3:35 UTC · 6 points · 0 comments · 2 min read · LW link

Corrigible omniscient AI capable of making clones

Kaj_Sotala · 22 Mar 2015 12:19 UTC · 5 points · 0 comments · 1 min read · LW link (www.sharelatex.com)

Corrigible but misaligned: a superintelligent messiah

zhukeepa · 1 Apr 2018 6:20 UTC · 28 points · 26 comments · 5 min read · LW link

The limits of corrigibility

Stuart_Armstrong · 10 Apr 2018 10:49 UTC · 25 points · 9 comments · 4 min read · LW link

Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence.

RyanCarey · 21 Oct 2018 12:03 UTC · 22 points · 1 comment · 6 min read · LW link

Towards a mechanistic understanding of corrigibility

evhub · 22 Aug 2019 23:20 UTC · 39 points · 26 comments · 6 min read · LW link

Three mental images from thinking about AGI debate & corrigibility

Steven Byrnes · 3 Aug 2020 14:29 UTC · 50 points · 35 comments · 4 min read · LW link

Do what we mean vs. do what we say

rohinmshah · 30 Aug 2018 22:03 UTC · 34 points · 14 comments · 1 min read · LW link

AI Alignment 2018-19 Review

rohinmshah · 28 Jan 2020 2:19 UTC · 115 points · 6 comments · 35 min read · LW link

Non-Obstruction: A Simple Concept Motivating Corrigibility

TurnTrout · 21 Nov 2020 19:35 UTC · 63 points · 19 comments · 19 min read · LW link

A Critique of Non-Obstruction

Joe_Collman · 3 Feb 2021 8:45 UTC · 13 points · 10 comments · 4 min read · LW link

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · 8 Apr 2021 15:14 UTC · 41 points · 4 comments · 26 min read · LW link

Announcement: AI alignment prize round 4 winners

cousin_it · 20 Jan 2019 14:46 UTC · 74 points · 41 comments · 1 min read · LW link

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi · 28 Jul 2018 21:27 UTC · 66 points · 9 comments · 3 min read · LW link (github.com)

Boeing 737 MAX MCAS as an agent corrigibility failure

shminux · 16 Mar 2019 1:46 UTC · 48 points · 2 comments · 1 min read · LW link

New paper: Corrigibility with Utility Preservation

Koen.Holtman · 6 Aug 2019 19:04 UTC · 35 points · 11 comments · 2 min read · LW link

Introducing Corrigibility (an FAI research subfield)

So8res · 20 Oct 2014 21:09 UTC · 52 points · 28 comments · 3 min read · LW link

Cake, or death!

Stuart_Armstrong · 25 Oct 2012 10:33 UTC · 45 points · 13 comments · 4 min read · LW link

[Question] What are some good examples of incorrigibility?

RyanCarey · 28 Apr 2019 0:22 UTC · 23 points · 17 comments · 1 min read · LW link

Corrigibility thoughts II: the robot operator

Stuart_Armstrong · 18 Jan 2017 15:52 UTC · 3 points · 2 comments · 2 min read · LW link

Corrigibility thoughts III: manipulating versus deceiving

Stuart_Armstrong · 18 Jan 2017 15:57 UTC · 3 points · 0 comments · 1 min read · LW link

Question: MIRI Corrigibility Agenda

algon33 · 13 Mar 2019 19:38 UTC · 15 points · 11 comments · 1 min read · LW link

Petrov corrigibility

Stuart_Armstrong · 11 Sep 2018 13:50 UTC · 20 points · 10 comments · 1 min read · LW link

Corrigibility doesn’t always have a good action to take

Stuart_Armstrong · 28 Aug 2018 20:30 UTC · 19 points · 0 comments · 1 min read · LW link

Corrigibility as Constrained Optimisation

Henrik Åslund · 11 Apr 2019 20:09 UTC · 15 points · 3 comments · 5 min read · LW link

Three AI Safety Related Ideas

Wei_Dai · 13 Dec 2018 21:32 UTC · 62 points · 38 comments · 2 min read · LW link

Counterfactual Planning in AGI Systems

Koen.Holtman · 3 Feb 2021 13:54 UTC · 5 points · 0 comments · 5 min read · LW link

Creating AGI Safety Interlocks

Koen.Holtman · 5 Feb 2021 12:01 UTC · 7 points · 4 comments · 8 min read · LW link

Disentangling Corrigibility: 2015-2021

Koen.Holtman · 16 Feb 2021 18:01 UTC · 15 points · 20 comments · 9 min read · LW link

Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr · 12 Feb 2021 7:55 UTC · 15 points · 0 comments · 26 min read · LW link

Safely controlling the AGI agent reward function

Koen.Holtman · 17 Feb 2021 14:47 UTC · 7 points · 0 comments · 5 min read · LW link