
Deconfusion


Narrowly, deconfusion refers to a specific branch of AI alignment research discussed in MIRI’s 2018 research update. More broadly, the term can describe the same kind of conceptual clarification work in any domain. Quoting from the research update:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

Looking Deeper at Deconfusion
adamShimi, 13 Jun 2021 21:29 UTC
60 points
13 comments · 15 min read · LW link

Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment
philip_b, 14 Jul 2020 22:48 UTC
35 points
25 comments · 3 min read · LW link

On MIRI’s new research directions
Rob Bensinger, 22 Nov 2018 23:42 UTC
53 points
12 comments · 1 min read · LW link
(intelligence.org)

Exercises in Comprehensive Information Gathering
johnswentworth, 15 Feb 2020 17:27 UTC
129 points
16 comments · 3 min read · LW link · 1 review

[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?
TurnTrout, 16 Jun 2021 2:09 UTC
23 points
59 comments · 1 min read · LW link

Alex Turner’s Research, Comprehensive Information Gathering
adamShimi, 23 Jun 2021 9:44 UTC
15 points
3 comments · 3 min read · LW link

Musings on general systems alignment
Alex Flint, 30 Jun 2021 18:16 UTC
31 points
11 comments · 3 min read · LW link

Traps of Formalization in Deconfusion
adamShimi, 5 Aug 2021 22:40 UTC
24 points
7 comments · 6 min read · LW link

Applications for Deconfusing Goal-Directedness
adamShimi, 8 Aug 2021 13:05 UTC
36 points
3 comments · 5 min read · LW link · 1 review

Goal-Directedness and Behavior, Redux
adamShimi, 9 Aug 2021 14:26 UTC
14 points
4 comments · 2 min read · LW link

Power-seeking for successive choices
adamShimi, 12 Aug 2021 20:37 UTC
11 points
9 comments · 4 min read · LW link

A review of “Agents and Devices”
adamShimi, 13 Aug 2021 8:42 UTC
12 points
0 comments · 4 min read · LW link

Approaches to gradient hacking
adamShimi, 14 Aug 2021 15:16 UTC
16 points
8 comments · 8 min read · LW link

Modelling Transformative AI Risks (MTAIR) Project: Introduction
16 Aug 2021 7:12 UTC
90 points
0 comments · 9 min read · LW link

Strategy is the Deconfusion of Action
ryan_b, 2 Jan 2019 20:56 UTC
69 points
4 comments · 6 min read · LW link

Builder/Breaker for Deconfusion
abramdemski, 29 Sep 2022 17:36 UTC
71 points
9 comments · 9 min read · LW link

Deconfusing Direct vs Amortised Optimization
beren, 2 Dec 2022 11:30 UTC
92 points
14 comments · 10 min read · LW link

My summary of the alignment problem
Peter Hroššo, 11 Aug 2022 19:42 UTC
16 points
3 comments · 2 min read · LW link
(threadreaderapp.com)

Simulators

janus, 2 Sep 2022 12:45 UTC
598 points
114 comments · 41 min read · LW link
(generative.ink)

[Question] Why Do AI researchers Rate the Probability of Doom So Low?
Aorou, 24 Sep 2022 2:33 UTC
7 points
6 comments · 3 min read · LW link

[Question] How should we think about the decision relevance of models estimating p(doom)?
Mo Putera, 11 May 2023 4:16 UTC
11 points
1 comment · 3 min read · LW link

Reward is not the optimization target
TurnTrout, 25 Jul 2022 0:03 UTC
291 points
109 comments · 10 min read · LW link

Trying to isolate objectives: approaches toward high-level interpretability
Jozdien, 9 Jan 2023 18:33 UTC
45 points
14 comments · 8 min read · LW link

Reward is the optimization target (of capabilities researchers)
Max H, 15 May 2023 3:22 UTC
24 points
4 comments · 5 min read · LW link

The Point of Trade

Eliezer Yudkowsky, 22 Jun 2021 17:56 UTC
164 points
75 comments · 4 min read · LW link · 1 review

The Plan

johnswentworth, 10 Dec 2021 23:41 UTC
239 points
78 comments · 14 min read · LW link · 1 review

Clarifying inner alignment terminology
evhub, 9 Nov 2020 20:40 UTC
100 points
17 comments · 3 min read · LW link · 1 review