Deconfusion

Tag last edited 17 Mar 2021 19:13 UTC by abramdemski

Narrowly, deconfusion is a specific branch of AI alignment research, discussed in MIRI’s 2018 research update. More broadly, the term can apply to any domain. Quoting from the research update:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

Looking Deeper at Deconfusion
adamShimi, 13 Jun 2021 21:29 UTC · 61 points · 13 comments · 15 min read

Builder/Breaker for Deconfusion
abramdemski, 29 Sep 2022 17:36 UTC · 72 points · 9 comments · 9 min read

Traps of Formalization in Deconfusion
adamShimi, 5 Aug 2021 22:40 UTC · 25 points · 7 comments · 6 min read

On MIRI’s new research directions
Rob Bensinger, 22 Nov 2018 23:42 UTC · 53 points · 12 comments · 1 min read
(intelligence.org)

1. A Sense of Fairness: Deconfusing Ethics
RogerDearnaley, 17 Nov 2023 20:55 UTC · 15 points · 8 comments · 15 min read

[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?
TurnTrout, 16 Jun 2021 2:09 UTC · 23 points · 59 comments · 1 min read

Deconfusing Direct vs Amortised Optimization
beren, 2 Dec 2022 11:30 UTC · 111 points · 17 comments · 10 min read

My research agenda in agent foundations
Alex_Altair, 28 Jun 2023 18:00 UTC · 70 points · 9 comments · 11 min read

My Central Alignment Priority (2 July 2023)
NicholasKross, 3 Jul 2023 1:46 UTC · 12 points · 1 comment · 3 min read

Strategy is the Deconfusion of Action
ryan_b, 2 Jan 2019 20:56 UTC · 69 points · 4 comments · 6 min read

Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment
philip_b, 14 Jul 2020 22:48 UTC · 35 points · 25 comments · 3 min read

Exercises in Comprehensive Information Gathering
johnswentworth, 15 Feb 2020 17:27 UTC · 138 points · 18 comments · 3 min read · 1 review

Alex Turner’s Research, Comprehensive Information Gathering
adamShimi, 23 Jun 2021 9:44 UTC · 15 points · 3 comments · 3 min read

Musings on general systems alignment
Alex Flint, 30 Jun 2021 18:16 UTC · 31 points · 11 comments · 3 min read

Applications for Deconfusing Goal-Directedness
adamShimi, 8 Aug 2021 13:05 UTC · 38 points · 3 comments · 5 min read · 1 review

Goal-Directedness and Behavior, Redux
adamShimi, 9 Aug 2021 14:26 UTC · 15 points · 4 comments · 2 min read

Power-seeking for successive choices
adamShimi, 12 Aug 2021 20:37 UTC · 11 points · 9 comments · 4 min read

A review of “Agents and Devices”
adamShimi, 13 Aug 2021 8:42 UTC · 21 points · 0 comments · 4 min read

Approaches to gradient hacking
adamShimi, 14 Aug 2021 15:16 UTC · 16 points · 8 comments · 8 min read

Modelling Transformative AI Risks (MTAIR) Project: Introduction
16 Aug 2021 7:12 UTC · 91 points · 0 comments · 9 min read

My summary of the alignment problem
Peter Hroššo, 11 Aug 2022 19:42 UTC · 16 points · 3 comments · 2 min read
(threadreaderapp.com)

Reward is the optimization target (of capabilities researchers)
Max H, 15 May 2023 3:22 UTC · 32 points · 4 comments · 5 min read

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’
lukemarks, 11 Jun 2023 0:13 UTC · 22 points · 0 comments · 5 min read

Simulators
janus, 2 Sep 2022 12:45 UTC · 597 points · 161 comments · 41 min read · 8 reviews
(generative.ink)

[Question] Why Do AI researchers Rate the Probability of Doom So Low?
Aorou, 24 Sep 2022 2:33 UTC · 7 points · 6 comments · 3 min read

Reality and reality-boxes
Jim Pivarski, 13 May 2023 14:14 UTC · 37 points · 11 comments · 21 min read

Reward is not the optimization target
TurnTrout, 25 Jul 2022 0:03 UTC · 361 points · 123 comments · 10 min read · 3 reviews

Trying to isolate objectives: approaches toward high-level interpretability
Jozdien, 9 Jan 2023 18:33 UTC · 48 points · 14 comments · 8 min read

[Question] How should we think about the decision relevance of models estimating p(doom)?
Mo Putera, 11 May 2023 4:16 UTC · 11 points · 1 comment · 3 min read

Interpreting the Learning of Deceit
RogerDearnaley, 18 Dec 2023 8:12 UTC · 30 points · 10 comments · 9 min read

The Point of Trade
Eliezer Yudkowsky, 22 Jun 2021 17:56 UTC · 171 points · 76 comments · 4 min read · 1 review

The Plan
johnswentworth, 10 Dec 2021 23:41 UTC · 254 points · 78 comments · 14 min read · 1 review

Clarifying inner alignment terminology
evhub, 9 Nov 2020 20:40 UTC · 102 points · 17 comments · 3 min read · 1 review