
Deconfusion

Last edit: 17 Mar 2021 19:13 UTC by abramdemski

Narrowly, deconfusion is a specific branch of AI alignment research, discussed in MIRI’s 2018 research update. More broadly, the term can be applied to any domain. Quoting from the research update:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

Interpreting the Learning of Deceit

RogerDearnaley, 18 Dec 2023 8:12 UTC
30 points
8 comments, 9 min read, LW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley, 17 Nov 2023 20:55 UTC
15 points
8 comments, 15 min read, LW link

A Mathematical Model for Simulators

lukemarks, 2 Oct 2023 6:46 UTC
11 points
0 comments, 2 min read, LW link

My Central Alignment Priority (2 July 2023)

NicholasKross, 3 Jul 2023 1:46 UTC
12 points
1 comment, 3 min read, LW link

My research agenda in agent foundations

Alex_Altair, 28 Jun 2023 18:00 UTC
70 points
9 comments, 11 min read, LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks, 11 Jun 2023 0:13 UTC
22 points
0 comments, 5 min read, LW link

Reward is the optimization target (of capabilities researchers)

Max H, 15 May 2023 3:22 UTC
32 points
4 comments, 5 min read, LW link

Reality and reality-boxes

Jim Pivarski, 13 May 2023 14:14 UTC
37 points
11 comments, 21 min read, LW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera, 11 May 2023 4:16 UTC
11 points
1 comment, 3 min read, LW link

Trying to isolate objectives: approaches toward high-level interpretability

Jozdien, 9 Jan 2023 18:33 UTC
48 points
14 comments, 8 min read, LW link

Deconfusing Direct vs Amortised Optimization

beren, 2 Dec 2022 11:30 UTC
107 points
17 comments, 10 min read, LW link

Builder/Breaker for Deconfusion

abramdemski, 29 Sep 2022 17:36 UTC
72 points
9 comments, 9 min read, LW link

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou, 24 Sep 2022 2:33 UTC
7 points
6 comments, 3 min read, LW link

Simulators

janus, 2 Sep 2022 12:45 UTC
594 points
161 comments, 41 min read, LW link, 8 reviews
(generative.ink)

My summary of the alignment problem

Peter Hroššo, 11 Aug 2022 19:42 UTC
16 points
3 comments, 2 min read, LW link
(threadreaderapp.com)

Reward is not the optimization target

TurnTrout, 25 Jul 2022 0:03 UTC
348 points
123 comments, 10 min read, LW link, 3 reviews

The Plan

johnswentworth, 10 Dec 2021 23:41 UTC
254 points
78 comments, 14 min read, LW link, 1 review

Modelling Transformative AI Risks (MTAIR) Project: Introduction

16 Aug 2021 7:12 UTC
91 points
0 comments, 9 min read, LW link

Approaches to gradient hacking

adamShimi, 14 Aug 2021 15:16 UTC
16 points
8 comments, 8 min read, LW link

A review of “Agents and Devices”

adamShimi, 13 Aug 2021 8:42 UTC
20 points
0 comments, 4 min read, LW link

Power-seeking for successive choices

adamShimi, 12 Aug 2021 20:37 UTC
11 points
9 comments, 4 min read, LW link

Goal-Directedness and Behavior, Redux

adamShimi, 9 Aug 2021 14:26 UTC
15 points
4 comments, 2 min read, LW link

Applications for Deconfusing Goal-Directedness

adamShimi, 8 Aug 2021 13:05 UTC
38 points
3 comments, 5 min read, LW link, 1 review

Traps of Formalization in Deconfusion

adamShimi, 5 Aug 2021 22:40 UTC
25 points
7 comments, 6 min read, LW link

Musings on general systems alignment

Alex Flint, 30 Jun 2021 18:16 UTC
31 points
11 comments, 3 min read, LW link

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi, 23 Jun 2021 9:44 UTC
15 points
3 comments, 3 min read, LW link

The Point of Trade

Eliezer Yudkowsky, 22 Jun 2021 17:56 UTC
171 points
76 comments, 4 min read, LW link, 1 review

[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?

TurnTrout, 16 Jun 2021 2:09 UTC
23 points
59 comments, 1 min read, LW link

Looking Deeper at Deconfusion

adamShimi, 13 Jun 2021 21:29 UTC
61 points
13 comments, 15 min read, LW link

Clarifying inner alignment terminology

evhub, 9 Nov 2020 20:40 UTC
102 points
17 comments, 3 min read, LW link, 1 review

Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment

philip_b, 14 Jul 2020 22:48 UTC
35 points
25 comments, 3 min read, LW link

Exercises in Comprehensive Information Gathering

johnswentworth, 15 Feb 2020 17:27 UTC
137 points
18 comments, 3 min read, LW link, 1 review

Strategy is the Deconfusion of Action

ryan_b, 2 Jan 2019 20:56 UTC
69 points
4 comments, 6 min read, LW link

On MIRI’s new research directions

Rob Bensinger, 22 Nov 2018 23:42 UTC
53 points
12 comments, 1 min read, LW link
(intelligence.org)