Narrowly, deconfusion is a specific branch of AI alignment research, discussed in MIRI’s 2018 research update. More broadly, the term can apply to reducing confusion in any domain. Quoting from the research update:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

Looking Deeper at Deconfusion

adamShimi, 13 Jun 2021



Builder/Breaker for Deconfusion

abramdemski, 29 Sep 2022



Traps of Formalization in Deconfusion

adamShimi, 5 Aug 2021



On MIRI’s new research directions

Rob Bensinger, 22 Nov 2018



1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley, 17 Nov 2023



[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?

TurnTrout, 16 Jun 2021



Deconfusing Direct vs Amortised Optimization

beren, 2 Dec 2022



My research agenda in agent foundations

Alex_Altair, 28 Jun 2023



My Central Alignment Priority (2 July 2023)

Nicholas / Heather Kross, 3 Jul 2023



Strategy is the Deconfusion of Action

ryan_b, 2 Jan 2019



Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment

philip_b, 14 Jul 2020



Exercises in Comprehensive Information Gathering

johnswentworth, 15 Feb 2020



Alex Turner’s Research, Comprehensive Information Gathering

adamShimi, 23 Jun 2021



Musings on general systems alignment

Alex Flint, 30 Jun 2021



Applications for Deconfusing Goal-Directedness

adamShimi, 8 Aug 2021



Goal-Directedness and Behavior, Redux

adamShimi, 9 Aug 2021



Power-seeking for successive choices

adamShimi, 12 Aug 2021



A review of “Agents and Devices”

adamShimi, 13 Aug 2021



Approaches to gradient hacking

adamShimi, 14 Aug 2021



Modelling Transformative AI Risks (MTAIR) Project: Introduction

16 Aug 2021



My summary of the alignment problem

Peter Hroššo, 11 Aug 2022



Reward is the optimization target (of capabilities researchers)

Max H, 15 May 2023



Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks, 11 Jun 2023




janus, 2 Sep 2022



[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou, 24 Sep 2022



Reality and reality-boxes

Jim Pivarski, 13 May 2023



Reward is not the optimization target

TurnTrout, 25 Jul 2022



Trying to isolate objectives: approaches toward high-level interpretability

Jozdien, 9 Jan 2023



[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera, 11 May 2023



Interpreting the Learning of Deceit

RogerDearnaley, 18 Dec 2023



The Point of Trade

Eliezer Yudkowsky, 22 Jun 2021



The Plan

johnswentworth, 10 Dec 2021



Clarifying inner alignment terminology

evhub, 9 Nov 2020


