Deconfusion

Tag last edited 17 Mar 2021 19:13 UTC by abramdemski

Narrowly, deconfusion is a specific branch of AI alignment research, discussed in MIRI’s 2018 research update. More broadly, the term can apply to any domain. Quoting from the research update:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

Looking Deeper at Deconfusion
adamShimi, 13 Jun 2021 21:29 UTC · 61 points · 13 comments · 15 min read

Builder/Breaker for Deconfusion
abramdemski, 29 Sep 2022 17:36 UTC · 72 points · 9 comments · 9 min read

Traps of Formalization in Deconfusion
adamShimi, 5 Aug 2021 22:40 UTC · 25 points · 7 comments · 6 min read

On MIRI’s new research directions
Rob Bensinger, 22 Nov 2018 23:42 UTC · 53 points · 12 comments · 1 min read
(intelligence.org)

1. A Sense of Fairness: Deconfusing Ethics
RogerDearnaley, 17 Nov 2023 20:55 UTC · 15 points · 8 comments · 15 min read

[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?
TurnTrout, 16 Jun 2021 2:09 UTC · 23 points · 59 comments · 1 min read

Deconfusing Direct vs Amortised Optimization
beren, 2 Dec 2022 11:30 UTC · 111 points · 17 comments · 10 min read

My research agenda in agent foundations
Alex_Altair, 28 Jun 2023 18:00 UTC · 70 points · 9 comments · 11 min read

My Central Alignment Priority (2 July 2023)
NicholasKross, 3 Jul 2023 1:46 UTC · 12 points · 1 comment · 3 min read

Strategy is the Deconfusion of Action
ryan_b, 2 Jan 2019 20:56 UTC · 69 points · 4 comments · 6 min read

Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment
philip_b, 14 Jul 2020 22:48 UTC · 35 points · 25 comments · 3 min read

Exercises in Comprehensive Information Gathering
johnswentworth, 15 Feb 2020 17:27 UTC · 138 points · 18 comments · 3 min read · 1 review

Alex Turner’s Research, Comprehensive Information Gathering
adamShimi, 23 Jun 2021 9:44 UTC · 15 points · 3 comments · 3 min read

Musings on general systems alignment
Alex Flint, 30 Jun 2021 18:16 UTC · 31 points · 11 comments · 3 min read

Applications for Deconfusing Goal-Directedness
adamShimi, 8 Aug 2021 13:05 UTC · 38 points · 3 comments · 5 min read · 1 review

Goal-Directedness and Behavior, Redux
adamShimi, 9 Aug 2021 14:26 UTC · 15 points · 4 comments · 2 min read

Power-seeking for successive choices
adamShimi, 12 Aug 2021 20:37 UTC · 11 points · 9 comments · 4 min read

A review of “Agents and Devices”
adamShimi, 13 Aug 2021 8:42 UTC · 21 points · 0 comments · 4 min read

Approaches to gradient hacking
adamShimi, 14 Aug 2021 15:16 UTC · 16 points · 8 comments · 8 min read

Modelling Transformative AI Risks (MTAIR) Project: Introduction
16 Aug 2021 7:12 UTC · 91 points · 0 comments · 9 min read

My summary of the alignment problem
Peter Hroššo, 11 Aug 2022 19:42 UTC · 16 points · 3 comments · 2 min read
(threadreaderapp.com)

Reward is the optimization target (of capabilities researchers)
Max H, 15 May 2023 3:22 UTC · 32 points · 4 comments · 5 min read

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’
lukemarks, 11 Jun 2023 0:13 UTC · 22 points · 0 comments · 5 min read

Simulators
janus, 2 Sep 2022 12:45 UTC · 597 points · 161 comments · 41 min read · 8 reviews
(generative.ink)

[Question] Why Do AI researchers Rate the Probability of Doom So Low?
Aorou, 24 Sep 2022 2:33 UTC · 7 points · 6 comments · 3 min read

Reality and reality-boxes
Jim Pivarski, 13 May 2023 14:14 UTC · 37 points · 11 comments · 21 min read

Reward is not the optimization target
TurnTrout, 25 Jul 2022 0:03 UTC · 361 points · 123 comments · 10 min read · 3 reviews

Trying to isolate objectives: approaches toward high-level interpretability
Jozdien, 9 Jan 2023 18:33 UTC · 48 points · 14 comments · 8 min read

[Question] How should we think about the decision relevance of models estimating p(doom)?
Mo Putera, 11 May 2023 4:16 UTC · 11 points · 1 comment · 3 min read

Interpreting the Learning of Deceit
RogerDearnaley, 18 Dec 2023 8:12 UTC · 30 points · 10 comments · 9 min read

The Point of Trade
Eliezer Yudkowsky, 22 Jun 2021 17:56 UTC · 171 points · 76 comments · 4 min read · 1 review

The Plan
johnswentworth, 10 Dec 2021 23:41 UTC · 254 points · 78 comments · 14 min read · 1 review

Clarifying inner alignment terminology
evhub, 9 Nov 2020 20:40 UTC · 102 points · 17 comments · 3 min read · 1 review