

Partial Agency

abramdemski 27 Sep 2019 22:04 UTC
72 points
18 comments, 9 min read

The Credit Assignment Problem

abramdemski 8 Nov 2019 2:50 UTC
98 points
40 comments, 17 min read, 1 review

How LLMs are and are not myopic

janus 25 Jul 2023 2:19 UTC
126 points
14 comments, 8 min read

Towards a mechanistic understanding of corrigibility

evhub 22 Aug 2019 23:20 UTC
47 points
26 comments, 6 min read

Open Problems with Myopia

10 Mar 2021 18:38 UTC
65 points
16 comments, 8 min read

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments, 10 min read

Defining Myopia

abramdemski 19 Oct 2019 21:32 UTC
32 points
18 comments, 8 min read

LCDT, A Myopic Decision Theory

3 Aug 2021 22:41 UTC
57 points
50 comments, 15 min read

Arguments against myopic training

Richard_Ngo 9 Jul 2020 16:07 UTC
62 points
39 comments, 12 min read

You can still fetch the coffee today if you’re dead tomorrow

davidad 9 Dec 2022 14:06 UTC
84 points
19 comments, 5 min read

2019 Review Rewrite: Seeking Power is Often Robustly Instrumental in MDPs

TurnTrout 23 Dec 2020 17:16 UTC
35 points
0 comments, 4 min read

Seeking Power is Often Convergently Instrumental in MDPs

5 Dec 2019 2:33 UTC
162 points
39 comments, 17 min read, 2 reviews

An overview of 11 proposals for building safe advanced AI

evhub 29 May 2020 20:38 UTC
211 points
36 comments, 38 min read, 2 reviews

Understanding and controlling auto-induced distributional shift

L Rudolf L 13 Dec 2021 14:59 UTC
33 points
4 comments, 16 min read

Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability

Michaël Trazzi 8 Jun 2021 19:20 UTC
28 points
0 comments, 55 min read

Bayesian Evolving-to-Extinction

abramdemski 14 Feb 2020 23:55 UTC
39 points
13 comments, 5 min read

Random Thoughts on Predict-O-Matic

abramdemski 17 Oct 2019 23:39 UTC
35 points
3 comments, 9 min read

The Parable of Predict-O-Matic

abramdemski 15 Oct 2019 0:49 UTC
342 points
41 comments, 14 min read, 2 reviews

Self-Fulfilling Prophecies Aren’t Always About Self-Awareness

John_Maxwell 18 Nov 2019 23:11 UTC
14 points
7 comments, 4 min read

The Dualist Predict-O-Matic ($100 prize)

John_Maxwell 17 Oct 2019 6:45 UTC
19 points
35 comments, 5 min read

Why GPT wants to mesa-optimize & how we might change this

John_Maxwell 19 Sep 2020 13:48 UTC
55 points
33 comments, 9 min read

Underspecification of Oracle AI

15 Jan 2023 20:10 UTC
30 points
12 comments, 19 min read

GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2

Christopher King 31 Mar 2023 17:05 UTC
6 points
4 comments, 4 min read

Non-myopia stories

lberglund 13 Nov 2023 17:52 UTC
28 points
10 comments, 7 min read

Fighting Akrasia: Incentivising Action

Gordon Seidoh Worley 29 Apr 2009 13:48 UTC
12 points
58 comments, 2 min read

Graphical World Models, Counterfactuals, and Machine Learning Agents

Koen.Holtman 17 Feb 2021 11:07 UTC
6 points
2 comments, 10 min read

Transforming myopic optimization to ordinary optimization - Do we want to seek convergence for myopic optimization problems?

tailcalled 11 Dec 2021 20:38 UTC
12 points
1 comment, 5 min read

How complex are myopic imitators?

Vivek Hebbar 8 Feb 2022 12:00 UTC
26 points
1 comment, 15 min read

AI safety via market making

evhub 26 Jun 2020 23:07 UTC
71 points
45 comments, 1 min read

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy 12 May 2022 20:01 UTC
53 points
0 comments, 59 min read

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
50 points
0 comments, 1 min read

Laziness in AI

Richard Henage 2 Sep 2022 17:04 UTC
13 points
5 comments, 1 min read

Generative, Episodic Objectives for Safe AI

Michael Glass 5 Oct 2022 23:18 UTC
11 points
3 comments, 8 min read

Limiting an AGI’s Context Temporally

EulersApprentice 17 Feb 2019 3:29 UTC
5 points
11 comments, 1 min read

Simulators

janus 2 Sep 2022 12:45 UTC
601 points
161 comments, 41 min read, 8 reviews

GPT-4 aligning with acasual decision theory when instructed to play games, but includes a CDT explanation that’s incorrect if they differ

Christopher King 23 Mar 2023 16:16 UTC
7 points
4 comments, 8 min read