Formal Philosophy and Alignment Possible Projects

Whispermute · 30 Jun 2022 10:42 UTC
31 points
5 comments · 8 min read · LW link

What Is The True Name of Modularity?

1 Jul 2022 14:55 UTC
19 points
3 comments · 12 min read · LW link

Call For Distillers

johnswentworth · 4 Apr 2022 18:25 UTC
186 points
36 comments · 3 min read · LW link

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan Hendrycks · 2 Jul 2022 0:09 UTC
29 points
0 comments · 1 min read · LW link
(arxiv.org)

Announcing the Inverse Scaling Prize ($250k Prize Pool)

27 Jun 2022 15:58 UTC
157 points
12 comments · 7 min read · LW link

AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving

DanielFilan · 1 Jul 2022 22:20 UTC
11 points
0 comments · 37 min read · LW link

Latent Adversarial Training

Adam Jermyn · 29 Jun 2022 20:04 UTC
18 points
3 comments · 5 min read · LW link

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment · 27 Jun 2022 18:56 UTC
19 points
3 comments · 18 min read · LW link

A descriptive, not prescriptive, overview of current AI Alignment Research

6 Jun 2022 21:59 UTC
94 points
17 comments · 7 min read · LW link

Will Capabilities Generalise More?

Ramana Kumar · 29 Jun 2022 17:12 UTC
52 points
10 comments · 4 min read · LW link

Utility Maximization = Description Length Minimization

johnswentworth · 18 Feb 2021 18:04 UTC
166 points
38 comments · 5 min read · LW link

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA

Steven Byrnes · 17 May 2022 15:11 UTC
70 points
10 comments · 14 min read · LW link

Why Subagents?

johnswentworth · 1 Aug 2019 22:17 UTC
154 points
38 comments · 7 min read · LW link · 1 review

Where I agree and disagree with Eliezer

paulfchristiano · 19 Jun 2022 19:15 UTC
684 points
191 comments · 20 min read · LW link

The Big Picture Of Alignment (Talk Part 2)

johnswentworth · 25 Feb 2022 2:53 UTC
32 points
12 comments · 1 min read · LW link
(www.youtube.com)

Open Problems in Negative Side Effect Minimization

6 May 2022 9:37 UTC
12 points
4 comments · 17 min read · LW link

A central AI alignment problem: capabilities generalization, and the sharp left turn

So8res · 15 Jun 2022 13:10 UTC
200 points
36 comments · 10 min read · LW link

The Case for a Journal of AI Alignment

adamShimi · 9 Jan 2021 18:13 UTC
45 points
31 comments · 4 min read · LW link

Optimality is the tiger, and agents are its teeth

Veedrac · 2 Apr 2022 0:46 UTC
172 points
28 comments · 16 min read · LW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC
666 points
629 comments · 30 min read · LW link