Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra · 18 Jul 2022 19:06 UTC
364 points
94 comments · 75 min read · LW link · 1 review

Reward is not the optimization target

TurnTrout · 25 Jul 2022 0:03 UTC
348 points
123 comments · 10 min read · LW link · 3 reviews

What should you change in response to an “emergency”? And AI risk

AnnaSalamon · 18 Jul 2022 1:11 UTC
329 points
60 comments · 6 min read · LW link · 1 review

Looking back on my alignment PhD

TurnTrout · 1 Jul 2022 3:19 UTC
318 points
63 comments · 11 min read · LW link

On how various plans miss the hard bits of the alignment challenge

So8res · 12 Jul 2022 2:49 UTC
302 points
88 comments · 29 min read · LW link · 3 reviews

Toni Kurz and the Insanity of Climbing Mountains

GeneSmith · 3 Jul 2022 20:51 UTC
268 points
67 comments · 11 min read · LW link · 2 reviews

Changing the world through slack & hobbies

Steven Byrnes · 21 Jul 2022 18:11 UTC
258 points
13 comments · 10 min read · LW link

Safetywashing

Adam Scholl · 1 Jul 2022 11:56 UTC
255 points
20 comments · 1 min read · LW link · 2 reviews

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter · 19 Jul 2022 18:06 UTC
254 points
71 comments · 1 min read · LW link

Unifying Bargaining Notions (1/2)

Diffractor · 25 Jul 2022 0:28 UTC
204 points
41 comments · 16 min read · LW link

Humans provide an untapped wealth of evidence about alignment

14 Jul 2022 2:31 UTC
197 points
94 comments · 9 min read · LW link · 1 review

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · 22 Jul 2022 18:44 UTC
194 points
29 comments · 14 min read · LW link
(theinsideview.ai)

A note about differential technological development

So8res · 15 Jul 2022 4:46 UTC
192 points
32 comments · 6 min read · LW link

AGI ruin scenarios are likely (and disjunctive)

So8res · 27 Jul 2022 3:21 UTC
170 points
38 comments · 6 min read · LW link

ITT-passing and civility are good; “charity” is bad; steelmanning is niche

Rob Bensinger · 5 Jul 2022 0:15 UTC
161 points
36 comments · 6 min read · LW link · 1 review

«Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch · 26 Jul 2022 23:03 UTC
158 points
32 comments · 7 min read · LW link

Resolve Cycles

CFAR!Duncan · 16 Jul 2022 23:17 UTC
134 points
8 comments · 10 min read · LW link

Brainstorm of things that could force an AI team to burn their lead

So8res · 24 Jul 2022 23:58 UTC
134 points
8 comments · 13 min read · LW link

Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose

moridinamael · 6 Jul 2022 14:20 UTC
133 points
16 comments · 6 min read · LW link

AI Forecasting: One Year In

jsteinhardt · 4 Jul 2022 5:10 UTC
132 points
12 comments · 6 min read · LW link
(bounded-regret.ghost.io)

Conjecture: Internal Infohazard Policy

29 Jul 2022 19:07 UTC
131 points
6 comments · 19 min read · LW link

Limerence Messes Up Your Rationality Real Bad, Yo

Raemon · 1 Jul 2022 16:53 UTC
121 points
41 comments · 3 min read · LW link · 2 reviews

Principles for Alignment/Agency Projects

johnswentworth · 7 Jul 2022 2:07 UTC
121 points
20 comments · 4 min read · LW link

Unifying Bargaining Notions (2/2)

Diffractor · 27 Jul 2022 3:40 UTC
116 points
19 comments · 21 min read · LW link

Moral strategies at different capability levels

Richard_Ngo · 27 Jul 2022 18:50 UTC
112 points
14 comments · 5 min read · LW link
(thinkingcomplete.blogspot.com)

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · 14 Jul 2022 16:59 UTC
112 points
12 comments · 33 min read · LW link

Criticism of EA Criticism Contest

Zvi · 14 Jul 2022 14:30 UTC
108 points
17 comments · 31 min read · LW link · 1 review
(thezvi.wordpress.com)

Focusing

CFAR!Duncan · 29 Jul 2022 19:15 UTC
107 points
23 comments · 14 min read · LW link

Examples of AI Increasing AI Progress

ThomasW · 17 Jul 2022 20:06 UTC
107 points
14 comments · 1 min read · LW link

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov · 15 Jul 2022 21:47 UTC
102 points
18 comments · 6 min read · LW link

Comment on “Propositions Concerning Digital Minds and Society”

Zack_M_Davis · 10 Jul 2022 5:48 UTC
99 points
12 comments · 8 min read · LW link

Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments

Jeffrey Ladish · 11 Jul 2022 19:38 UTC
98 points
27 comments · 6 min read · LW link · 1 review

Naive Hypotheses on AI Alignment

Shoshannah Tekofsky · 2 Jul 2022 19:03 UTC
98 points
29 comments · 5 min read · LW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes · 19 Jul 2022 4:55 UTC
95 points
6 comments · 2 min read · LW link

A summary of every “Highlights from the Sequences” post

Akash · 15 Jul 2022 23:01 UTC
94 points
7 comments · 17 min read · LW link

Human values & biases are inaccessible to the genome

TurnTrout · 7 Jul 2022 17:29 UTC
93 points
54 comments · 6 min read · LW link · 1 review

Internal Double Crux

CFAR!Duncan · 22 Jul 2022 4:34 UTC
88 points
15 comments · 12 min read · LW link

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo · 10 Jul 2022 16:04 UTC
88 points
12 comments · 5 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · 20 Jul 2022 10:44 UTC
87 points
11 comments · 8 min read · LW link

MATS Models

johnswentworth · 9 Jul 2022 0:14 UTC
86 points
5 comments · 16 min read · LW link

Trends in GPU price-performance

1 Jul 2022 15:51 UTC
85 points
12 comments · 1 min read · LW link · 1 review
(epochai.org)

Don’t use ‘infohazard’ for collectively destructive info

Eliezer Yudkowsky · 15 Jul 2022 5:13 UTC
84 points
33 comments · 1 min read · LW link · 2 reviews
(www.facebook.com)

All AGI safety questions welcome (especially basic ones) [July 2022]

16 Jul 2022 12:57 UTC
84 points
132 comments · 3 min read · LW link

Benchmark for successful concept extrapolation/avoiding goal misgeneralization

Stuart_Armstrong · 4 Jul 2022 20:48 UTC
82 points
12 comments · 4 min read · LW link

Opening Session Tips & Advice

CFAR!Duncan · 25 Jul 2022 3:57 UTC
81 points
3 comments · 14 min read · LW link · 1 review

Trigger-Action Planning

CFAR!Duncan · 3 Jul 2022 1:42 UTC
81 points
14 comments · 13 min read · LW link · 2 reviews

Goal Factoring

CFAR!Duncan · 5 Jul 2022 7:10 UTC
80 points
2 comments · 8 min read · LW link

Addendum: A non-magical explanation of Jeffrey Epstein

lc · 18 Jul 2022 17:40 UTC
80 points
21 comments · 11 min read · LW link

[Question] How do AI timelines affect how you live your life?

Quadratic Reciprocity · 11 Jul 2022 13:54 UTC
80 points
50 comments · 1 min read · LW link

Decision theory and dynamic inconsistency

paulfchristiano · 3 Jul 2022 22:20 UTC
79 points
33 comments · 10 min read · LW link
(sideways-view.com)