Reward is not the optimization target

TurnTrout, 25 Jul 2022 0:03 UTC

385 points

128 comments, 10 min read, 3 reviews

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra, 18 Jul 2022 19:06 UTC

373 points

95 comments, 75 min read, 1 review

What should you change in response to an “emergency”? And AI risk

AnnaSalamon, 18 Jul 2022 1:11 UTC

346 points

60 comments, 7 min read, 1 review

Looking back on my alignment PhD

TurnTrout, 1 Jul 2022 3:19 UTC

334 points

67 comments, 11 min read

On how various plans miss the hard bits of the alignment challenge

So8res, 12 Jul 2022 2:49 UTC

322 points

91 comments, 29 min read, 3 reviews

Toni Kurz and the Insanity of Climbing Mountains

GeneSmith, 3 Jul 2022 20:51 UTC

293 points

73 comments, 11 min read, 2 reviews

Changing the world through slack & hobbies

Steven Byrnes, 21 Jul 2022 18:11 UTC

271 points

13 comments, 10 min read

Safetywashing

Adam Scholl, 1 Jul 2022 11:56 UTC

265 points

20 comments, 1 min read, 2 reviews

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter, 19 Jul 2022 18:06 UTC

258 points

72 comments, 1 min read

Unifying Bargaining Notions (1/2)

Diffractor, 25 Jul 2022 0:28 UTC

213 points

41 comments, 16 min read

Humans provide an untapped wealth of evidence about alignment

TurnTrout and Quintin Pope, 14 Jul 2022 2:31 UTC

213 points

94 comments, 9 min read, 1 review

A note about differential technological development

So8res, 15 Jul 2022 4:46 UTC

199 points

34 comments, 6 min read

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi, 22 Jul 2022 18:44 UTC

195 points

29 comments, 14 min read

(theinsideview.ai)

AGI ruin scenarios are likely (and disjunctive)

So8res, 27 Jul 2022 3:21 UTC

177 points

38 comments, 6 min read

ITT-passing and civility are good; “charity” is bad; steelmanning is niche

Rob Bensinger, 5 Jul 2022 0:15 UTC

165 points

37 comments, 6 min read, 1 review

«Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch, 26 Jul 2022 23:03 UTC

160 points

33 comments, 7 min read

Resolve Cycles

CFAR!Duncan, 16 Jul 2022 23:17 UTC

145 points

8 comments, 10 min read

Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose

moridinamael, 6 Jul 2022 14:20 UTC

137 points

16 comments, 6 min read

Brainstorm of things that could force an AI team to burn their lead

So8res, 24 Jul 2022 23:58 UTC

136 points

8 comments, 13 min read

Limerence Messes Up Your Rationality Real Bad, Yo

Raemon, 1 Jul 2022 16:53 UTC

135 points

41 comments, 3 min read, 2 reviews

AI Forecasting: One Year In

jsteinhardt, 4 Jul 2022 5:10 UTC

132 points

12 comments, 6 min read

(bounded-regret.ghost.io)

Conjecture: Internal Infohazard Policy

Connor Leahy, Sid Black, Chris Scammell and Andrea_Miotti, 29 Jul 2022 19:07 UTC

130 points

6 comments, 19 min read

Focusing

CFAR!Duncan, 29 Jul 2022 19:15 UTC

130 points

25 comments, 14 min read

Moral strategies at different capability levels

Richard_Ngo, 27 Jul 2022 18:50 UTC

123 points

14 comments, 5 min read

(thinkingcomplete.blogspot.com)

Principles for Alignment/Agency Projects

johnswentworth, 7 Jul 2022 2:07 UTC

122 points

20 comments, 4 min read

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey, 14 Jul 2022 16:59 UTC

119 points

15 comments, 33 min read

Unifying Bargaining Notions (2/2)

Diffractor, 27 Jul 2022 3:40 UTC

118 points

19 comments, 21 min read

Criticism of EA Criticism Contest

Zvi, 14 Jul 2022 14:30 UTC

108 points

17 comments, 31 min read, 1 review

(thezvi.wordpress.com)

Examples of AI Increasing AI Progress

TW123, 17 Jul 2022 20:06 UTC

107 points

14 comments, 1 min read

Internal Double Crux

CFAR!Duncan, 22 Jul 2022 4:34 UTC

104 points

15 comments, 12 min read

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov, 15 Jul 2022 21:47 UTC

103 points

18 comments, 6 min read

Goal Factoring

CFAR!Duncan, 5 Jul 2022 7:10 UTC

101 points

2 comments, 8 min read

Comment on “Propositions Concerning Digital Minds and Society”

Zack_M_Davis, 10 Jul 2022 5:48 UTC

100 points

12 comments, 8 min read

A summary of every “Highlights from the Sequences” post

Orpheus16, 15 Jul 2022 23:01 UTC

99 points

7 comments, 17 min read

Opening Session Tips & Advice

CFAR!Duncan, 25 Jul 2022 3:57 UTC

99 points

3 comments, 14 min read, 1 review

Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments

Jeffrey Ladish, 11 Jul 2022 19:38 UTC

98 points

27 comments, 6 min read, 1 review

Naive Hypotheses on AI Alignment

Shoshannah Tekofsky, 2 Jul 2022 19:03 UTC

98 points

29 comments, 5 min read

MATS Models

johnswentworth, 9 Jul 2022 0:14 UTC

95 points

5 comments, 16 min read

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes, 19 Jul 2022 4:55 UTC

95 points

6 comments, 2 min read

Human values & biases are inaccessible to the genome

TurnTrout, 7 Jul 2022 17:29 UTC

95 points

54 comments, 6 min read, 1 review

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo, 10 Jul 2022 16:04 UTC

95 points

12 comments, 5 min read

Trigger-Action Planning

CFAR!Duncan, 3 Jul 2022 1:42 UTC

92 points

14 comments, 13 min read, 2 reviews

Don’t use ‘infohazard’ for collectively destructive info

Eliezer Yudkowsky, 15 Jul 2022 5:13 UTC

87 points

34 comments, 1 min read, 2 reviews

(www.facebook.com)

Addendum: A non-magical explanation of Jeffrey Epstein

lc, 18 Jul 2022 17:40 UTC

87 points

21 comments, 11 min read

Aversion Factoring

CFAR!Duncan, 7 Jul 2022 16:09 UTC

87 points

1 comment, 8 min read

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi, 20 Jul 2022 10:44 UTC

87 points

11 comments, 8 min read

Trends in GPU price-performance

Marius Hobbhahn and Tamay, 1 Jul 2022 15:51 UTC

85 points

13 comments, 1 min read, 1 review

(epochai.org)

All AGI safety questions welcome (especially basic ones) [July 2022]

plex and Robert Miles, 16 Jul 2022 12:57 UTC

84 points

132 comments, 3 min read

Benchmark for successful concept extrapolation/avoiding goal misgeneralization

Stuart_Armstrong, 4 Jul 2022 20:48 UTC

83 points

12 comments, 4 min read

Decision theory and dynamic inconsistency

paulfchristiano, 3 Jul 2022 22:20 UTC

82 points

33 comments, 10 min read

(sideways-view.com)