My hour of memoryless lucidity

Eric Neyman

4 May 2024 1:40 UTC

381 points

38 comments, 5 min read, LW link, 1 review

(ericneyman.wordpress.com)

Notifications Received in 30 Minutes of Class

tanagrabeast

26 May 2024 17:02 UTC

378 points

17 comments, 8 min read, LW link, 1 review

MIRI 2024 Communications Strategy

Gretta Duleba

29 May 2024 19:33 UTC

325 points

218 comments, 7 min read, LW link

Non-Disparagement Canaries for OpenAI

aysja and Adam Scholl

30 May 2024 19:20 UTC

291 points

51 comments, 2 min read, LW link

Truthseeking is the ground in which other principles grow

Elizabeth

27 May 2024 1:09 UTC

278 points

18 comments, 16 min read, LW link, 2 reviews

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman

15 May 2024 0:45 UTC

246 points

94 comments, 2 min read, LW link

AI companies aren’t really using external evaluators

Zach Stein-Perlman

24 May 2024 16:01 UTC

242 points

15 comments, 4 min read, LW link

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman

27 May 2024 13:00 UTC

206 points

21 comments, 2 min read, LW link

OpenAI: Fallout

Zvi

28 May 2024 13:20 UTC

204 points

25 comments, 36 min read, LW link

(thezvi.wordpress.com)

Jaan Tallinn’s 2023 Philanthropy Overview

jaan

20 May 2024 12:11 UTC

203 points

5 comments, 1 min read, LW link

(jaan.info)

What’s Going on With OpenAI’s Messaging?

ozziegooen

21 May 2024 2:22 UTC

191 points

14 comments, 3 min read, LW link

Deep Honesty

Aletheophile

7 May 2024 20:31 UTC

166 points

26 comments, 9 min read, LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman

18 May 2024 3:00 UTC

160 points

14 comments, 4 min read, LW link

Dyslucksia

Shoshannah Tekofsky

9 May 2024 19:21 UTC

160 points

46 comments, 6 min read, LW link

Language Models Model Us

eggsyntax

17 May 2024 21:00 UTC

159 points

56 comments, 7 min read, LW link, 1 review

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper

21 May 2024 20:15 UTC

157 points

16 comments, 3 min read, LW link

OpenAI: Exodus

Zvi

20 May 2024 13:10 UTC

153 points

26 comments, 44 min read, LW link

(thezvi.wordpress.com)

Value Claims (In Particular) Are Usually Bullshit

johnswentworth

30 May 2024 6:26 UTC

151 points

18 comments, 2 min read, LW link

The Pearly Gates

lsusr

30 May 2024 4:01 UTC

137 points

6 comments, 3 min read, LW link

Awakening

lsusr

30 May 2024 7:03 UTC

130 points

80 comments, 9 min read, LW link

Talent Needs of Technical AI Safety Teams

yams, Carson Jones, deus_ex_maki and Ryan Kidd

24 May 2024 0:36 UTC

129 points

65 comments, 14 min read, LW link

shortest goddamn bayes guide ever

lemonhope

10 May 2024 7:06 UTC

128 points

28 comments, 1 min read, LW link, 2 reviews

[Question] Which skincare products are evidence-based?

Vanessa Kosoy

2 May 2024 15:22 UTC

123 points

48 comments, 1 min read, LW link

Do you believe in hundred dollar bills lying on the ground? Consider humming

Elizabeth

16 May 2024 0:00 UTC

122 points

22 comments, 6 min read, LW link

(acesounderglass.com)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag

Steven Byrnes

29 May 2024 16:48 UTC

118 points

34 comments, 11 min read, LW link

Key takeaways from our EA and alignment research surveys

Cameron Berg, Kvee, florin_pop and Trent Hodgeson

3 May 2024 18:10 UTC

114 points

10 comments, 21 min read, LW link

introduction to cancer vaccines

bhauth

5 May 2024 1:06 UTC

113 points

19 comments, 5 min read, LW link

(www.bhauth.com)

Clarifying METR’s Auditing Role

Beth Barnes

30 May 2024 18:41 UTC

108 points

1 comment, 2 min read, LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

108 points

4 comments, 3 min read, LW link

Explaining a Math Magic Trick

Robert_AIZI

5 May 2024 19:41 UTC

103 points

10 comments, 5 min read, LW link

We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”

Lukas_Gloor

9 May 2024 15:43 UTC

101 points

36 comments, 5 min read, LW link

Advice for Activists from the History of Environmentalism

Jeffrey Heninger

16 May 2024 18:40 UTC

101 points

10 comments, 6 min read, LW link

(blog.aiimpacts.org)

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

Olli Järviniemi and evhub

6 May 2024 7:07 UTC

95 points

13 comments, 1 min read, LW link

(arxiv.org)

[Question] How to get nerds fascinated about mysterious chronic illness research?

riceissa

27 May 2024 22:58 UTC

95 points

50 comments, 2 min read, LW link

I am the Golden Gate Bridge

Zvi

27 May 2024 14:40 UTC

95 points

6 comments, 27 min read, LW link

(thezvi.wordpress.com)

Apollo Research 1-year update

Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, AlexMeinke and rusheb

29 May 2024 17:44 UTC

93 points

0 comments, 7 min read, LW link

MATS Winter 2023-24 Retrospective

utilistrutil, LauraVaughan, deus_ex_maki, Christian Smith, Juan Gil, Henry Sleight, Matthew Wearden and Ryan Kidd

11 May 2024 0:09 UTC

92 points

28 comments, 49 min read, LW link

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka

3 May 2024 18:10 UTC

92 points

12 comments, 4 min read, LW link

(aisafety.dance)

Teaching CS During Take-Off

andrew carle

14 May 2024 22:45 UTC

92 points

13 comments, 2 min read, LW link

Hardshipification

Jonathan Moregård

28 May 2024 20:02 UTC

91 points

17 comments, 2 min read, LW link

(honestliving.substack.com)

Review: Conor Moreton’s “Civilization & Cooperation”

Duncan Sabien (Inactive)

26 May 2024 19:32 UTC

88 points

9 comments, 38 min read, LW link, 1 review

Reward hacking behavior can generalize across tasks

Kei Nishimura-Gasparian, Isaac Dunn, Henry Sleight, Miles Turpin, evhub, Carson Denison and Ethan Perez

28 May 2024 16:33 UTC

86 points

5 comments, 21 min read, LW link

OpenAI: Helen Toner Speaks

Zvi

30 May 2024 21:10 UTC

86 points

8 comments, 13 min read, LW link

(thezvi.wordpress.com)

Environmentalism in the United States Is Unusually Partisan

Jeffrey Heninger

13 May 2024 21:23 UTC

86 points

26 comments, 4 min read, LW link

(blog.aiimpacts.org)

AISafety.com – Resources for AI Safety

Søren Elverlin, plex, Bryce Robertson and honeybee

17 May 2024 15:57 UTC

84 points

3 comments, 1 min read, LW link

My thesis (Algorithmic Bayesian Epistemology) explained in more depth

Eric Neyman

9 May 2024 19:43 UTC

82 points

4 comments, 27 min read, LW link

(ericneyman.wordpress.com)

Instruction-following AGI is easier and more likely than value aligned AGI

Seth Herd

15 May 2024 19:38 UTC

82 points

29 comments, 12 min read, LW link, 1 review

New voluntary commitments (AI Seoul Summit)

Zach Stein-Perlman

21 May 2024 11:00 UTC

81 points

17 comments, 7 min read, LW link

(www.gov.uk)

MIRI’s May 2024 Newsletter

Harlan

15 May 2024 0:13 UTC

79 points

1 comment, 3 min read, LW link

(intelligence.org)

LessWrong Community Weekend 2024, open for applications

UnplannedCauliflower and jt

1 May 2024 10:18 UTC

79 points

2 comments, 7 min read, LW link