The Case Against AI Control Research

johnswentworth · 21 Jan 2025 16:03 UTC
372 points
84 comments · 6 min read · LW link

What’s the short timeline plan?

Marius Hobbhahn · 2 Jan 2025 14:59 UTC
361 points
51 comments · 23 min read · LW link

The Gentle Romance

Richard_Ngo · 19 Jan 2025 18:29 UTC
244 points
46 comments · 15 min read · LW link
(www.asimov.press)

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes · 28 Jan 2025 18:47 UTC
220 points
31 comments · 31 min read · LW link

Mechanisms too simple for humans to design

Malmesbury · 22 Jan 2025 16:54 UTC
212 points
45 comments · 15 min read · LW link

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

24 Jan 2025 20:20 UTC
186 points
61 comments · 5 min read · LW link

What Is The Alignment Problem?

johnswentworth · 16 Jan 2025 1:20 UTC
181 points
49 comments · 25 min read · LW link

How will we update about scheming?

ryan_greenblatt · 6 Jan 2025 20:21 UTC
176 points
21 comments · 37 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

30 Jan 2025 17:03 UTC
167 points
65 comments · 2 min read · LW link
(gradual-disempowerment.ai)

Don’t ignore bad vibes you get from people

Kaj_Sotala · 18 Jan 2025 9:20 UTC
164 points
52 comments · 2 min read · LW link
(kajsotala.fi)

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem · 7 Jan 2025 19:11 UTC
163 points
9 comments · 1 min read · LW link

Capital Ownership Will Not Prevent Human Disempowerment

beren · 5 Jan 2025 6:00 UTC
162 points
20 comments · 14 min read · LW link

Maximizing Communication, not Traffic

jefftk · 5 Jan 2025 13:00 UTC
161 points
10 comments · 1 min read · LW link
(www.jefftk.com)

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · 13 Jan 2025 1:23 UTC
153 points
32 comments · 3 min read · LW link

Activation space interpretability may be doomed

8 Jan 2025 12:49 UTC
152 points
34 comments · 8 min read · LW link

OpenAI #10: Reflections

Zvi · 7 Jan 2025 17:00 UTC
149 points
7 comments · 11 min read · LW link
(thezvi.wordpress.com)

Quotes from the Stargate press conference

Nikola Jurkovic · 22 Jan 2025 0:50 UTC
149 points
7 comments · 1 min read · LW link
(www.c-span.org)

Human takeover might be worse than AI takeover

Tom Davidson · 10 Jan 2025 16:53 UTC
147 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · 23 Jan 2025 18:41 UTC
145 points
5 comments · 13 min read · LW link

Anomalous Tokens in DeepSeek-V3 and r1

henry · 25 Jan 2025 22:55 UTC
144 points
3 comments · 7 min read · LW link

Planning for Extreme AI Risks

joshc · 29 Jan 2025 18:33 UTC
143 points
5 comments · 16 min read · LW link

The Intelligence Curse

lukedrago · 3 Jan 2025 19:07 UTC
142 points
27 comments · 18 min read · LW link
(lukedrago.substack.com)

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman · 6 Jan 2025 19:57 UTC
142 points
57 comments · 13 min read · LW link

Ten people on the inside

Buck · 28 Jan 2025 16:41 UTC
139 points
28 comments · 4 min read · LW link

Tell me about yourself: LLMs are aware of their learned behaviors

22 Jan 2025 0:47 UTC
132 points
5 comments · 6 min read · LW link

Building AI Research Fleets

12 Jan 2025 18:23 UTC
132 points
11 comments · 5 min read · LW link

Training on Documents About Reward Hacking Induces Reward Hacking

21 Jan 2025 21:32 UTC
131 points
15 comments · 2 min read · LW link
(alignment.anthropic.com)

Parkinson’s Law and the Ideology of Statistics

Benquo · 4 Jan 2025 15:49 UTC
130 points
7 comments · 8 min read · LW link
(benjaminrosshoffman.com)

2024 in AI predictions

jessicata · 1 Jan 2025 20:29 UTC
125 points
3 comments · 8 min read · LW link

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA · 28 Jan 2025 23:36 UTC
118 points
30 comments · 13 min read · LW link

My supervillain origin story

Dmitry Vaintrob · 27 Jan 2025 12:20 UTC
112 points
2 comments · 5 min read · LW link

How do you deal w/ Super Stimuli?

Logan Riggs · 14 Jan 2025 15:14 UTC
112 points
25 comments · 3 min read · LW link

Fake thinking and real thinking

Joe Carlsmith · 28 Jan 2025 20:05 UTC
111 points
17 comments · 38 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery · 8 Jan 2025 19:38 UTC
108 points
7 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

Attribution-based parameter decomposition

25 Jan 2025 13:12 UTC
108 points
21 comments · 4 min read · LW link
(publications.apolloresearch.ai)

Comment on “Death and the Gorgon”

Zack_M_Davis · 1 Jan 2025 5:47 UTC
106 points
35 comments · 8 min read · LW link

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai · 5 Jan 2025 14:49 UTC
100 points
12 comments · 12 min read · LW link

The purposeful drunkard

Dmitry Vaintrob · 12 Jan 2025 12:27 UTC
98 points
13 comments · 6 min read · LW link

The Rising Sea

Jesse Hoogland · 25 Jan 2025 20:48 UTC
97 points
6 comments · 2 min read · LW link

Tips On Empirical Research Slides

8 Jan 2025 5:06 UTC
97 points
4 comments · 6 min read · LW link

We probably won’t just play status games with each other after AGI

Matthew Barnett · 15 Jan 2025 4:56 UTC
97 points
21 comments · 4 min read · LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd · 14 Jan 2025 2:14 UTC
96 points
70 comments · 5 min read · LW link

Tips and Code for Empirical Research Workflows

20 Jan 2025 22:31 UTC
96 points
15 comments · 20 min read · LW link

On Eating the Sun

jessicata · 8 Jan 2025 4:57 UTC
96 points
99 comments · 3 min read · LW link
(unstablerontology.substack.com)

The subset parity learning problem: much more than you wanted to know

Dmitry Vaintrob · 3 Jan 2025 9:13 UTC
95 points
18 comments · 11 min read · LW link

Heritability: Five Battles

Steven Byrnes · 14 Jan 2025 18:21 UTC
94 points
23 comments · 60 min read · LW link

Five Recent AI Tutoring Studies

Arjun Panickssery · 19 Jan 2025 3:53 UTC
94 points
0 comments · 2 min read · LW link
(arjunpanickssery.substack.com)

Introducing Squiggle AI

ozziegooen · 3 Jan 2025 17:53 UTC
92 points
15 comments · 8 min read · LW link

Six Thoughts on AI Safety

boazbarak · 24 Jan 2025 22:20 UTC
92 points
55 comments · 15 min read · LW link

The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating

21 Jan 2025 16:57 UTC
91 points
11 comments · 2 min read · LW link
(www.convergenceanalysis.org)