
Hospitalization: A Review

Logan Riggs 9 Oct 2025 14:36 UTC

365 points

21 comments, 9 min read, LW link

Towards a Typology of Strange LLM Chains-of-Thought

1a3orn 9 Oct 2025 22:02 UTC

305 points

29 comments, 9 min read, LW link

I take antidepressants. You’re welcome

Elizabeth 9 Oct 2025 19:30 UTC

265 points

11 comments, 3 min read, LW link

(acesounderglass.com)

Consider donating to Alex Bores, author of the RAISE Act

Eric Neyman 20 Oct 2025 14:50 UTC

260 points

20 comments, 18 min read, LW link

(ericneyman.wordpress.com)

On Fleshling Safety: A Debate by Klurl and Trapaucius.

Eliezer Yudkowsky 26 Oct 2025 23:44 UTC

257 points

52 comments, 79 min read, LW link

The Most Common Bad Argument In These Parts

J Bostock 11 Oct 2025 16:29 UTC

247 points

62 comments, 4 min read, LW link

EU explained in 10 minutes

Martin Sustrik 21 Oct 2025 4:40 UTC

244 points

51 comments, 8 min read, LW link

(www.250bpm.com)

Omelas Is Perfectly Misread

Tobias H 2 Oct 2025 23:11 UTC

221 points

59 comments, 5 min read, LW link

The Memetics of AI Successionism

Jan_Kulveit 28 Oct 2025 15:04 UTC

214 points

54 comments, 9 min read, LW link

If Anyone Builds It Everyone Dies, a semi-outsider review

dvd 13 Oct 2025 22:10 UTC

214 points

67 comments, 15 min read, LW link

Do One New Thing A Day To Solve Your Problems

Algon 3 Oct 2025 17:08 UTC

211 points

28 comments, 2 min read, LW link

The Doomers Were Right

Algon 22 Oct 2025 22:18 UTC

208 points

26 comments, 3 min read, LW link

The Origami Men

Tomás B. 6 Oct 2025 15:25 UTC

192 points

14 comments, 16 min read, LW link

That Mad Olympiad

Tomás B. 15 Oct 2025 13:45 UTC

189 points

15 comments, 14 min read, LW link

The “Length” of “Horizons”

Adam Scholl 14 Oct 2025 14:48 UTC

186 points

27 comments, 7 min read, LW link

An Opinionated Guide to Privacy Despite Authoritarianism

TurnTrout 29 Oct 2025 20:32 UTC

180 points

29 comments, 4 min read, LW link

(turntrout.com)

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

Sam Marks, Nevan Wichers, Daniel Tan, Aram Ebtekar, Jozdien, David Africa, Alex Mallen and Fabien Roger

8 Oct 2025 22:02 UTC

171 points

37 comments, 2 min read, LW link

Don’t Mock Yourself

Algon 12 Oct 2025 22:40 UTC

166 points

18 comments, 2 min read, LW link

AIs should also refuse to work on capabilities research

Davidmanheim 27 Oct 2025 8:42 UTC

165 points

20 comments, 3 min read, LW link

Humanity Learned Almost Nothing From COVID-19

niplav 19 Oct 2025 21:24 UTC

163 points

38 comments, 4 min read, LW link

Meditation is dangerous

Algon 17 Oct 2025 22:52 UTC

156 points

40 comments, 4 min read, LW link

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades.

Raemon 2 Oct 2025 21:03 UTC

153 points

19 comments, 12 min read, LW link

Realistic Reward Hacking Induces Different and Deeper Misalignment

Jozdien 9 Oct 2025 18:45 UTC

146 points

2 comments, 23 min read, LW link

Cheap Labour Everywhere

Morpheus 16 Oct 2025 13:15 UTC

145 points

34 comments, 2 min read, LW link

Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals

Alexa Pan and ryan_greenblatt

30 Oct 2025 15:34 UTC

144 points

21 comments, 14 min read, LW link

Which side of the AI safety community are you in?

Max Tegmark 22 Oct 2025 21:17 UTC

141 points

88 comments, 2 min read, LW link

Recontextualization Mitigates Specification Gaming Without Modifying the Specification

ariana_azarbal, Victor Gillioz, TurnTrout and cloud

14 Oct 2025 0:53 UTC

141 points

15 comments, 11 min read, LW link

The main way I’ve seen people turn ideologically crazy [Linkpost]

Noosphere89 23 Oct 2025 20:09 UTC

135 points

22 comments, 8 min read, LW link

(andymasley.substack.com)

Consider donating to AI safety champion Scott Wiener

Eric Neyman 22 Oct 2025 18:40 UTC

133 points

9 comments, 18 min read, LW link

(ericneyman.wordpress.com)

How Well Does RL Scale?

Toby_Ord 22 Oct 2025 13:16 UTC

132 points

23 comments, 7 min read, LW link

(www.tobyord.com)

Plans A, B, C, and D for misalignment risk

ryan_greenblatt 8 Oct 2025 17:18 UTC

131 points

75 comments, 6 min read, LW link

Emergent Introspective Awareness in Large Language Models

Drake Thomas 30 Oct 2025 4:42 UTC

130 points

19 comments, 1 min read, LW link

(transformer-circuits.pub)

Cancer has a surprising amount of detail

Abhishaike Mahajan 26 Oct 2025 20:33 UTC

128 points

18 comments, 11 min read, LW link

(www.owlposting.com)

Checking in on AI-2027

Baybar 2 Oct 2025 18:46 UTC

128 points

22 comments, 4 min read, LW link

Gradual Disempowerment Monthly Roundup

Raymond Douglas 6 Oct 2025 15:36 UTC

120 points

9 comments, 6 min read, LW link

Give Me Your Data: The Rationalist Mind Meld

Taylor G. Lunt 19 Oct 2025 2:25 UTC

116 points

14 comments, 4 min read, LW link

Musings on Reported Cost of Compute (Oct 2025)

Vladimir_Nesov 24 Oct 2025 20:42 UTC

105 points

11 comments, 2 min read, LW link

LLM robots can’t pass butter (and they are having an existential crisis about it)

Lukas Petersson 28 Oct 2025 14:14 UTC

105 points

7 comments, 4 min read, LW link

OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53

Zvi 13 Oct 2025 15:00 UTC

104 points

2 comments, 23 min read, LW link

(thezvi.wordpress.com)

You Should Get a Reusable Mask

jefftk 8 Oct 2025 2:40 UTC

103 points

28 comments, 1 min read, LW link

(www.jefftk.com)

Where does Sonnet 4.5’s desire to “not get too comfortable” come from?

Kaj_Sotala 4 Oct 2025 10:19 UTC

103 points

24 comments, 64 min read, LW link

Considerations around career costs of political donations

GradientDissenter 20 Oct 2025 12:51 UTC

97 points

17 comments, 15 min read, LW link

The Thinking Machines Tinker API is good news for AI control and security

Buck 9 Oct 2025 15:22 UTC

92 points

10 comments, 6 min read, LW link

Is 90% of code at Anthropic being written by AIs?

ryan_greenblatt 22 Oct 2025 14:50 UTC

92 points

14 comments, 5 min read, LW link

Bending The Curve

Zvi 7 Oct 2025 20:00 UTC

91 points

12 comments, 21 min read, LW link

(thezvi.wordpress.com)

Learning to Interpret Weight Differences in Language Models

avichal 23 Oct 2025 3:55 UTC

90 points

3 comments, 5 min read, LW link

(arxiv.org)

Making Your Pain Worse can Get You What You Want

Logan Riggs 5 Oct 2025 0:19 UTC

87 points

5 comments, 3 min read, LW link

Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)

Mateusz Bagiński and Ishual

13 Oct 2025 19:00 UTC

83 points

4 comments, 13 min read, LW link

The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work

Elizabeth 14 Oct 2025 16:00 UTC

82 points

3 comments, 7 min read, LW link

(acesounderglass.com)

How AI Manipulates—A Case Study

Adele Lopez 14 Oct 2025 0:54 UTC

82 points

27 comments, 13 min read, LW link