Scale Was All We Needed, At First

GMM 14 Feb 2024 1:49 UTC

298 points

35 comments 8 min read LW link

(aiacumen.substack.com)

“No-one in my org puts money in their pension”

Tobes 16 Feb 2024 18:33 UTC

290 points

17 comments 9 min read LW link 1 review

(seekingtobejolly.substack.com)

Raising children on the eve of AI

juliawise 15 Feb 2024 21:28 UTC

288 points

48 comments 5 min read LW link 1 review

Believing In

AnnaSalamon 8 Feb 2024 7:06 UTC

272 points

59 comments 13 min read LW link 4 reviews

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko 3 Feb 2024 20:36 UTC

223 points

157 comments 9 min read LW link

CFAR Takeaways: Andrew Critch

Raemon 14 Feb 2024 1:37 UTC

222 points

64 comments 5 min read LW link

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison 10 Feb 2024 19:52 UTC

198 points

53 comments 3 min read LW link 1 review

(garrisonlovely.substack.com)

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

Ricki Heicklen 22 Feb 2024 23:56 UTC

191 points

5 comments 4 min read LW link

(bayesshammai.substack.com)

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo 16 Feb 2024 18:53 UTC

191 points

6 comments 4 min read LW link

Masterpiece

Richard_Ngo 13 Feb 2024 23:10 UTC

178 points

22 comments 4 min read LW link 1 review

(www.narrativeark.xyz)

And All the Shoggoths Merely Players

Zack_M_Davis 10 Feb 2024 19:56 UTC

178 points

59 comments 12 min read LW link 2 reviews

Timaeus’s First Four Months

Jesse Hoogland, Daniel Murfet, Stan van Wingerden and Alexander Gietelink Oldenziel

28 Feb 2024 17:01 UTC

173 points

6 comments 6 min read LW link

2023 Survey Results

Screwtape 16 Feb 2024 22:24 UTC

151 points

26 comments 44 min read LW link

Updatelessness doesn’t solve most problems

Martín Soto 8 Feb 2024 17:30 UTC

142 points

45 comments 12 min read LW link

The Pareto Best and the Curse of Doom

Screwtape 21 Feb 2024 23:10 UTC

132 points

22 comments 9 min read LW link 1 review

Things I’ve Grieved

Raemon 18 Feb 2024 19:32 UTC

125 points

6 comments 2 min read LW link

Rationality Research Report: Towards 10x OODA Looping?

Raemon 24 Feb 2024 21:06 UTC

118 points

26 comments 15 min read LW link

Attitudes about Applied Rationality

Camille B. 3 Feb 2024 14:42 UTC

113 points

19 comments 5 min read LW link 1 review

Dreams of AI alignment: The danger of suggestive names

TurnTrout 10 Feb 2024 1:22 UTC

109 points

59 comments 4 min read LW link

Skills I’d like my collaborators to have

Raemon 9 Feb 2024 8:20 UTC

108 points

9 comments 8 min read LW link

Lsusr’s Rationality Dojo

lsusr 13 Feb 2024 5:52 UTC

108 points

19 comments 2 min read LW link

New LessWrong review winner UI (“The LeastWrong” section and full-art post pages)

kave 28 Feb 2024 2:42 UTC

107 points

65 comments 1 min read LW link

A Chess-GPT Linear Emergent World Representation

Adam Karvonen 8 Feb 2024 4:25 UTC

106 points

14 comments 7 min read LW link

(adamkarvonen.github.io)

More Hyphenation

Arjun Panickssery 7 Feb 2024 19:43 UTC

106 points

22 comments 1 min read LW link 1 review

(arjunpanickssery.substack.com)

Ideological Bayesians

Kevin Dorst 25 Feb 2024 14:17 UTC

105 points

5 comments 10 min read LW link

(kevindorst.substack.com)

Things You’re Allowed to Do: University Edition

Saul Munn 6 Feb 2024 0:36 UTC

103 points

13 comments 5 min read LW link

(www.brasstacks.blog)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom 2 Feb 2024 6:54 UTC

103 points

37 comments 15 min read LW link

Counting arguments provide no evidence for AI doom

Nora Belrose and Quintin Pope

27 Feb 2024 23:03 UTC

99 points

200 comments 14 min read LW link 2 reviews

My cover story in Jacobin on AI capitalism and the x-risk debates

garrison 12 Feb 2024 23:34 UTC

98 points

5 comments 6 min read LW link

(jacobin.com)

Announcing the London Initiative for Safe AI (LISA)

James Fox, mike_safeAI and Ryan Kidd

2 Feb 2024 23:17 UTC

98 points

0 comments 9 min read LW link

OpenAI’s Sora is an agent

Caleb Biddulph 16 Feb 2024 7:35 UTC

98 points

25 comments 4 min read LW link

Everything Wrong with Roko’s Claims about an Engineered Pandemic

WitheringWeights 22 Feb 2024 15:59 UTC

97 points

11 comments 16 min read LW link

How well do truth probes generalise?

mishajw 24 Feb 2024 14:12 UTC

96 points

11 comments 9 min read LW link

How to train your own “Sleeper Agents”

evhub 7 Feb 2024 0:31 UTC

94 points

11 comments 2 min read LW link

story-based decision-making

bhauth 7 Feb 2024 2:35 UTC

90 points

11 comments 4 min read LW link

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine, Sam Bowman and Ethan Perez

7 Feb 2024 21:28 UTC

89 points

14 comments 9 min read LW link

(arxiv.org)

Addressing Feature Suppression in SAEs

Benjamin Wright and Lee Sharkey

16 Feb 2024 18:32 UTC

88 points

5 comments 10 min read LW link

Preventing model exfiltration with upload limits

ryan_greenblatt 6 Feb 2024 16:29 UTC

83 points

24 comments 14 min read LW link 1 review

Wrong answer bias

lemonhope 1 Feb 2024 20:05 UTC

83 points

23 comments 1 min read LW link

AI #51: Altman’s Ambition

Zvi 20 Feb 2024 19:50 UTC

83 points

5 comments 38 min read LW link

(thezvi.wordpress.com)

Retirement Accounts and Short Timelines

jefftk 19 Feb 2024 18:50 UTC

83 points

35 comments 2 min read LW link

(www.jefftk.com)

The Gemini Incident

Zvi 22 Feb 2024 21:00 UTC

80 points

19 comments 18 min read LW link

(thezvi.wordpress.com)

Attention SAEs Scale to GPT-2 Small

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

3 Feb 2024 6:50 UTC

78 points

4 comments 8 min read LW link

Analogies between scaling labs and misaligned superintelligent AI

scasper 21 Feb 2024 19:29 UTC

77 points

5 comments 4 min read LW link

Do sparse autoencoders find “true features”?

Demian Till 22 Feb 2024 18:06 UTC

76 points

33 comments 11 min read LW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo 17 Feb 2024 1:47 UTC

76 points

2 comments 11 min read LW link

Implementing activation steering

Annah 5 Feb 2024 17:51 UTC

76 points

8 comments 7 min read LW link

Managing risks while trying to do good

Wei Dai 1 Feb 2024 18:08 UTC

76 points

28 comments 2 min read LW link

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien 6 Feb 2024 19:10 UTC

75 points

12 comments 16 min read LW link

The One and a Half Gemini

Zvi 22 Feb 2024 13:10 UTC

73 points

4 comments 8 min read LW link

(thezvi.wordpress.com)