Mysteries of mode collapse

janus

8 Nov 2022 10:37 UTC

303 points

57 comments, 14 min read, LW link, 1 review

Tyranny of the Epistemic Majority

Scott Garrabrant

22 Nov 2022 17:19 UTC

217 points

14 comments, 9 min read, LW link, 1 review

What it’s like to dissect a cadaver

Alok Singh

10 Nov 2022 6:40 UTC

213 points

25 comments, 5 min read, LW link

(alok.github.io)

I Converted Book I of The Sequences Into A Zoomer-Readable Format

dkirmani

10 Nov 2022 2:59 UTC

201 points

32 comments, 2 min read, LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren and Sid Black

28 Nov 2022 12:54 UTC

200 points

34 comments, 31 min read, LW link

Geometric Rationality is Not VNM Rational

Scott Garrabrant

27 Nov 2022 19:36 UTC

198 points

29 comments, 3 min read, LW link

Conjecture: a retrospective after 8 months of work

Connor Leahy, Sid Black, Gabriel Alfour and Chris Scammell

23 Nov 2022 17:10 UTC

180 points

9 comments, 8 min read, LW link

The Geometric Expectation

Scott Garrabrant

23 Nov 2022 18:05 UTC

179 points

22 comments, 4 min read, LW link

Planes are still decades away from displacing most bird jobs

guzey

25 Nov 2022 16:49 UTC

172 points

14 comments, 3 min read, LW link

Here’s the exit.

Valentine

21 Nov 2022 18:07 UTC

159 points

187 comments, 10 min read, LW link, 5 reviews

Geometric Exploration, Arithmetic Exploitation

Scott Garrabrant

24 Nov 2022 15:36 UTC

142 points

5 comments, 7 min read, LW link

Mechanistic anomaly detection and ELK

paulfchristiano

25 Nov 2022 18:50 UTC

138 points

22 comments, 21 min read, LW link

(ai-alignment.com)

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

Boaz Barak and benedelman

22 Nov 2022 18:57 UTC

135 points

97 comments, 24 min read, LW link

Sadly, FTX

Zvi

17 Nov 2022 14:30 UTC

133 points

18 comments, 47 min read, LW link

(thezvi.wordpress.com)

On the Diplomacy AI

Zvi

28 Nov 2022 13:20 UTC

127 points

29 comments, 11 min read, LW link

(thezvi.wordpress.com)

Clarifying AI X-risk

zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt

1 Nov 2022 11:03 UTC

127 points

24 comments, 4 min read, LW link, 1 review

Utilitarianism Meets Egalitarianism

Scott Garrabrant

21 Nov 2022 19:00 UTC

126 points

17 comments, 6 min read, LW link, 1 review

Speculation on Current Opportunities for Unusually High Impact in Global Health

johnswentworth

11 Nov 2022 20:47 UTC

114 points

31 comments, 4 min read, LW link

How could we know that an AGI system will have good consequences?

So8res

7 Nov 2022 22:42 UTC

112 points

25 comments, 5 min read, LW link

Applying superintelligence without collusion

Eric Drexler

8 Nov 2022 18:08 UTC

110 points

63 comments, 4 min read, LW link

What I Learned Running Refine

adamShimi

24 Nov 2022 14:49 UTC

110 points

5 comments, 4 min read, LW link

Instrumental convergence is what makes general intelligence possible

tailcalled

11 Nov 2022 16:38 UTC

107 points

11 comments, 4 min read, LW link

Caution when interpreting Deepmind’s In-context RL paper

Sam Marks

1 Nov 2022 2:42 UTC

107 points

8 comments, 4 min read, LW link

LW Beta Feature: Side-Comments

jimrandomh

24 Nov 2022 1:55 UTC

104 points

47 comments, 1 min read, LW link

LessWrong readers are invited to apply to the Lurkshop

Jonas V and GradientDissenter

22 Nov 2022 9:19 UTC

101 points

41 comments, 3 min read, LW link

Instead of technical research, more people should focus on buying time

Orpheus16, Olive Branch and Thomas Larsen

5 Nov 2022 20:43 UTC

101 points

45 comments, 14 min read, LW link

Searching for Search

Niki Dupuis and janus

28 Nov 2022 15:31 UTC

98 points

9 comments, 14 min read, LW link, 1 review

ARC paper: Formalizing the presumption of independence

Erik Jenner

20 Nov 2022 1:22 UTC

97 points

2 comments, 2 min read, LW link

(arxiv.org)

Trying to Make a Treacherous Mesa-Optimizer

MadHatter

9 Nov 2022 18:07 UTC

95 points

14 comments, 4 min read, LW link

(attentionspan.blog)

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese Anthis

22 Nov 2022 16:50 UTC

93 points

64 comments, 1 min read, LW link

(www.science.org)

Conjecture Second Hiring Round

Connor Leahy, Sid Black, Gabriel Alfour and Chris Scammell

23 Nov 2022 17:11 UTC

92 points

0 comments, 1 min read, LW link

By Default, GPTs Think In Plain Sight

Fabien Roger

19 Nov 2022 19:15 UTC

90 points

36 comments, 9 min read, LW link

When AI solves a game, focus on the game’s mechanics, not its theme.

Cleo Nardo

23 Nov 2022 19:16 UTC

89 points

7 comments, 2 min read, LW link

Current themes in mechanistic interpretability research

Lee Sharkey, Sid Black and beren

16 Nov 2022 14:14 UTC

89 points

2 comments, 12 min read, LW link

Respecting your Local Preferences

Scott Garrabrant

26 Nov 2022 19:04 UTC

84 points

1 comment, 4 min read, LW link

Always know where your abstractions break

lsusr

27 Nov 2022 6:32 UTC

83 points

6 comments, 2 min read, LW link

Announcing the Progress Forum

jasoncrawford

17 Nov 2022 19:26 UTC

83 points

9 comments, 1 min read, LW link

Results from the interpretability hackathon

Esben Kran and Neel Nanda

17 Nov 2022 14:51 UTC

81 points

0 comments, 6 min read, LW link

(alignmentjam.com)

Exams-Only Universities

Mati_Roy

6 Nov 2022 22:05 UTC

80 points

40 comments, 2 min read, LW link

Threat Model Literature Review

zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt

1 Nov 2022 11:03 UTC

79 points

4 comments, 25 min read, LW link

What is epigenetics?

Metacelsus

6 Nov 2022 1:24 UTC

78 points

4 comments, 6 min read, LW link

(denovo.substack.com)

Follow up to medical miracle

Elizabeth

4 Nov 2022 18:00 UTC

78 points

5 comments, 6 min read, LW link

(acesounderglass.com)

Elastic Productivity Tools

Simon Berens

19 Nov 2022 21:59 UTC

76 points

8 comments, 2 min read, LW link

(simonberens.me)

K-types vs T-types — what priors do you have?

Cleo Nardo

3 Nov 2022 11:29 UTC

75 points

25 comments, 7 min read, LW link

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn

16 Nov 2022 14:40 UTC

75 points

17 comments, 7 min read, LW link, 1 review

Engineering Monosemanticity in Toy Models

Adam Jermyn, evhub and Nicholas Schiefer

18 Nov 2022 1:43 UTC

75 points

7 comments, 3 min read, LW link

(arxiv.org)

Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos

14 Nov 2022 16:42 UTC

75 points

12 comments, 2 min read, LW link

(epochai.org)

Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility

Orpheus16 and Olive Branch

22 Nov 2022 22:19 UTC

74 points

20 comments, 4 min read, LW link

Against “Classic Style”

Cleo Nardo

23 Nov 2022 22:10 UTC

74 points

31 comments, 4 min read, LW link

Far-UVC Light Update: No, LEDs are not around the corner (tweetstorm)

Davidmanheim

2 Nov 2022 12:57 UTC

74 points

29 comments, 4 min read, LW link

(twitter.com)