Let’s think about slowing down AI

KatjaGrace · 22 Dec 2022 17:40 UTC
543 points
183 comments · 38 min read · LW link · 3 reviews
(aiimpacts.org)

Staring into the abyss as a core life skill

benkuhn · 22 Dec 2022 15:30 UTC
320 points
20 comments · 12 min read · LW link · 1 review
(www.benkuhn.net)

Models Don’t “Get Reward”

Sam Ringer · 30 Dec 2022 10:37 UTC
306 points
61 comments · 5 min read · LW link · 1 review

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments · 2 min read · LW link

Sazen

[DEACTIVATED] Duncan Sabien · 21 Dec 2022 7:54 UTC
274 points
83 comments · 12 min read · LW link · 2 reviews

AI alignment is distinct from its near-term applications

paulfchristiano · 13 Dec 2022 7:10 UTC
254 points
21 comments · 2 min read · LW link
(ai-alignment.com)

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin · 15 Dec 2022 18:22 UTC
243 points
39 comments · 16 min read · LW link · 1 review

Jailbreaking ChatGPT on Release Day

Zvi · 2 Dec 2022 13:10 UTC
242 points
77 comments · 6 min read · LW link · 1 review
(thezvi.wordpress.com)

The Plan - 2022 Update

johnswentworth · 1 Dec 2022 20:43 UTC
239 points
37 comments · 8 min read · LW link · 1 review

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
195 points
35 comments · 20 min read · LW link · 1 review

What AI Safety Materials Do ML Researchers Find Compelling?

28 Dec 2022 2:03 UTC
175 points
34 comments · 2 min read · LW link

The next decades might be wild

Marius Hobbhahn · 15 Dec 2022 16:10 UTC
175 points
42 comments · 41 min read · LW link · 1 review

Finite Factored Sets in Pictures

Magdalena Wache · 11 Dec 2022 18:49 UTC
174 points
35 comments · 12 min read · LW link

Using GPT-Eliezer against ChatGPT Jailbreaking

6 Dec 2022 19:54 UTC
170 points
85 comments · 9 min read · LW link

Things that can kill you quickly: What everyone should know about first aid

jasoncrawford · 27 Dec 2022 16:23 UTC
166 points
21 comments · 2 min read · LW link · 1 review
(jasoncrawford.org)

Logical induction for software engineers

Alex Flint · 3 Dec 2022 19:55 UTC
160 points
8 comments · 27 min read · LW link · 1 review

A Year of AI Increasing AI Progress

ThomasW · 30 Dec 2022 2:09 UTC
148 points
3 comments · 2 min read · LW link

Updating my AI timelines

Matthew Barnett · 5 Dec 2022 20:46 UTC
143 points
50 comments · 2 min read · LW link

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout · 2 Dec 2022 2:43 UTC
139 points
22 comments · 47 min read · LW link · 3 reviews

[Question] How to Convince my Son that Drugs are Bad

concerned_dad · 17 Dec 2022 18:47 UTC
139 points
84 comments · 2 min read · LW link

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC · 19 Dec 2022 22:52 UTC
138 points
30 comments · 18 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

13 Dec 2022 15:41 UTC
137 points
22 comments · 22 min read · LW link · 2 reviews

K-complexity is silly; use cross-entropy instead

So8res · 20 Dec 2022 23:06 UTC
136 points
53 comments · 4 min read · LW link · 2 reviews

Shared reality: a key driver of human behavior

kdbscott · 24 Dec 2022 19:35 UTC
126 points
25 comments · 4 min read · LW link

Re-Examining LayerNorm

Eric Winsor · 1 Dec 2022 22:20 UTC
124 points
12 comments · 5 min read · LW link

Did ChatGPT just gaslight me?

ThomasW · 1 Dec 2022 5:41 UTC
123 points
45 comments · 9 min read · LW link
(aiwatchtower.substack.com)

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod · 27 Dec 2022 15:49 UTC
116 points
84 comments · 3 min read · LW link

The case against AI alignment

andrew sauer · 24 Dec 2022 6:57 UTC
115 points
110 comments · 5 min read · LW link

Deconfusing Direct vs Amortised Optimization

beren · 2 Dec 2022 11:30 UTC
107 points
17 comments · 10 min read · LW link

Trying to disambiguate different questions about whether RLHF is “good”

Buck · 14 Dec 2022 4:03 UTC
106 points
47 comments · 7 min read · LW link · 1 review

Language models are nearly AGIs but we don’t notice it because we keep shifting the bar

philosophybear · 30 Dec 2022 5:15 UTC
105 points
13 comments · 7 min read · LW link

Slightly against aligning with neo-luddites

Matthew Barnett · 26 Dec 2022 22:46 UTC
104 points
31 comments · 4 min read · LW link

[Linkpost] The Story Of VaccinateCA

hath · 9 Dec 2022 23:54 UTC
103 points
4 comments · 10 min read · LW link
(www.worksinprogress.co)

Applied Linear Algebra Lecture Series

johnswentworth · 22 Dec 2022 6:57 UTC
102 points
7 comments · 1 min read · LW link

But is it really in Rome? An investigation of the ROME model editing technique

jacquesthibs · 30 Dec 2022 2:40 UTC
102 points
1 comment · 18 min read · LW link

Thoughts on AGI organizations and capabilities work

7 Dec 2022 19:46 UTC
102 points
17 comments · 5 min read · LW link

200 Concrete Open Problems in Mechanistic Interpretability: Introduction

Neel Nanda · 28 Dec 2022 21:06 UTC
102 points
0 comments · 10 min read · LW link

Finding gliders in the game of life

paulfchristiano · 1 Dec 2022 20:40 UTC
101 points
7 comments · 16 min read · LW link
(ai-alignment.com)

Bad at Arithmetic, Promising at Math

cohenmacaulay · 18 Dec 2022 5:40 UTC
100 points
19 comments · 20 min read · LW link · 1 review

Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
100 points
34 comments · 1 min read · LW link
(www.anthropic.com)

[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike · 5 Dec 2022 22:51 UTC
98 points
15 comments · 1 min read · LW link
(aligned.substack.com)

The LessWrong 2021 Review: Intellectual Circle Expansion

1 Dec 2022 21:17 UTC
95 points
55 comments · 8 min read · LW link

Revisiting algorithmic progress

13 Dec 2022 1:39 UTC
94 points
15 comments · 2 min read · LW link · 1 review
(arxiv.org)

Towards Hodge-podge Alignment

Cleo Nardo · 19 Dec 2022 20:12 UTC
91 points
30 comments · 9 min read · LW link

Setting the Zero Point

[DEACTIVATED] Duncan Sabien · 9 Dec 2022 6:06 UTC
90 points
43 comments · 20 min read · LW link · 1 review

Consider using reversible automata for alignment research

Alex_Altair · 11 Dec 2022 1:00 UTC
88 points
30 comments · 2 min read · LW link

Can we efficiently distinguish different mechanisms?

paulfchristiano · 27 Dec 2022 0:20 UTC
88 points
30 comments · 16 min read · LW link
(ai-alignment.com)

Local Memes Against Geometric Rationality

Scott Garrabrant · 21 Dec 2022 3:53 UTC
85 points
3 comments · 6 min read · LW link

You can still fetch the coffee today if you’re dead tomorrow

davidad · 9 Dec 2022 14:06 UTC
84 points
19 comments · 5 min read · LW link

A hundredth of a bit of extra entropy

Adam Scherlis · 24 Dec 2022 21:12 UTC
83 points
4 comments · 3 min read · LW link