Deep Honesty

Aletheophile

May 7, 2024, 8:31 PM

159 points

25 comments

9 min read

Making every researcher seek grants is a broken model

jasoncrawford

Jan 26, 2024, 4:06 PM

159 points

41 comments

4 min read

(rootsofprogress.org)

Current safety training techniques do not fully transfer to the agent setting

Simon Lermen and Govind Pimpale

Nov 3, 2024, 7:24 PM

158 points

9 comments

5 min read

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks

Jan 3, 2024, 7:44 PM

158 points

63 comments

16 min read

Language Models Model Us

eggsyntax

May 17, 2024, 9:00 PM

158 points

55 comments

7 min read

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper

May 21, 2024, 8:15 PM

157 points

16 comments

3 min read

Ironing Out the Squiggles

Zack_M_Davis

Apr 29, 2024, 4:13 PM

157 points

36 comments

11 min read

[Question] things that confuse me about the current AI market.

DMMF

Aug 28, 2024, 1:46 PM

156 points

27 comments

2 min read

Apologizing is a Core Rationalist Skill

johnswentworth

Jan 2, 2024, 5:47 PM

156 points

42 comments

5 min read

If you weren’t such an idiot...

kave and Mark Xu

Mar 2, 2024, 12:01 AM

156 points

74 comments

2 min read

(markxu.com)

The Incredible Fentanyl-Detecting Machine

sarahconstantin

Jun 28, 2024, 10:10 PM

156 points

26 comments

7 min read

(sarahconstantin.substack.com)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton

Jun 25, 2024, 3:40 PM

156 points

11 comments

9 min read

(www.alignment.org)

A Rocket–Interpretability Analogy

plex

Oct 21, 2024, 1:55 PM

155 points

31 comments

1 min read

“It’s a 10% chance which I did 10 times, so it should be 100%”

egor.timatkov

Nov 18, 2024, 1:14 AM

154 points

59 comments

2 min read

o3

Zach Stein-Perlman

Dec 20, 2024, 6:30 PM

154 points

164 comments

1 min read

Dyslucksia

Shoshannah Tekofsky

May 9, 2024, 7:21 PM

154 points

45 comments

6 min read

Subskills of “Listening to Wisdom”

Raemon

Dec 9, 2024, 3:01 AM

154 points

29 comments

42 min read

Liability regimes for AI

Ege Erdil

Aug 19, 2024, 1:25 AM

153 points

34 comments

5 min read

Deep atheism and AI risk

Joe Carlsmith

Jan 4, 2024, 6:58 PM

153 points

22 comments

27 min read

Decomposing Agency — capabilities without desires

owencb and Raymond D

Jul 11, 2024, 9:38 AM

153 points

32 comments

12 min read

(strangecities.substack.com)

OpenAI: Exodus

Zvi

May 20, 2024, 1:10 PM

153 points

26 comments

44 min read

(thezvi.wordpress.com)

Arithmetic is an underrated world-modeling technology

dynomight

Oct 17, 2024, 2:00 PM

152 points

33 comments

6 min read

(dynomight.net)

“Alignment Faking” frame is somewhat fake

Jan_Kulveit

Dec 20, 2024, 9:51 AM

152 points

13 comments

6 min read

Using axis lines for good or evil

dynomight

Mar 6, 2024, 2:47 PM

151 points

39 comments

4 min read

(dynomight.net)

Priors and Prejudice

MathiasKB

Apr 22, 2024, 3:00 PM

151 points

31 comments

7 min read

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman

Sep 3, 2024, 6:18 PM

151 points

49 comments

22 min read

(sleepinyourhat.github.io)

My takes on SB-1047

leogao

Sep 9, 2024, 6:38 PM

151 points

8 comments

4 min read

Daniel Dennett has died (1942-2024)

kave

Apr 19, 2024, 4:17 PM

150 points

5 comments

1 min read

(dailynous.com)

2023 Survey Results

Screwtape

Feb 16, 2024, 10:24 PM

150 points

26 comments

44 min read

Vernor Vinge, who coined the term “Technological Singularity”, dies at 79

Kaj_Sotala

Mar 21, 2024, 10:14 PM

149 points

25 comments

1 min read

(arstechnica.com)

On Devin

Zvi

Mar 18, 2024, 1:20 PM

148 points

34 comments

11 min read

(thezvi.wordpress.com)

What good is G-factor if you’re dumped in the woods? A field report from a camp counselor.

Hastings

Jan 12, 2024, 1:17 PM

148 points

22 comments

1 min read

Some (problematic) aesthetics of what constitutes good work in academia

Steven Byrnes

Mar 11, 2024, 5:47 PM

148 points

12 comments

12 min read

Leading The Parade

johnswentworth

Jan 31, 2024, 10:39 PM

148 points

31 comments

9 min read

What o3 Becomes by 2028

Vladimir_Nesov

Dec 22, 2024, 12:37 PM

147 points

15 comments

5 min read

0. CAST: Corrigibility as Singular Target

Max Harms

Jun 7, 2024, 10:29 PM

147 points

17 comments

8 min read

Stanislav Petrov Quarterly Performance Review

Ricki Heicklen

Sep 26, 2024, 9:20 PM

147 points

3 comments

5 min read

(bayesshammai.substack.com)

OpenAI o1

Zach Stein-Perlman

Sep 12, 2024, 5:30 PM

147 points

41 comments

1 min read

Repeal the Jones Act of 1920

Zvi

Nov 27, 2024, 3:00 PM

146 points

24 comments

39 min read

(thezvi.wordpress.com)

LLMs for Alignment Research: a safety priority?

abramdemski

Apr 4, 2024, 8:03 PM

145 points

24 comments

11 min read

The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it

Martín Soto

Aug 27, 2024, 11:10 PM

145 points

15 comments

3 min read

When is a mind me?

Rob Bensinger

Apr 17, 2024, 5:56 AM

144 points

130 comments

15 min read

Fields that I reference when thinking about AI takeover prevention

Buck

Aug 13, 2024, 11:08 PM

144 points

16 comments

10 min read

(redwoodresearch.substack.com)

Why Don’t We Just… Shoggoth+Face+Paraphraser?

Daniel Kokotajlo and abramdemski

Nov 19, 2024, 8:53 PM

144 points

58 comments

14 min read

That Alien Message—The Animation

Writer

Sep 7, 2024, 2:53 PM

144 points

10 comments

8 min read

(youtu.be)

Nursing doubts

dynomight

Aug 30, 2024, 2:25 AM

144 points

23 comments

9 min read

(dynomight.net)

China Hawks are Manufacturing an AI Arms Race

garrison

Nov 20, 2024, 6:17 PM

144 points

44 comments

(garrisonlovely.substack.com)

The “Think It Faster” Exercise

Raemon

Dec 11, 2024, 7:14 PM

144 points

35 comments

13 min read

Value Claims (In Particular) Are Usually Bullshit

johnswentworth

30 May 2024 6:26 UTC

144 points

18 comments

2 min read

Momentum of Light in Glass

Ben

9 Oct 2024 20:19 UTC

143 points

44 comments

11 min read