All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 202220232024 2025

All Jan Feb Mar Apr May Jun JulAugSep Oct Nov Dec

All1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Against Almost Every Theory of Impact of Interpretability

Charbel-RaphaëlAug 17, 2023, 6:44 PM

331 points

91 comments26 min readLW link 2 reviews

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

evhub, Nicholas Schiefer, Carson Denison and Ethan Perez

Aug 8, 2023, 1:30 AM

319 points

30 comments18 min readLW link 1 review

Dear Self; we need to talk about ambition

ElizabethAug 27, 2023, 11:10 PM

270 points

28 comments8 min readLW link 2 reviews

(acesounderglass.com)

My current LK99 questions

Eliezer YudkowskyAug 1, 2023, 10:48 PM

206 points

38 comments5 min readLW link

Feedbackloop-first Rationality

RaemonAug 7, 2023, 5:58 PM

205 points

69 comments8 min readLW link 2 reviews

Large Language Models will be Great for Censorship

Ethan EdwardsAug 21, 2023, 7:03 PM

185 points

14 comments8 min readLW link

(ethanedwards.substack.com)

OpenAI API base models are not sycophantic, at any size

nostalgebraistAug 29, 2023, 12:58 AM

183 points

20 comments2 min readLW link

(colab.research.google.com)

A list of core AI safety problems and how I hope to solve them

davidadAug 26, 2023, 3:12 PM

165 points

29 comments5 min readLW link

Password-locked models: a stress case for capabilities evaluation

Fabien RogerAug 3, 2023, 2:53 PM

156 points

14 comments6 min readLW link

Assume Bad Faith

Zack_M_DavisAug 25, 2023, 5:36 PM

156 points

63 comments7 min readLW link 3 reviews

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth BarnesAug 1, 2023, 6:30 PM

153 points

12 comments5 min readLW link

(evals.alignment.org)

The U.S. is becoming less stable

lcAug 18, 2023, 9:13 PM

150 points

68 comments2 min readLW link

6 non-obvious mental health issues specific to AI safety

Igor IvanovAug 18, 2023, 3:46 PM

147 points

24 comments4 min readLW link

The “public debate” about AI is confusing for the general public and for policymakers because it is a three-sided debate

Adam David LongAug 1, 2023, 12:08 AM

146 points

30 comments4 min readLW link

Responses to apparent rationalist confusions about game / decision theory

Anthony DiGiovanniAug 30, 2023, 10:02 PM

142 points

20 comments12 min readLW link 1 review

Ten Thousand Years of Solitude

agpAug 15, 2023, 5:45 PM

137 points

19 comments4 min readLW link

(www.discovermagazine.com)

Inflection.ai is a major AGI lab

Nikola JurkovicAug 9, 2023, 1:05 AM

137 points

13 comments2 min readLW link

Invulnerable Incomplete Preferences: A Formal Statement

SCPAug 30, 2023, 9:59 PM

136 points

39 comments35 min readLW link

Book Launch: “The Carving of Reality,” Best of LessWrong vol. III

RaemonAug 16, 2023, 11:52 PM

131 points

22 comments5 min readLW link

When discussing AI risks, talk about capabilities, not intelligence

VikaAug 11, 2023, 1:38 PM

124 points

7 comments3 min readLW link

(vkrakovna.wordpress.com)

Introducing the Center for AI Policy (& we’re hiring!)

Thomas LarsenAug 28, 2023, 9:17 PM

123 points

50 comments2 min readLW link

(www.aipolicy.us)

Report on Frontier Model Training

YafahEdelmanAug 30, 2023, 8:02 PM

122 points

21 comments21 min readLW link

(docs.google.com)

Summary of and Thoughts on the Hotz/Yudkowsky Debate

ZviAug 16, 2023, 4:50 PM

106 points

47 comments9 min readLW link

(thezvi.wordpress.com)

A Theory of Laughter

Steven ByrnesAug 23, 2023, 3:05 PM

104 points

14 comments28 min readLW link

Biosecurity Culture, Computer Security Culture

jefftkAug 30, 2023, 4:40 PM

103 points

11 comments2 min readLW link

(www.jefftk.com)

[Question] Exercise: Solve “Thinking Physics”

RaemonAug 1, 2023, 12:44 AM

102 points

30 comments5 min readLW link 1 review

What’s A “Market”?

johnswentworthAug 8, 2023, 11:29 PM

94 points

16 comments10 min readLW link

Biological Anchors: The Trick that Might or Might Not Work

Scott AlexanderAug 12, 2023, 12:53 AM

91 points

3 comments33 min readLW link

(astralcodexten.substack.com)

LTFF and EAIF are unusually funding-constrained right now

Linch and calebp99

Aug 30, 2023, 1:03 AM

90 points

24 comments15 min readLW link

(forum.effectivealtruism.org)

We Should Prepare for a Larger Representation of Academia in AI Safety

Leon LangAug 13, 2023, 6:03 PM

90 points

14 comments5 min readLW link

Problems with Robin Hanson’s Quillette Article On AI

DaemonicSigilAug 6, 2023, 10:13 PM

89 points

33 comments8 min readLW link

My checklist for publishing a blog post

Steven ByrnesAug 15, 2023, 3:04 PM

88 points

6 comments3 min readLW link

Dating Roundup #1: This is Why You’re Single

ZviAug 29, 2023, 12:50 PM

87 points

28 comments38 min readLW link

(thezvi.wordpress.com)

Decomposing independent generalizations in neural networks via Hessian analysis

Dmitry Vaintrob and Nina Panickssery

Aug 14, 2023, 5:04 PM

84 points

4 comments1 min readLW link

The Low-Hanging Fruit Prior and sloped valleys in the loss landscape

Dmitry Vaintrob and Nina Panickssery

Aug 23, 2023, 9:12 PM

82 points

1 comment13 min readLW link

Stepping down as moderator on LW

Kaj_SotalaAug 14, 2023, 10:46 AM

82 points

1 comment1 min readLW link

Long-Term Future Fund: April 2023 grant recommendations

abergal, calebp99, Linch, habryka, Thomas Larsen and Vaniver

Aug 2, 2023, 7:54 AM

81 points

3 comments50 min readLW link

The God of Humanity, and the God of the Robot Utilitarians

RaemonAug 24, 2023, 8:27 AM

80 points

13 comments2 min readLW link 1 review

Digital brains beat biological ones because diffusion is too slow

GeneSmithAug 26, 2023, 2:22 AM

78 points

21 comments5 min readLW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

Georg Lange, Alex Makelov and Neel Nanda

Aug 29, 2023, 1:04 AM

77 points

4 comments1 min readLW link

The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)

moyamoAug 29, 2023, 6:28 PM

77 points

71 comments15 min readLW link

A Proof of Löb’s Theorem using Computability Theory

jessicataAug 16, 2023, 6:57 PM

76 points

0 comments17 min readLW link

(unstableontology.com)

Computational Thread Art

CallumMcDougallAug 6, 2023, 9:42 PM

76 points

2 comments6 min readLW link

A plea for more funding shortfall transparency

porbyAug 7, 2023, 9:33 PM

73 points

4 comments2 min readLW link

AI Forecasting: Two Years In

jsteinhardtAug 19, 2023, 11:40 PM

72 points

15 comments11 min readLW link

(bounded-regret.ghost.io)

AI pause/governance advocacy might be net-negative, especially without a focus on explaining x-risk

Mikhail SaminAug 27, 2023, 11:05 PM

72 points

9 comments6 min readLW link

Aumann-agreement is common

tailcalledAug 26, 2023, 8:22 PM

72 points

33 comments7 min readLW link 1 review

When Omnipotence is Not Enough

lsusrAug 25, 2023, 7:50 PM

71 points

4 comments2 min readLW link 1 review

Modulating sycophancy in an RLHF model via activation steering

Nina PanicksseryAug 9, 2023, 7:06 AM

69 points

20 comments12 min readLW link

Red-teaming language models via activation engineering

Nina PanicksseryAug 26, 2023, 5:52 AM

69 points

6 comments9 min readLW link

Keyboard shortcuts

Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).

Keys shown in grey (e.g., ?) do not require any modifier keys.

General
? Show keyboard shortcuts
Esc Hide keyboard shortcuts

Site navigation
h Go to Home (a.k.a. “Frontpage”) view
f Go to Featured (a.k.a. “Curated”) view
a Go to All (a.k.a. “Community”) view
m Go to Meta view
v Go to Tags view
c Go to Recent Comments view
r Go to Archive view
q Go to Sequences view
t Go to About page
u Go to User or Login page
o Go to Inbox page

Page navigation
, Jump up to top of page
. Jump down to bottom of page
/ Jump to top of comments section
s Search

Page actions
n New post or comment
e Edit current post

Post/comment list views
. Focus next entry in list
, Focus previous entry in list
; Cycle between links in focused entry
Enter Go to currently focused entry
Esc Unfocus currently focused entry
] Go to next page
[ Go to previous page
\ Go to first page
e Edit currently focused post

Editor
k Bold text
i Italic text
l Insert hyperlink
q Blockquote text

Appearance
= Increase text size
- Decrease text size
0 Reset to default text size
′ Cycle through content width settings
1 Switch to default theme [A]
2 Switch to dark theme [B]
3 Switch to grey theme [C]
4 Switch to ultramodern theme [D]
5 Switch to simple theme [E]
6 Switch to brutalist theme [F]
7 Switch to ReadTheSequences theme [G]
8 Switch to classic Less Wrong theme [H]
9 Switch to modern Less Wrong theme [I]
; Open theme tweaker
Enter Save changes and close theme tweaker
Esc Close theme tweaker (without saving)

Slide shows
l Start/resume slideshow
Esc Exit slideshow
→↓ Next slide
←↑ Previous slide
Space Reset slide zoom

Miscellaneous
x Switch to next view on user page
z Switch to previous view on user page
` Toggle compact comment list view
g Toggle anti-kibitzer