The Waluigi Effect (mega-post)

Cleo Nardo · 3 Mar 2023 3:22 UTC
648 points
188 comments · 16 min read · LW link

My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”

Quintin Pope · 21 Mar 2023 0:06 UTC
363 points
233 comments · 39 min read · LW link · 1 review

Shutting Down the Lightcone Offices

14 Mar 2023 22:47 UTC
339 points
103 comments · 17 min read · LW link · 2 reviews

Understanding and controlling a maze-solving policy network

11 Mar 2023 18:59 UTC
334 points
28 comments · 23 min read · LW link

The Parable of the King and the Random Process

moridinamael · 1 Mar 2023 22:18 UTC
315 points
26 comments · 6 min read · LW link · 3 reviews

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs · 29 Mar 2023 23:16 UTC
294 points
297 comments · 3 min read · LW link
(time.com)

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky · 13 Mar 2023 21:20 UTC
276 points
43 comments · 22 min read · LW link · 1 review

Deep Deceptiveness

So8res · 21 Mar 2023 2:51 UTC
268 points
60 comments · 14 min read · LW link · 1 review

“Carefully Bootstrapped Alignment” is organizationally hard

Raemon · 17 Mar 2023 18:00 UTC
266 points
23 comments · 11 min read · LW link · 1 review

Natural Abstractions: Key Claims, Theorems, and Critiques

16 Mar 2023 16:37 UTC
247 points
26 comments · 45 min read · LW link · 3 reviews

The salt in pasta water fallacy

Thomas Sepulchre · 27 Mar 2023 14:53 UTC
244 points
52 comments · 3 min read · LW link · 2 reviews

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes · 19 Mar 2023 0:25 UTC
233 points
54 comments · 8 min read · LW link
(evals.alignment.org)

Actually, Othello-GPT Has A Linear Emergent World Representation

Neel Nanda · 29 Mar 2023 22:13 UTC
213 points
26 comments · 19 min read · LW link
(neelnanda.io)

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw · 12 Mar 2023 23:09 UTC
212 points
14 comments · 1 min read · LW link

Acausal normalcy

Andrew_Critch · 3 Mar 2023 23:34 UTC
204 points
40 comments · 8 min read · LW link · 1 review

GPT-4 Plugs In

Zvi · 27 Mar 2023 12:10 UTC
198 points
47 comments · 6 min read · LW link
(thezvi.wordpress.com)

Why Not Just… Build Weak AI Tools For AI Alignment Research?

johnswentworth · 5 Mar 2023 0:12 UTC
187 points
18 comments · 6 min read · LW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs · 15 Mar 2023 17:55 UTC
180 points
42 comments · 1 min read · LW link

A rough and incomplete review of some of John Wentworth’s research

So8res · 28 Mar 2023 18:52 UTC
176 points
18 comments · 18 min read · LW link

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
173 points
39 comments · 2 min read · LW link
(www.anthropic.com)

Why I’m not into the Free Energy Principle

Steven Byrnes · 2 Mar 2023 19:27 UTC
170 points
55 comments · 9 min read · LW link · 1 review

A stylized dialogue on John Wentworth’s claims about markets and optimization

So8res · 25 Mar 2023 22:32 UTC
169 points
22 comments · 8 min read · LW link

What Discovering Latent Knowledge Did and Did Not Find

Fabien Roger · 13 Mar 2023 19:29 UTC
166 points
17 comments · 11 min read · LW link

Towards understanding-based safety evaluations

evhub · 15 Mar 2023 18:18 UTC
164 points
16 comments · 5 min read · LW link

POC || GTFO culture as partial antidote to alignment wordcelism

lc · 15 Mar 2023 10:21 UTC
162 points
17 comments · 7 min read · LW link · 2 reviews

Inside the mind of a superhuman Go model: How does Leela Zero read ladders?

Haoxing Du · 1 Mar 2023 1:47 UTC
159 points
8 comments · 30 min read · LW link

Why Not Just Outsource Alignment Research To An AI?

johnswentworth · 9 Mar 2023 21:49 UTC
159 points
50 comments · 9 min read · LW link · 1 review

What would a compute monitoring plan look like? [Linkpost]

Orpheus16 · 26 Mar 2023 19:33 UTC
158 points
10 comments · 4 min read · LW link
(arxiv.org)

AI: Practical Advice for the Worried

Zvi · 1 Mar 2023 12:30 UTC
156 points
49 comments · 16 min read · LW link · 2 reviews
(thezvi.wordpress.com)

GPT-4

nz · 14 Mar 2023 17:02 UTC
151 points
150 comments · 1 min read · LW link
(openai.com)

Comments on OpenAI’s “Planning for AGI and beyond”

So8res · 3 Mar 2023 23:01 UTC
149 points
2 comments · 14 min read · LW link

Dan Luu on “You can only communicate one top priority”

Raemon · 18 Mar 2023 18:55 UTC
149 points
18 comments · 3 min read · LW link
(twitter.com)

Remarks 1–18 on GPT (compressed)

Cleo Nardo · 20 Mar 2023 22:27 UTC
147 points
35 comments · 31 min read · LW link

The Translucent Thoughts Hypotheses and Their Implications

Fabien Roger · 9 Mar 2023 16:30 UTC
142 points
7 comments · 19 min read · LW link

Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent

ArthurB · 9 Mar 2023 9:26 UTC
140 points
33 comments · 2 min read · LW link

Against LLM Reductionism

Erich_Grunewald · 8 Mar 2023 15:52 UTC
140 points
17 comments · 18 min read · LW link
(www.erichgrunewald.com)

Conceding a short timelines bet early

Matthew Barnett · 16 Mar 2023 21:49 UTC
134 points
17 comments · 1 min read · LW link

Good News, Everyone!

jbash · 25 Mar 2023 13:48 UTC
133 points
23 comments · 2 min read · LW link

We have to Upgrade

Jed McCaleb · 23 Mar 2023 17:53 UTC
131 points
35 comments · 2 min read · LW link

High Status Eschews Quantification of Performance

niplav · 19 Mar 2023 22:14 UTC
128 points
36 comments · 5 min read · LW link

[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy

7 Mar 2023 11:55 UTC
128 points
13 comments · 5 min read · LW link
(drive.google.com)

FLI open letter: Pause giant AI experiments

Zach Stein-Perlman · 29 Mar 2023 4:04 UTC
126 points
123 comments · 2 min read · LW link
(futureoflife.org)

How bad a future do ML researchers expect?

KatjaGrace · 9 Mar 2023 4:50 UTC
122 points
8 comments · 2 min read · LW link
(aiimpacts.org)

Manifold: If okay AGI, why?

Eliezer Yudkowsky · 25 Mar 2023 22:43 UTC
121 points
37 comments · 1 min read · LW link
(manifold.markets)

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King · 15 Mar 2023 0:29 UTC
116 points
22 comments · 2 min read · LW link

Parasitic Language Games: maintaining ambiguity to hide conflict while burning the commons

Hazard · 12 Mar 2023 5:25 UTC
116 points
18 comments · 13 min read · LW link

GPT can write Quines now (GPT-4)

Andrew_Critch · 14 Mar 2023 19:18 UTC
112 points
30 comments · 1 min read · LW link

“Publish or Perish” (a quick note on why you should try to make your work legible to existing academic communities)

David Scott Krueger (formerly: capybaralet) · 18 Mar 2023 19:01 UTC
112 points
49 comments · 1 min read · LW link · 1 review

Here, have a calmness video

Kaj_Sotala · 16 Mar 2023 10:00 UTC
112 points
15 comments · 2 min read · LW link
(www.youtube.com)

“Liquidity” vs “solvency” in bank runs (and some notes on Silicon Valley Bank)

rossry · 12 Mar 2023 9:16 UTC
108 points
27 comments · 12 min read · LW link