LessWrong’s (first) album: I Have Been A Good Bing

1 Apr 2024 7:33 UTC
556 points
173 comments
11 min read

I would have shit in that alley, too

Declan Molony
18 Jun 2024 4:41 UTC
398 points
128 comments
4 min read

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai
16 Apr 2024 21:16 UTC
397 points
100 comments
12 min read

There is way too much serendipity

Malmesbury
19 Jan 2024 19:37 UTC
357 points
56 comments
7 min read

Failures in Kindness

silentbob
26 Mar 2024 21:30 UTC
354 points
48 comments
9 min read

My hour of memoryless lucidity

Eric Neyman
4 May 2024 1:40 UTC
349 points
34 comments
5 min read

The Best Tacit Knowledge Videos on Every Subject

Parker Conley
31 Mar 2024 17:14 UTC
347 points
138 comments
16 min read

Thoughts on seed oil

dynomight
20 Apr 2024 12:29 UTC
341 points
122 comments
17 min read

Reliable Sources: The Story of David Gerard

TracingWoodgrains
10 Jul 2024 19:50 UTC
333 points
52 comments
43 min read

Notifications Received in 30 Minutes of Class

tanagrabeast
26 May 2024 17:02 UTC
332 points
11 comments
8 min read

[April Fools’ Day] Introducing Open Asteroid Impact

Linch
1 Apr 2024 8:14 UTC
332 points
29 comments
1 min read

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch
14 Jun 2024 0:16 UTC
324 points
34 comments
4 min read

MIRI 2024 Communications Strategy

Gretta Duleba
29 May 2024 19:33 UTC
318 points
198 comments
7 min read

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

12 Jan 2024 19:51 UTC
299 points
95 comments
3 min read

Non-Disparagement Canaries for OpenAI

30 May 2024 19:20 UTC
287 points
51 comments
2 min read

Gentleness and the artificial Other

Joe Carlsmith
2 Jan 2024 18:21 UTC
282 points
33 comments
11 min read

Scale Was All We Needed, At First

Gabe M
14 Feb 2024 1:49 UTC
277 points
32 comments
8 min read

My AI Model Delta Compared To Yudkowsky

johnswentworth
10 Jun 2024 16:12 UTC
272 points
100 comments
4 min read

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Raemon
3 Jul 2024 20:34 UTC
268 points
63 comments
1 min read

Express interest in an “FHI of the West”

habryka
18 Apr 2024 3:32 UTC
264 points
41 comments
3 min read

On green

Joe Carlsmith
21 Mar 2024 17:38 UTC
261 points
35 comments
31 min read

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt
17 Jun 2024 18:44 UTC
257 points
49 comments
13 min read

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget
16 Apr 2024 16:22 UTC
256 points
59 comments
1 min read

“No-one in my org puts money in their pension”

Tobes
16 Feb 2024 18:33 UTC
253 points
16 comments
9 min read

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman
16 Mar 2024 22:56 UTC
252 points
14 comments
7 min read

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
250 points
66 comments
28 min read

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman
15 May 2024 0:45 UTC
246 points
95 comments
2 min read

AI companies aren’t really using external evaluators

Zach Stein-Perlman
24 May 2024 16:01 UTC
240 points
15 comments
4 min read

My Clients, The Liars

ymeskhout
5 Mar 2024 21:06 UTC
238 points
85 comments
7 min read

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
222 points
87 comments
10 min read

Introducing AI Lab Watch

Zach Stein-Perlman
30 Apr 2024 17:00 UTC
222 points
30 comments
1 min read

MIRI 2024 Mission and Strategy Update

Malo
5 Jan 2024 0:20 UTC
221 points
44 comments
8 min read

Modern Transformers are AGI, and Human-Level

abramdemski
26 Mar 2024 17:46 UTC
219 points
89 comments
5 min read

SAE feature geometry is outside the superposition hypothesis

jake_mendel
24 Jun 2024 16:07 UTC
216 points
17 comments
11 min read

Believing In

AnnaSalamon
8 Feb 2024 7:06 UTC
215 points
50 comments
13 min read

CFAR Takeaways: Andrew Critch

Raemon
14 Feb 2024 1:37 UTC
215 points
62 comments
5 min read

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko
3 Feb 2024 20:36 UTC
214 points
156 comments
9 min read

ChatGPT can learn indirect control

Raymond D
21 Mar 2024 21:11 UTC
213 points
24 comments
1 min read

Truthseeking is the ground in which other principles grow

Elizabeth
27 May 2024 1:09 UTC
207 points
14 comments
16 min read

“How could I have thought that faster?”

mesaoptimizer
11 Mar 2024 10:56 UTC
205 points
32 comments
2 min read

OpenAI: Fallout

Zvi
28 May 2024 13:20 UTC
204 points
25 comments
36 min read

Raising children on the eve of AI

juliawise
15 Feb 2024 21:28 UTC
204 points
38 comments
5 min read

Towards more cooperative AI safety strategies

Richard_Ngo
16 Jul 2024 4:36 UTC
202 points
126 comments
4 min read

LLM Generality is a Timeline Crux

eggsyntax
24 Jun 2024 12:52 UTC
201 points
92 comments
7 min read

Superbabies: Putting The Pieces Together

sarahconstantin
11 Jul 2024 20:40 UTC
200 points
35 comments
10 min read

Jaan Tallinn’s 2023 Philanthropy Overview

jaan
20 May 2024 12:11 UTC
199 points
5 comments
1 min read

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman
27 May 2024 13:00 UTC
199 points
21 comments
2 min read

Mechanistically Eliciting Latent Behaviors in Language Models

30 Apr 2024 18:51 UTC
199 points
37 comments
45 min read

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison
10 Feb 2024 19:52 UTC
198 points
52 comments
1 min read

Funny Anecdote of Eliezer From His Sister

Noah Birnbaum
22 Apr 2024 22:05 UTC
195 points
6 comments
2 min read