LessWrong’s (first) album: I Have Been A Good Bing

1 Apr 2024 7:33 UTC
544 points
170 comments
11 min read

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai
16 Apr 2024 21:16 UTC
397 points
100 comments
12 min read

There is way too much serendipity

Malmesbury
19 Jan 2024 19:37 UTC
351 points
56 comments
7 min read

The Best Tacit Knowledge Videos on Every Subject

Parker Conley
31 Mar 2024 17:14 UTC
339 points
134 comments
16 min read

My hour of memoryless lucidity

Eric Neyman
4 May 2024 1:40 UTC
338 points
32 comments
5 min read
(ericneyman.wordpress.com)

[April Fools’ Day] Introducing Open Asteroid Impact

Linch
1 Apr 2024 8:14 UTC
332 points
29 comments
1 min read
(openasteroidimpact.org)

Thoughts on seed oil

dynomight
20 Apr 2024 12:29 UTC
330 points
118 comments
17 min read
(dynomight.net)

Notifications Received in 30 Minutes of Class

tanagrabeast
26 May 2024 17:02 UTC
328 points
9 comments
8 min read

I would have shit in that alley, too

Declan Molony
18 Jun 2024 4:41 UTC
322 points
65 comments
4 min read

MIRI 2024 Communications Strategy

Gretta Duleba
29 May 2024 19:33 UTC
314 points
193 comments
7 min read

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

12 Jan 2024 19:51 UTC
298 points
95 comments
3 min read
(arxiv.org)

Non-Disparagement Canaries for OpenAI

30 May 2024 19:20 UTC
286 points
51 comments
2 min read

Gentleness and the artificial Other

Joe Carlsmith
2 Jan 2024 18:21 UTC
278 points
33 comments
11 min read

Scale Was All We Needed, At First

Gabe M
14 Feb 2024 1:49 UTC
273 points
32 comments
8 min read
(aiacumen.substack.com)

Express interest in an “FHI of the West”

habryka
18 Apr 2024 3:32 UTC
264 points
41 comments
3 min read

On green

Joe Carlsmith
21 Mar 2024 17:38 UTC
262 points
35 comments
31 min read

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget
16 Apr 2024 16:22 UTC
256 points
59 comments
1 min read
(www.commerce.gov)

My AI Model Delta Compared To Yudkowsky

johnswentworth
10 Jun 2024 16:12 UTC
255 points
93 comments
4 min read

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman
16 Mar 2024 22:56 UTC
252 points
14 comments
7 min read
(arxiv.org)

“No-one in my org puts money in their pension”

Tobes
16 Feb 2024 18:33 UTC
249 points
16 comments
9 min read
(seekingtobejolly.substack.com)

Failures in Kindness

silentbob
26 Mar 2024 21:30 UTC
248 points
27 comments
9 min read

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
247 points
66 comments
28 min read

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman
15 May 2024 0:45 UTC
246 points
95 comments
3 min read

AI companies aren’t really using external evaluators

Zach Stein-Perlman
24 May 2024 16:01 UTC
240 points
15 comments
4 min read

My Clients, The Liars

ymeskhout
5 Mar 2024 21:06 UTC
236 points
85 comments
7 min read

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt
17 Jun 2024 18:44 UTC
235 points
34 comments
13 min read

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko
3 Feb 2024 20:36 UTC
222 points
156 comments
9 min read

Introducing AI Lab Watch

Zach Stein-Perlman
30 Apr 2024 17:00 UTC
220 points
30 comments
1 min read
(ailabwatch.org)

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch
14 Jun 2024 0:16 UTC
219 points
22 comments
4 min read

MIRI 2024 Mission and Strategy Update

Malo
5 Jan 2024 0:20 UTC
218 points
44 comments
8 min read

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
217 points
83 comments
10 min read

CFAR Takeaways: Andrew Critch

Raemon
14 Feb 2024 1:37 UTC
215 points
62 comments
5 min read

Believing In

AnnaSalamon
8 Feb 2024 7:06 UTC
214 points
49 comments
13 min read

Modern Transformers are AGI, and Human-Level

abramdemski
26 Mar 2024 17:46 UTC
213 points
89 comments
5 min read

ChatGPT can learn indirect control

Raymond D
21 Mar 2024 21:11 UTC
212 points
24 comments
1 min read

OpenAI: Fallout

Zvi
28 May 2024 13:20 UTC
204 points
25 comments
36 min read
(thezvi.wordpress.com)

Truthseeking is the ground in which other principles grow

Elizabeth
27 May 2024 1:09 UTC
202 points
11 comments
16 min read

“How could I have thought that faster?”

mesaoptimizer
11 Mar 2024 10:56 UTC
201 points
32 comments
2 min read
(twitter.com)

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman
27 May 2024 13:00 UTC
199 points
21 comments
2 min read

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison
10 Feb 2024 19:52 UTC
198 points
52 comments
1 min read
(garrisonlovely.substack.com)

Mechanistically Eliciting Latent Behaviors in Language Models

30 Apr 2024 18:51 UTC
197 points
37 comments
45 min read

Funny Anecdote of Eliezer From His Sister

Daniel Birnbaum
22 Apr 2024 22:05 UTC
195 points
6 comments
2 min read

Jaan Tallinn’s 2023 Philanthropy Overview

jaan
20 May 2024 12:11 UTC
194 points
5 comments
1 min read
(jaan.info)

What’s Going on With OpenAI’s Messaging?

ozziegooen
21 May 2024 2:22 UTC
191 points
13 comments
1 min read

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
190 points
17 comments
73 min read

My Interview With Cade Metz on His Reporting About Slate Star Codex

Zack_M_Davis
26 Mar 2024 17:18 UTC
188 points
187 comments
6 min read

The impossible problem of due process

mingyuan
16 Jan 2024 5:18 UTC
188 points
63 comments
14 min read

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth
23 Apr 2024 22:19 UTC
185 points
99 comments
1 min read

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

Ricki Heicklen
22 Feb 2024 23:56 UTC
184 points
5 comments
4 min read
(bayesshammai.substack.com)

Daniel Kahneman has died

DanielFilan
27 Mar 2024 15:59 UTC
183 points
11 comments
1 min read
(www.washingtonpost.com)