LessWrong’s (first) album: I Have Been A Good Bing

1 Apr 2024 7:33 UTC
529 points
158 comments
11 min read

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai
16 Apr 2024 21:16 UTC
366 points
83 comments
12 min read

[April Fools’ Day] Introducing Open Asteroid Impact

Linch
1 Apr 2024 8:14 UTC
324 points
29 comments
1 min read
(openasteroidimpact.org)

Thoughts on seed oil

dynomight
20 Apr 2024 12:29 UTC
300 points
114 comments
17 min read
(dynomight.net)

Express interest in an “FHI of the West”

habryka
18 Apr 2024 3:32 UTC
260 points
41 comments
3 min read

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget
16 Apr 2024 16:22 UTC
254 points
59 comments
1 min read
(www.commerce.gov)

Funny Anecdote of Eliezer From His Sister

Daniel Birnbaum
22 Apr 2024 22:05 UTC
197 points
5 comments
2 min read

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
184 points
76 comments
10 min read

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth
23 Apr 2024 22:19 UTC
178 points
93 comments
1 min read

OMMC Announces RIP

1 Apr 2024 23:20 UTC
178 points
5 comments
2 min read

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern
17 Apr 2024 13:54 UTC
173 points
21 comments
1 min read
(www.futureofhumanityinstitute.org)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

18 Apr 2024 0:27 UTC
170 points
18 comments
7 min read

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein
15 Apr 2024 7:02 UTC
167 points
41 comments
4 min read

On Not Pulling The Ladder Up Behind You

Screwtape
26 Apr 2024 21:58 UTC
163 points
14 comments
9 min read

Ironing Out the Squiggles

Zack_M_Davis
29 Apr 2024 16:13 UTC
144 points
34 comments
11 min read

Daniel Dennett has died (1942-2024)

kave
19 Apr 2024 16:17 UTC
142 points
5 comments
1 min read
(dailynous.com)

LLMs for Alignment Research: a safety priority?

abramdemski
4 Apr 2024 20:03 UTC
138 points
24 comments
11 min read

My experience using financial commitments to overcome akrasia

William Howard
15 Apr 2024 22:57 UTC
124 points
31 comments
18 min read

RTFB: On the New Proposed CAIP AI Bill

Zvi
10 Apr 2024 18:30 UTC
119 points
14 comments
34 min read
(thezvi.wordpress.com)

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
117 points
15 comments
1 min read
(www.anthropic.com)

A Selection of Randomly Selected SAE Features

1 Apr 2024 9:09 UTC
106 points
2 comments
4 min read

The first future and the best future

KatjaGrace
25 Apr 2024 6:40 UTC
104 points
11 comments
1 min read
(worldspiritsockpuppet.com)

[Question] What convincing warning shot could help prevent extinction from AI?

13 Apr 2024 18:09 UTC
103 points
18 comments
2 min read

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks
18 Apr 2024 16:17 UTC
101 points
7 comments
12 min read

Carl Sagan, nuking the moon, and not nuking the moon

eukaryote
13 Apr 2024 4:08 UTC
96 points
7 comments
6 min read
(eukaryotewritesblog.com)

MIRI’s April 2024 Newsletter

Harlan
12 Apr 2024 23:38 UTC
95 points
0 comments
3 min read
(intelligence.org)

Partial value takeover without world takeover

KatjaGrace
5 Apr 2024 6:20 UTC
89 points
23 comments
3 min read
(worldspiritsockpuppet.com)

Sparsify: A mechanistic interpretability research agenda

Lee Sharkey
3 Apr 2024 12:34 UTC
85 points
22 comments
22 min read

A Dozen Ways to Get More Dakka

Davidmanheim
8 Apr 2024 4:45 UTC
83 points
5 comments
3 min read

Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes

16 Apr 2024 10:10 UTC
79 points
6 comments
8 min read
(blog.aiimpacts.org)

Priors and Prejudice

MathiasKB
22 Apr 2024 15:00 UTC
78 points
16 comments
7 min read

When is a mind me?

Rob Bensinger
17 Apr 2024 5:56 UTC
76 points
62 comments
15 min read

A couple productivity tips for overthinkers

Steven Byrnes
20 Apr 2024 16:05 UTC
75 points
9 comments
4 min read

Coherence of Caches and Agents

johnswentworth
1 Apr 2024 23:04 UTC
74 points
7 comments
11 min read

A Gentle Introduction to Risk Frameworks Beyond Forecasting

pendingsurvival
11 Apr 2024 18:03 UTC
73 points
10 comments
27 min read

Creating unrestricted AI Agents with Command R+

Simon Lermen
16 Apr 2024 14:52 UTC
72 points
12 comments
5 min read

[Full Post] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
71 points
8 comments
8 min read

Announcing Suffering For Good

Garrett Baker
1 Apr 2024 17:08 UTC
70 points
5 comments
1 min read

Rejecting Television

Declan Molony
23 Apr 2024 4:59 UTC
69 points
10 comments
6 min read

Motivation gaps: Why so much EA criticism is hostile and lazy

titotal
22 Apr 2024 11:49 UTC
69 points
5 comments
1 min read
(titotal.substack.com)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry
29 Apr 2024 20:57 UTC
68 points
7 comments
11 min read

[Summary] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
68 points
0 comments
3 min read

Mid-conditional love

KatjaGrace
17 Apr 2024 4:00 UTC
68 points
19 comments
2 min read
(worldspiritsockpuppet.com)

The Inner Ring by C. S. Lewis

Saul Munn
24 Apr 2024 22:48 UTC
68 points
6 comments
13 min read
(www.lewissociety.org)

The 2nd Demographic Transition

Maxwell Tabarrok
6 Apr 2024 14:10 UTC
68 points
17 comments
4 min read
(www.maximum-progress.com)

Generalized Stat Mech: The Boltzmann Approach

12 Apr 2024 17:47 UTC
67 points
7 comments
20 min read

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt

DanielFilan
11 Apr 2024 21:30 UTC
67 points
10 comments
107 min read

Best in Class Life Improvement

sapphire
4 Apr 2024 1:51 UTC
66 points
15 comments
16 min read

How We Picture Bayesian Agents

8 Apr 2024 18:12 UTC
65 points
11 comments
7 min read

Constructability: Plainly-coded AGIs may be feasible in the near future

27 Apr 2024 16:04 UTC
65 points
12 comments
13 min read