Is Escalation Inevitable?

Lennart Wijers, 31 May 2025 22:10 UTC
5 points
0 comments, 3 min read, LW link

Policy Entropy, Learning, and Alignment (Or Maybe Your LLM Needs Therapy)

sdeture, 31 May 2025 22:09 UTC
15 points
6 comments, 8 min read, LW link

The Unseen Hand: AI’s Problem Preemption and the True Future of Labor

Ben Kassan, 31 May 2025 22:04 UTC
8 points
0 comments, 20 min read, LW link

The 80/20 playbook for mitigating AI scheming in 2025

Charbel-Raphaël, 31 May 2025 21:17 UTC
40 points
2 comments, 4 min read, LW link

Collective Action for AI Safety (June 4, NYC)

Jordan Braunstein, 31 May 2025 20:27 UTC
1 point
0 comments, 1 min read, LW link

The best approaches for mitigating “the intelligence curse” (or gradual disempowerment); my quick guesses at the best object-level interventions

ryan_greenblatt, 31 May 2025 18:20 UTC
76 points
19 comments, 5 min read, LW link

Would It Be Better to Dispense with Good and Evil?

arusarda, 31 May 2025 16:40 UTC
−2 points
10 comments, 6 min read, LW link

How Epistemic Collapse Looks from Inside

Martin Sustrik, 31 May 2025 16:30 UTC
9 points
11 comments, 1 min read, LW link
(250bpm.substack.com)

When will AI automate all mental work, and how fast?

31 May 2025 16:18 UTC
10 points
0 comments, 7 min read, LW link
(youtu.be)

Progress links and short notes, 2025-05-31: RPI fellowship deadline tomorrow, Edge Esmeralda next week, and more

jasoncrawford, 31 May 2025 15:20 UTC
11 points
0 comments, 7 min read, LW link
(newsletter.rootsofprogress.org)

House Party Dances

jefftk, 31 May 2025 15:20 UTC
13 points
1 comment, 1 min read, LW link
(www.jefftk.com)

Free Will, Like Probability, is About Local Knowledge

Rob Lucas, 31 May 2025 14:19 UTC
4 points
6 comments, 16 min read, LW link
(open.substack.com)

The (Unofficial) Rationality: A-Z Anki Deck

japancolorado, 31 May 2025 7:01 UTC
30 points
8 comments, 1 min read, LW link

Zochi Publishes A* Paper

mannatvjain, 31 May 2025 0:00 UTC
11 points
0 comments, 4 min read, LW link
(www.intology.ai)

Memory Decoding Journal Club: Structure and function of the hippocampal CA3 module

Devin Ward, 30 May 2025 23:59 UTC
1 point
0 comments, 1 min read, LW link

Diabetes is Caused by Oxidative Stress

Lorec, 30 May 2025 21:03 UTC
11 points
11 comments, 8 min read, LW link

Too Many Metaphors: A Case for Plain Talk in AI Safety

David Harket, 30 May 2025 19:29 UTC
0 points
8 comments, 2 min read, LW link

[Question] Could we go another route with computers?

Roman Malov, 30 May 2025 19:04 UTC
12 points
5 comments, 1 min read, LW link

Aristotelian Optimization: The Economics of Cameralism

Edward Könings, 30 May 2025 19:02 UTC
−2 points
1 comment, 13 min read, LW link

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment

30 May 2025 18:57 UTC
34 points
0 comments, 2 min read, LW link

‘GiveWell for AI Safety’: Lessons learned in a week

Lydia Nottingham, 30 May 2025 18:38 UTC
41 points
0 comments, 6 min read, LW link

Idea Generation and Sifting

belos, 30 May 2025 16:59 UTC
1 point
0 comments, 20 min read, LW link
(bestofagreatlot.substack.com)

50 Ideas for Life I Repeatedly Share

DMMF, 30 May 2025 16:57 UTC
26 points
9 comments, 15 min read, LW link
(notnottalmud.substack.com)

Virtues related to honesty

Orioth, 30 May 2025 14:11 UTC
11 points
23 comments, 2 min read, LW link

AI 2027 - Rogue Replication Timeline

Alvin Ånestrand, 30 May 2025 13:46 UTC
41 points
3 comments, 7 min read, LW link
(forecastingaifutures.substack.com)

Letting Kids Be Kids

Zvi, 30 May 2025 10:50 UTC
86 points
15 comments, 20 min read, LW link
(thezvi.wordpress.com)

The Geometry of LLM Logits (an analytical outer bound)

Rohan Ganapavarapu, 30 May 2025 1:21 UTC
5 points
0 comments, 2 min read, LW link
(rohan.ga)

Memory Decoding Journal Club: Structure and function of the hippocampal CA3 module

Devin Ward, 30 May 2025 1:08 UTC
1 point
0 comments, 1 min read, LW link

Experimental CFAR Mini-Workshop @ Arbor Summer Camp

Davis_Kingsley, 30 May 2025 0:23 UTC
12 points
0 comments, 2 min read, LW link

CFAR is running an experimental mini-workshop (June 2-6, Berkeley CA)!

Davis_Kingsley, 29 May 2025 22:02 UTC
64 points
2 comments, 2 min read, LW link

Orphaned Policies (Post 5 of 7 on AI Governance)

Mass_Driver, 29 May 2025 21:42 UTC
70 points
5 comments, 16 min read, LW link

Gradual Disempowerment: Concrete Research Projects

Raymond Douglas, 29 May 2025 18:55 UTC
100 points
10 comments, 10 min read, LW link

Do you even have a system prompt? (PSA / repo)

Croissanthology, 29 May 2025 18:49 UTC
108 points
77 comments, 2 min read, LW link

Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims

shash42, 29 May 2025 18:40 UTC
66 points
7 comments, 1 min read, LW link
(safe-lip-9a8.notion.site)

Dimensionalization

Jordan Rubin, 29 May 2025 18:18 UTC
7 points
6 comments, 4 min read, LW link
(jordanmrubin.substack.com)

Distilled Human Judgment: Reifying AI Alignment

Devansh Mehta, 29 May 2025 18:06 UTC
2 points
0 comments, 4 min read, LW link

Summer AI Safety Intro Fellowships in Boston and Online (Policy & Technical) – Apply by June 6!

jandrade112, 29 May 2025 18:02 UTC
1 point
0 comments, 1 min read, LW link

Digital sentience funding opportunities: Support for applied work and research

29 May 2025 15:22 UTC
21 points
0 comments, 4 min read, LW link

When to Be Nice vs Kind

Declan Molony, 29 May 2025 15:06 UTC
24 points
2 comments, 1 min read, LW link

AI #118: Claude Ascendant

Zvi, 29 May 2025 14:10 UTC
45 points
8 comments, 57 min read, LW link
(thezvi.wordpress.com)

Social Capital—Does it Matter?

Momcilo, 29 May 2025 12:26 UTC
−9 points
1 comment, 6 min read, LW link

Alignment Crisis: Genocide Denial

_mp_, 29 May 2025 12:04 UTC
−11 points
5 comments, 4 min read, LW link

Cross-posting to Substack

jefftk, 29 May 2025 11:10 UTC
12 points
0 comments, 1 min read, LW link
(www.jefftk.com)

Reflections on AI Wisdom, plus announcing Wise AI Wednesdays

Chris_Leong, 29 May 2025 7:13 UTC
18 points
0 comments, 3 min read, LW link

[Question] What was so great about Move 37?

Caleb Biddulph, 29 May 2025 7:00 UTC
24 points
4 comments, 3 min read, LW link

Procedural vs. Causal Understanding

Caleb Biddulph, 29 May 2025 7:00 UTC
7 points
2 comments, 2 min read, LW link

Security Mindset: Hacking Pinball High Scores

gwern, 29 May 2025 3:39 UTC
27 points
3 comments, 1 min read, LW link
(gwern.net)

Quick Minimal Playhouse

jefftk, 29 May 2025 2:10 UTC
17 points
1 comment, 1 min read, LW link
(www.jefftk.com)

Cognitive Exhaustion and Engineered Trust: Lessons from My Gym

Priyanka Bharadwaj, 29 May 2025 1:21 UTC
14 points
3 comments, 3 min read, LW link

Truth or Dare

Duncan Sabien (Inactive), 29 May 2025 0:07 UTC
263 points
61 comments, 69 min read, LW link