The AI’s Toolbox: From Soggy Toast to Optimal Solutions

Thehumanproject.ai · 22 Jun 2025 20:54 UTC
1 point
0 comments · 8 min read · LW link

Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs

Roland Pihlakas · 22 Jun 2025 18:16 UTC
17 points
0 comments · 7 min read · LW link

The Croissant Principle: A Theory of AI Generalization

Jeffrey Liang · 22 Jun 2025 17:58 UTC
20 points
6 comments · 2 min read · LW link

Relational Design Can’t Be Left to Chance

Priyanka Bharadwaj · 22 Jun 2025 15:32 UTC
5 points
0 comments · 3 min read · LW link

Grounding to Avoid Airplane Delays

jefftk · 22 Jun 2025 1:50 UTC
30 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Open questions on compatibilist free will and subjunctive dependence

jackmastermind · 22 Jun 2025 1:15 UTC
3 points
0 comments · 1 min read · LW link
(jacktlab.substack.com)

The Sixteen Kinds of Intimacy

Ruby · 21 Jun 2025 19:59 UTC
57 points
2 comments · 5 min read · LW link

Book review: Against Method

Valdes · 21 Jun 2025 18:59 UTC
9 points
0 comments · 6 min read · LW link

Contrived evaluations are useful evaluations

pradyuprasad · 21 Jun 2025 18:18 UTC
3 points
0 comments · 3 min read · LW link
(speculativedecoding.substack.com)

Consider chilling out in 2028

Valentine · 21 Jun 2025 17:07 UTC
189 points
143 comments · 13 min read · LW link

Upcoming workshop on Post-AGI Civilizational Equilibria

21 Jun 2025 15:57 UTC
25 points
0 comments · 1 min read · LW link

Genomic emancipation

TsviBT · 21 Jun 2025 8:15 UTC
83 points
14 comments · 26 min read · LW link

Evaluating the Risk of Job Displacement by Transformative AI Automation in Developing Countries: A Case Study on Brazil

Abubakar · 21 Jun 2025 0:48 UTC
4 points
0 comments · 15 min read · LW link

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
35 points
8 comments · 6 min read · LW link

Agentic Misalignment: How LLMs Could be Insider Threats

20 Jun 2025 22:34 UTC
83 points
13 comments · 6 min read · LW link

Clarifying “wisdom”: Foundational topics for aligned AIs to prioritize before irreversible decisions

Anthony DiGiovanni · 20 Jun 2025 21:55 UTC
40 points
2 comments · 12 min read · LW link

Are Intelligent Agents More Ethical?

PeterMcCluskey · 20 Jun 2025 21:26 UTC
13 points
7 comments · 2 min read · LW link

An AI Arms Race Scenario

shanzson · 20 Jun 2025 19:25 UTC
2 points
2 comments · 1 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
127 points
41 comments · 15 min read · LW link

Ivan Gayton: A Right and a Duty

Elizabeth · 20 Jun 2025 18:20 UTC
21 points
0 comments · 1 min read · LW link
(acesounderglass.com)

What is the functional role of SAE errors?

20 Jun 2025 18:11 UTC
12 points
6 comments · 38 min read · LW link

Musings on AI Companies of 2025-2026 (Jun 2025)

Vladimir_Nesov · 20 Jun 2025 17:14 UTC
66 points
4 comments · 3 min read · LW link

Escaping the Jungles of Norwood: A Rationalist’s Guide to Male Pattern Baldness

AlphaAndOmega · 20 Jun 2025 16:40 UTC
12 points
10 comments · 1 min read · LW link
(open.substack.com)

Prefix cache untrusted monitors: a method to apply after you catch your AI

ryan_greenblatt · 20 Jun 2025 15:56 UTC
33 points
2 comments · 7 min read · LW link

Did the Army Poison a Bunch of Women in Minnesota?

rba · 20 Jun 2025 15:33 UTC
54 points
2 comments · 4 min read · LW link

AI #121 Part 2: The OpenAI Files

Zvi · 20 Jun 2025 14:50 UTC
37 points
9 comments · 41 min read · LW link
(thezvi.wordpress.com)

Smarter Models Lie Less

Expertium · 20 Jun 2025 13:31 UTC
6 points
0 comments · 2 min read · LW link

AI Safety Communicators Meet-up

Vishakha · 20 Jun 2025 12:34 UTC
3 points
0 comments · 1 min read · LW link

X explains Z% of the variance in Y

Leon Lang · 20 Jun 2025 12:17 UTC
160 points
36 comments · 9 min read · LW link

Yes RAND, AI Could Really Cause Human Extinction [crosspost]

otto.barten · 20 Jun 2025 11:42 UTC
17 points
4 comments · 4 min read · LW link
(www.existentialriskobservatory.org)

Misalignment or misuse? The AGI alignment tradeoff

Max_He-Ho · 20 Jun 2025 10:43 UTC
3 points
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Paphos

Yudhister Kumar · 20 Jun 2025 9:25 UTC
4 points
0 comments · 1 min read · LW link
(yudhister.me)

Rome

Yudhister Kumar · 20 Jun 2025 9:23 UTC
3 points
0 comments · 2 min read · LW link
(yudhister.me)

Geneva

Yudhister Kumar · 20 Jun 2025 9:22 UTC
4 points
0 comments · 1 min read · LW link
(yudhister.me)

Toledo

Yudhister Kumar · 20 Jun 2025 9:18 UTC
3 points
0 comments · 2 min read · LW link
(www.yudhister.me)

Graphing AI economic growth rates, or time to Dyson Swarm

denkenberger · 20 Jun 2025 7:00 UTC
4 points
2 comments · 1 min read · LW link

the silk pajamas effect

thiccythot · 20 Jun 2025 3:31 UTC
41 points
11 comments · 4 min read · LW link

Change And Identity: a Story and Discussion on the Evolving Self

Rob Lucas · 20 Jun 2025 1:44 UTC
0 points
0 comments · 19 min read · LW link
(open.substack.com)

Moving Past the Question of Consciousness: A Thought Experiment

Satya Benson · 19 Jun 2025 19:52 UTC
12 points
8 comments · 2 min read · LW link
(satchlj.com)

S-Expressions as a Design Language: A Tool for Deconfusion in Alignment

Johannes C. Mayer · 19 Jun 2025 19:03 UTC
5 points
0 comments · 6 min read · LW link

AISEC: Why to not to be shy.

xen9 · 19 Jun 2025 18:16 UTC
4 points
1 comment · 1 min read · LW link

LLMs as amplifiers, not assistants

Caleb Biddulph · 19 Jun 2025 17:21 UTC
27 points
8 comments · 7 min read · LW link

How The Singer Sang His Tales

adamShimi · 19 Jun 2025 17:06 UTC
18 points
0 comments · 36 min read · LW link
(formethods.substack.com)

Key paths, plans and strategies to AI safety success

Adam Jones · 19 Jun 2025 16:56 UTC
13 points
0 comments · 6 min read · LW link
(bluedot.org)

AI safety techniques leveraging distillation

ryan_greenblatt · 19 Jun 2025 14:31 UTC
61 points
0 comments · 12 min read · LW link

Political Funding Expertise (Post 6 of 7 on AI Governance)

Mass_Driver · 19 Jun 2025 14:14 UTC
59 points
4 comments · 14 min read · LW link

Documents Are Dead. Long Live the Conversational Proxy.

8harath · 19 Jun 2025 14:01 UTC
−9 points
1 comment · 1 min read · LW link

[Question] How did you find out about AI Safety? Why and how did you get involved?

Ana Lopez · 19 Jun 2025 14:00 UTC
1 point
0 comments · 1 min read · LW link

A deep critique of AI 2027’s bad timeline models

titotal · 19 Jun 2025 13:29 UTC
372 points
40 comments · 39 min read · LW link
(titotal.substack.com)

AI #121 Part 1: New Connections

Zvi · 19 Jun 2025 13:00 UTC
32 points
12 comments · 39 min read · LW link
(thezvi.wordpress.com)