All Rationalists hate & sabotage Strategy without having any awareness of it.

Oxidize · 26 May 2025 22:09 UTC
−27 points
8 comments · 7 min read · LW link

Personal Ruminations on AI’s Missing Variable Problem

Thehumanproject.ai · 26 May 2025 21:11 UTC
1 point
0 comments · 3 min read · LW link

Poetic Methods II: Rhyme as a Focusing Device

adamShimi · 26 May 2025 18:29 UTC
24 points
1 comment · 17 min read · LW link
(formethods.substack.com)

Is Building Good Note-Taking Software an AGI-Complete Problem?

Thane Ruthenis · 26 May 2025 18:26 UTC
26 points
13 comments · 7 min read · LW link

Principal-Agent Problems and the Structure of Governance

belos · 26 May 2025 18:23 UTC
1 point
0 comments · 8 min read · LW link
(bestofagreatlot.substack.com)

[Question] Does the Universal Geometry of Embeddings paper have big implications for interpretability?

Evan R. Murphy · 26 May 2025 18:20 UTC
43 points
6 comments · 1 min read · LW link

Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice

Neel Nanda · 26 May 2025 17:38 UTC
61 points
14 comments · 21 min read · LW link
(www.neelnanda.io)

[Beneath Psychology] Case study on chronic pain: First insights, and the remaining challenge

jimmy · 26 May 2025 17:29 UTC
12 points
0 comments · 11 min read · LW link

An observation on self-play

jonrxu · 26 May 2025 17:22 UTC
15 points
1 comment · 3 min read · LW link

New website analyzing AI companies’ model evals

Zach Stein-Perlman · 26 May 2025 16:00 UTC
58 points
0 comments · 4 min read · LW link

New scorecard evaluating AI companies on safety

Zach Stein-Perlman · 26 May 2025 16:00 UTC
72 points
8 comments · 1 min read · LW link

[Question] Asking for AI Safety Career Advice

infinibot27 · 26 May 2025 15:26 UTC
3 points
1 comment · 1 min read · LW link

Nerve Blisters: A Stoic Response

Jonathan Moregård · 26 May 2025 15:07 UTC
8 points
2 comments · 1 min read · LW link
(honestliving.substack.com)

On ‘On Caring’

atharva · 26 May 2025 13:39 UTC
9 points
4 comments · 3 min read · LW link

Claude 4 You: The Quest for Mundane Utility

Zvi · 26 May 2025 13:01 UTC
36 points
0 comments · 17 min read · LW link
(thezvi.wordpress.com)

Formalizing Embeddedness Failures in Universal Artificial Intelligence

Cole Wyeth · 26 May 2025 12:36 UTC
39 points
0 comments · 1 min read · LW link
(arxiv.org)

Techies Wanted: How STEM Backgrounds Can Advance Safe AI Policy

Daniel_Eth · 26 May 2025 11:29 UTC
16 points
0 comments · 29 min read · LW link

D&D.Sci: The Choosing Ones [Answerkey and Ruleset]

abstractapplic · 26 May 2025 9:43 UTC
19 points
2 comments · 3 min read · LW link

The Sundog Alignment Theorem: A Proposal for Embodied Alignment via Indirect Inference

Malice · 26 May 2025 7:26 UTC
−9 points
0 comments · 3 min read · LW link

Superposition Without Compression: Why Entangled Representations Are the Default

James Butterworth · 26 May 2025 5:26 UTC
3 points
2 comments · 1 min read · LW link
(drive.google.com)

Seeking Feedback: Toy Model of Deceptive Alignment (Game Theory)

Alex Boche · 26 May 2025 5:23 UTC
5 points
6 comments · 5 min read · LW link

Long-form data bottlenecks might stall AI progress for years

Michelle_Ma · 26 May 2025 4:36 UTC
21 points
0 comments · 13 min read · LW link

Example of Splitting a PR

jefftk · 26 May 2025 2:20 UTC
28 points
0 comments · 2 min read · LW link
(www.jefftk.com)

How I’m telling my friends about AI Safety

k64 · 25 May 2025 22:43 UTC
1 point
7 comments · 7 min read · LW link

Good Writing

Adam Zerner · 25 May 2025 21:52 UTC
11 points
0 comments · 2 min read · LW link
(paulgraham.com)

Consider buying voting shares

Hruss · 25 May 2025 18:01 UTC
2 points
3 comments · 1 min read · LW link

[Question] Can you donate to AI advocacy?

k64 · 25 May 2025 17:54 UTC
17 points
4 comments · 1 min read · LW link

Rant: the extreme wastefulness of high rent prices

Knight Lee · 25 May 2025 17:04 UTC
−2 points
0 comments · 2 min read · LW link

Beyond Democracy: A System Where Citizens Vote with Their Taxes

Brendan Golledge · 25 May 2025 17:00 UTC
−1 points
3 comments · 7 min read · LW link

Claude 4 You: Safety and Alignment

Zvi · 25 May 2025 14:00 UTC
86 points
8 comments · 63 min read · LW link
(thezvi.wordpress.com)

Alignment Proposal: Adversarially Robust Augmentation and Distillation

25 May 2025 12:58 UTC
56 points
47 comments · 13 min read · LW link

An open job application to AI labs

Hruss · 25 May 2025 12:57 UTC
17 points
0 comments · 1 min read · LW link

Meditations on Doge

Martin Sustrik · 25 May 2025 12:00 UTC
131 points
44 comments · 9 min read · LW link
(250bpm.substack.com)

Case Studies in Simulators and Agents

25 May 2025 5:40 UTC
12 points
8 comments · 6 min read · LW link

On safety of being a moral patient of ASI

Yaroslav Granowski · 24 May 2025 21:24 UTC
3 points
8 comments · 1 min read · LW link

We Need a Baseline for LLM-Aided Experiments

J Bostock · 24 May 2025 20:52 UTC
11 points
1 comment · 1 min read · LW link

Lie Detectors. Technical solutions to the cooperation problem.

Window Frame · 24 May 2025 20:05 UTC
6 points
0 comments · 10 min read · LW link

It’s hard to make scheming evals look realistic for LLMs

24 May 2025 19:17 UTC
150 points
29 comments · 5 min read · LW link

Launch of the New Horizons Podcast

Nezir Alic · 24 May 2025 17:50 UTC
5 points
0 comments · 1 min read · LW link

Priming effects are fake, but framing effects are real

Matrice Jacobine · 24 May 2025 10:54 UTC
33 points
0 comments · 1 min read · LW link
(xphi.net)

The Cosmic Lottery

James Stephen Brown · 24 May 2025 4:05 UTC
5 points
0 comments · 5 min read · LW link
(nonzerosum.games)

Some Considerations on Prediction Markets

belos · 24 May 2025 3:24 UTC
2 points
1 comment · 9 min read · LW link

The Paradox of Low Fertility

Zero Contradictions · 24 May 2025 0:59 UTC
−9 points
6 comments · 1 min read · LW link
(expandingrationality.substack.com)

That’s Not How Epigenetic Modifications Work

johnswentworth · 24 May 2025 0:15 UTC
68 points
12 comments · 2 min read · LW link

[Question] To what extent is AI safety work trying to get AI to reliably and safely do what the user asks vs. do what is best in some ultimate sense?

Jordan Arel · 23 May 2025 21:05 UTC
14 points
3 comments · 1 min read · LW link

Notes on Claude 4 System Card

Dentosal · 23 May 2025 15:23 UTC
19 points
2 comments · 6 min read · LW link

What is emptiness?

Vadim Golub · 23 May 2025 12:06 UTC
−4 points
11 comments · 9 min read · LW link

Idiohobbies

dkl9 · 23 May 2025 6:38 UTC
11 points
2 comments · 1 min read · LW link
(dkl9.net)

Qualitative Fit Testing

jefftk · 23 May 2025 2:50 UTC
10 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Anthropic is Quietly Backpedalling on its Safety Commitments

garrison · 23 May 2025 2:26 UTC
81 points
7 comments · 5 min read · LW link
(www.obsolete.pub)