All Rationalists hate & sabotage Strategy without having any awareness of it.

Oxidize · May 26, 2025, 10:09 PM
−27 points
8 comments · 7 min read

Personal Ruminations on AI’s Missing Variable Problem

Thehumanproject.ai · May 26, 2025, 9:11 PM
1 point
0 comments · 3 min read

Poetic Methods II: Rhyme as a Focusing Device

adamShimi · May 26, 2025, 6:29 PM
24 points
1 comment · 17 min read
(formethods.substack.com)

Is Building Good Note-Taking Software an AGI-Complete Problem?

Thane Ruthenis · May 26, 2025, 6:26 PM
25 points
13 comments · 7 min read

Principal-Agent Problems and the Structure of Governance

belos · May 26, 2025, 6:23 PM
1 point
0 comments · 8 min read
(bestofagreatlot.substack.com)

[Question] Does the Universal Geometry of Embeddings paper have big implications for interpretability?

Evan R. Murphy · May 26, 2025, 6:20 PM
42 points
3 comments · 1 min read

Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice

Neel Nanda · May 26, 2025, 5:38 PM
56 points
13 comments · 21 min read
(www.neelnanda.io)

[Beneath Psychology] Case study on chronic pain: First insights, and the remaining challenge

jimmy · May 26, 2025, 5:29 PM
8 points
0 comments · 11 min read

An observation on self-play

jonrxu · May 26, 2025, 5:22 PM
13 points
0 comments · 3 min read

New website analyzing AI companies’ model evals

Zach Stein-Perlman · May 26, 2025, 4:00 PM
58 points
0 comments · 4 min read

New scorecard evaluating AI companies on safety

Zach Stein-Perlman · May 26, 2025, 4:00 PM
72 points
8 comments · 1 min read

[Question] Asking for AI Safety Career Advice

infinibot27 · May 26, 2025, 3:26 PM
3 points
1 comment · 1 min read

Nerve Blisters: A Stoic Response

Jonathan Moregård · May 26, 2025, 3:07 PM
8 points
2 comments · 1 min read
(honestliving.substack.com)

On ‘On Caring’

atharva · May 26, 2025, 1:39 PM
8 points
4 comments · 3 min read

Claude 4 You: The Quest for Mundane Utility

Zvi · May 26, 2025, 1:01 PM
36 points
0 comments · 17 min read
(thezvi.wordpress.com)

Formalizing Embeddedness Failures in Universal Artificial Intelligence

Cole Wyeth · May 26, 2025, 12:36 PM
39 points
0 comments · 1 min read
(arxiv.org)

Techies Wanted: How STEM Backgrounds Can Advance Safe AI Policy

Daniel_Eth · May 26, 2025, 11:29 AM
16 points
0 comments · 29 min read

D&D.Sci: The Choosing Ones [Answerkey and Ruleset]

abstractapplic · May 26, 2025, 9:43 AM
19 points
2 comments · 3 min read

The Sundog Alignment Theorem: A Proposal for Embodied Alignment via Indirect Inference

Malice · May 26, 2025, 7:26 AM
−9 points
0 comments · 3 min read

Superposition Without Compression: Why Entangled Representations Are the Default

James Butterworth · May 26, 2025, 5:26 AM
3 points
2 comments · 1 min read
(drive.google.com)

Seeking Feedback: Toy Model of Deceptive Alignment (Game Theory)

Alex Boche · May 26, 2025, 5:23 AM
5 points
4 comments · 5 min read

Long-form data bottlenecks might stall AI progress for years

Michelle_Ma · May 26, 2025, 4:36 AM
18 points
0 comments · 13 min read

Example of Splitting a PR

jefftk · May 26, 2025, 2:20 AM
28 points
0 comments · 2 min read
(www.jefftk.com)

How I’m telling my friends about AI Safety

k64 · May 25, 2025, 10:43 PM
1 point
7 comments · 7 min read

Good Writing

Adam Zerner · May 25, 2025, 9:52 PM
11 points
0 comments · 2 min read
(paulgraham.com)

Consider buying voting shares

Hruss · May 25, 2025, 6:01 PM
2 points
3 comments · 1 min read

[Question] Can you donate to AI advocacy?

k64 · May 25, 2025, 5:54 PM
17 points
4 comments · 1 min read

Rant: the extreme wastefulness of high rent prices

Knight Lee · May 25, 2025, 5:04 PM
−6 points
0 comments · 2 min read

Beyond Democracy: A System Where Citizens Vote with Their Taxes

Brendan Golledge · May 25, 2025, 5:00 PM
−1 points
3 comments · 7 min read

Claude 4 You: Safety and Alignment

Zvi · May 25, 2025, 2:00 PM
86 points
8 comments · 63 min read
(thezvi.wordpress.com)

Alignment Proposal: Adversarially Robust Augmentation and Distillation

May 25, 2025, 12:58 PM
54 points
47 comments · 13 min read

An open job application to AI labs

Hruss · May 25, 2025, 12:57 PM
15 points
0 comments · 1 min read

Meditations on Doge

Martin Sustrik · May 25, 2025, 12:00 PM
129 points
44 comments · 9 min read
(250bpm.substack.com)

Case Studies in Simulators and Agents

May 25, 2025, 5:40 AM
11 points
8 comments · 6 min read

On safety of being a moral patient of ASI

Yaroslav Granowski · May 24, 2025, 9:24 PM
3 points
8 comments · 1 min read

We Need a Baseline for LLM-Aided Experiments

J Bostock · May 24, 2025, 8:52 PM
11 points
1 comment · 1 min read

Lie Detectors. Technical solutions to the cooperation problem.

Window Frame · May 24, 2025, 8:05 PM
6 points
0 comments · 10 min read

It’s hard to make scheming evals look realistic for LLMs

May 24, 2025, 7:17 PM
141 points
27 comments · 5 min read

Launch of the New Horizons Podcast

Nezir Alic · May 24, 2025, 5:50 PM
5 points
0 comments · 1 min read

Priming effects are fake, but framing effects are real

Matrice Jacobine · May 24, 2025, 10:54 AM
32 points
0 comments · 1 min read
(xphi.net)

The Cosmic Lottery

James Stephen Brown · May 24, 2025, 4:05 AM
5 points
0 comments · 5 min read
(nonzerosum.games)

Some Considerations on Prediction Markets

belos · May 24, 2025, 3:24 AM
2 points
0 comments · 9 min read

The Paradox of Low Fertility

Zero Contradictions · May 24, 2025, 12:59 AM
−1 points
6 comments · 1 min read
(expandingrationality.substack.com)

That’s Not How Epigenetic Modifications Work

johnswentworth · May 24, 2025, 12:15 AM
67 points
12 comments · 2 min read

[Question] To what extent is AI safety work trying to get AI to reliably and safely do what the user asks vs. do what is best in some ultimate sense?

Jordan Arel · May 23, 2025, 9:05 PM
14 points
3 comments · 1 min read

Default history is dead wrong

kilgoar · May 23, 2025, 4:31 PM
−20 points
11 comments · 1 min read

Notes on Claude 4 System Card

Dentosal · May 23, 2025, 3:23 PM
19 points
2 comments · 6 min read

What is emptiness?

Vadim Golub · May 23, 2025, 12:06 PM
−4 points
11 comments · 9 min read

Idiohobbies

dkl9 · May 23, 2025, 6:38 AM
11 points
2 comments · 1 min read
(dkl9.net)

Qualitative Fit Testing

jefftk · May 23, 2025, 2:50 AM
10 points
0 comments · 2 min read
(www.jefftk.com)