How I’m telling my friends about AI Safety

k64 · 25 May 2025 22:43 UTC
1 point
7 comments · 7 min read · LW link

Good Writing

Adam Zerner · 25 May 2025 21:52 UTC
11 points
0 comments · 2 min read · LW link
(paulgraham.com)

Consider buying voting shares

Hruss · 25 May 2025 18:01 UTC
2 points
3 comments · 1 min read · LW link

[Question] Can you donate to AI advocacy?

k64 · 25 May 2025 17:54 UTC
17 points
4 comments · 1 min read · LW link

Rant: the extreme wastefulness of high rent prices

Knight Lee · 25 May 2025 17:04 UTC
−2 points
0 comments · 2 min read · LW link

Beyond Democracy: A System Where Citizens Vote with Their Taxes

Brendan Golledge · 25 May 2025 17:00 UTC
−1 points
3 comments · 7 min read · LW link

Claude 4 You: Safety and Alignment

Zvi · 25 May 2025 14:00 UTC
86 points
8 comments · 63 min read · LW link
(thezvi.wordpress.com)

Alignment Proposal: Adversarially Robust Augmentation and Distillation

25 May 2025 12:58 UTC
56 points
47 comments · 13 min read · LW link

An open job application to AI labs

Hruss · 25 May 2025 12:57 UTC
17 points
0 comments · 1 min read · LW link

Meditations on Doge

Martin Sustrik · 25 May 2025 12:00 UTC
131 points
44 comments · 9 min read · LW link
(250bpm.substack.com)

Case Studies in Simulators and Agents

25 May 2025 5:40 UTC
12 points
8 comments · 6 min read · LW link

On safety of being a moral patient of ASI

Yaroslav Granowski · 24 May 2025 21:24 UTC
3 points
8 comments · 1 min read · LW link

We Need a Baseline for LLM-Aided Experiments

J Bostock · 24 May 2025 20:52 UTC
11 points
1 comment · 1 min read · LW link

Lie Detectors. Technical solutions to the cooperation problem.

Window Frame · 24 May 2025 20:05 UTC
6 points
0 comments · 10 min read · LW link

It’s hard to make scheming evals look realistic for LLMs

24 May 2025 19:17 UTC
150 points
29 comments · 5 min read · LW link

Launch of the New Horizons Podcast

Nezir Alic · 24 May 2025 17:50 UTC
5 points
0 comments · 1 min read · LW link

Priming effects are fake, but framing effects are real

Matrice Jacobine · 24 May 2025 10:54 UTC
33 points
0 comments · 1 min read · LW link
(xphi.net)

The Cosmic Lottery

James Stephen Brown · 24 May 2025 4:05 UTC
5 points
0 comments · 5 min read · LW link
(nonzerosum.games)

Some Considerations on Prediction Markets

belos · 24 May 2025 3:24 UTC
2 points
1 comment · 9 min read · LW link

The Paradox of Low Fertility

Zero Contradictions · 24 May 2025 0:59 UTC
−9 points
6 comments · 1 min read · LW link
(expandingrationality.substack.com)

That’s Not How Epigenetic Modifications Work

johnswentworth · 24 May 2025 0:15 UTC
68 points
12 comments · 2 min read · LW link

[Question] To what extent is AI safety work trying to get AI to reliably and safely do what the user asks vs. do what is best in some ultimate sense?

Jordan Arel · 23 May 2025 21:05 UTC
14 points
3 comments · 1 min read · LW link

Notes on Claude 4 System Card

Dentosal · 23 May 2025 15:23 UTC
19 points
2 comments · 6 min read · LW link

What is emptiness?

Vadim Golub · 23 May 2025 12:06 UTC
−4 points
11 comments · 9 min read · LW link

Idiohobbies

dkl9 · 23 May 2025 6:38 UTC
11 points
2 comments · 1 min read · LW link
(dkl9.net)

Qualitative Fit Testing

jefftk · 23 May 2025 2:50 UTC
10 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Anthropic is Quietly Backpedalling on its Safety Commitments

garrison · 23 May 2025 2:26 UTC
81 points
7 comments · 5 min read · LW link
(www.obsolete.pub)

Learning (more) from horse employment history

Tim H · 23 May 2025 2:11 UTC
68 points
13 comments · 5 min read · LW link

Schizobench: Documenting Magical-Thinking Behavior in Claude 4 Opus

viemccoy · 23 May 2025 1:31 UTC
23 points
0 comments · 1 min read · LW link
(metanomicon.ink)

Post-Manifest coworking at Mox

23 May 2025 0:20 UTC
4 points
1 comment · 1 min read · LW link

Claude 4, Opportunistic Blackmail, and “Pleas”

Stephen Martin · 22 May 2025 19:59 UTC
30 points
2 comments · 2 min read · LW link

Problems in AI Alignment: A Scale Model

Mickey Muldoon · 22 May 2025 19:22 UTC
−1 points
3 comments · 2 min read · LW link
(muldoon.cloud)

Art Is Art: AI Is the Next Erotica

Charlie Edwards · 22 May 2025 18:04 UTC
0 points
1 comment · 14 min read · LW link

Reward button alignment

Steven Byrnes · 22 May 2025 17:36 UTC
50 points
15 comments · 12 min read · LW link

We’re Not Advertising Enough (Post 3 of 7 on AI Governance)

Mass_Driver · 22 May 2025 17:05 UTC
110 points
10 comments · 28 min read · LW link

Claude 4

Zach Stein-Perlman · 22 May 2025 17:00 UTC
71 points
24 comments · 1 min read · LW link
(www.anthropic.com)

Video and transcript of talk on AI welfare

Joe Carlsmith · 22 May 2025 16:15 UTC
24 points
1 comment · 28 min read · LW link
(joecarlsmith.substack.com)

What we can learn from afterlife myths

jchan · 22 May 2025 15:49 UTC
5 points
0 comments · 15 min read · LW link

Policy recommendations regarding reproductive technology

TsviBT · 22 May 2025 14:49 UTC
76 points
2 comments · 3 min read · LW link

AI #117: OpenAI Buys Device Maker IO

Zvi · 22 May 2025 13:40 UTC
37 points
9 comments · 62 min read · LW link
(thezvi.wordpress.com)

Does BPC-157 work for healing and tissue repair?

ChristianKl · 22 May 2025 13:18 UTC
17 points
0 comments · 5 min read · LW link
(somaticsignals.jollyjoyjourney.com)

[Question] How load-bearing is KL divergence from a known-good base model in modern RL?

faul_sname · 22 May 2025 12:08 UTC
22 points
3 comments · 4 min read · LW link

Christianity vs. Tantra vs. Sex – one spiritual path?

pchvykov · 22 May 2025 11:15 UTC
−2 points
0 comments · 24 min read · LW link

Mirror Organisms Are Not Immune to Predation

Matt Dellago · 22 May 2025 11:10 UTC
27 points
5 comments · 1 min read · LW link

How 2025 AI Forecasts Fared So Far

22 May 2025 9:42 UTC
11 points
2 comments · 8 min read · LW link
(theaidigest.org)

Contain and verify: The endgame of US-China AI competition

sjadler · 22 May 2025 8:13 UTC
6 points
7 comments · 2 min read · LW link
(open.substack.com)

Laugencroissant

Martin Sustrik · 22 May 2025 6:30 UTC
13 points
0 comments · 3 min read · LW link
(250bpm.substack.com)

Google I/O Day

Zvi · 21 May 2025 22:00 UTC
49 points
0 comments · 20 min read · LW link
(thezvi.wordpress.com)

Podcast: How not to waste a billion dollars (on your clinical trial), with Meri Beckwith on Development & Research

rossry · 21 May 2025 21:27 UTC
25 points
0 comments · 3 min read · LW link
(developmentandresearch.bio)

Podcast: From molecule to medicine, with Ross Rheingans-Yoo on Complex Systems

rossry · 21 May 2025 21:08 UTC
15 points
0 comments · 5 min read · LW link
(www.complexsystemspodcast.com)