Letting Kids Be Kids

Zvi May 30, 2025, 10:50 AM
71 points
14 comments 20 min read LW link
(thezvi.wordpress.com)

Regarding South Africa

Zvi May 16, 2025, 4:10 PM
71 points
5 comments 11 min read LW link
(thezvi.wordpress.com)

Claude 4

Zach Stein-Perlman May 22, 2025, 5:00 PM
71 points
24 comments 1 min read LW link
(www.anthropic.com)

Better Air Purifiers

jefftk May 11, 2025, 4:50 PM
71 points
21 comments 3 min read LW link
(www.jefftk.com)

Negative Results on Group SAEs

Josh Engels May 6, 2025, 9:49 PM
70 points
3 comments 8 min read LW link

Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Thomas Kwa May 5, 2025, 6:56 PM
68 points
21 comments 2 min read LW link
(arxiv.org)

Learning (more) from horse employment history

Tim H May 23, 2025, 2:11 AM
68 points
13 comments 5 min read LW link

Requiem for the hopes of a pre-AI world

Mitchell_Porter May 27, 2025, 2:47 PM
68 points
0 comments 3 min read LW link

That’s Not How Epigenetic Modifications Work

johnswentworth May 24, 2025, 12:15 AM
67 points
12 comments 2 min read LW link

What Does It Mean to “Write Like You Talk”?

Arjun Panickssery May 15, 2025, 9:49 AM
67 points
8 comments 5 min read LW link
(arjunpanickssery.substack.com)

Working through a small tiling result

James Payor May 13, 2025, 8:28 PM
66 points
9 comments 5 min read LW link

OpenAI Claims Nonprofit Will Retain Nominal Control

Zvi May 7, 2025, 7:40 PM
65 points
4 comments 11 min read LW link
(thezvi.wordpress.com)

Interest In Conflict Is Instrumentally Convergent

Screwtape May 9, 2025, 2:16 AM
65 points
58 comments 10 min read LW link

CFAR is running an experimental mini-workshop (June 2-6, Berkeley CA)!

Davis_Kingsley May 29, 2025, 10:02 PM
64 points
2 comments 2 min read LW link

Beware the Moral Homophone

ymeskhout May 27, 2025, 12:06 PM
63 points
4 comments 9 min read LW link
(www.ymeskhout.com)

Semen and Semantics: Understanding Porn with Language Embeddings

future_detective May 19, 2025, 3:39 PM
63 points
27 comments 6 min read LW link
(github.com)

Things I Learned Making The SB-1047 Documentary

Michaël Trazzi May 12, 2025, 5:41 PM
63 points
2 comments 2 min read LW link

Do you even have a system prompt? (PSA / repo)

Croissanthology May 29, 2025, 6:49 PM
62 points
48 comments 2 min read LW link

Zuckerberg’s Dystopian AI Vision

Zvi May 6, 2025, 1:50 PM
61 points
7 comments 11 min read LW link
(thezvi.wordpress.com)

Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims

shash42 May 29, 2025, 6:40 PM
61 points
5 comments 1 min read LW link
(safe-lip-9a8.notion.site)

Outcomes of the Geopolitical Singularity

Nikola Jurkovic May 20, 2025, 6:09 PM
61 points
5 comments 5 min read LW link

OpenAI Preparedness Framework 2.0

Zvi May 2, 2025, 1:10 PM
60 points
1 comment 23 min read LW link
(thezvi.wordpress.com)

Superhuman Coders in AI 2027 - Not So Fast

May 1, 2025, 6:56 PM
59 points
0 comments 5 min read LW link

Why I am not a successionist

Nina Panickssery May 4, 2025, 7:08 PM
59 points
48 comments 2 min read LW link
(ninapanickssery.substack.com)

Highly Opinionated Advice on How to Write ML Papers

Neel Nanda May 12, 2025, 1:59 AM
59 points
4 comments 32 min read LW link

October The First Is Too Late

gwern May 13, 2025, 9:45 PM
58 points
8 comments 1 min read LW link
(gwern.net)

New website analyzing AI companies’ model evals

Zach Stein-Perlman May 26, 2025, 4:00 PM
58 points
0 comments 4 min read LW link

An alignment safety case sketch based on debate

May 8, 2025, 3:02 PM
57 points
19 comments 25 min read LW link
(arxiv.org)

Attend the 2025 Reproductive Frontiers Summit, June 10-12

May 9, 2025, 5:17 AM
57 points
0 comments 3 min read LW link

A widely shared AI productivity paper was retracted, is possibly fraudulent

titotal May 19, 2025, 10:18 AM
56 points
4 comments LW link

GPT-4o Sycophancy Post Mortem

Zvi May 5, 2025, 4:00 PM
55 points
1 comment 16 min read LW link
(thezvi.wordpress.com)

Orphaned Policies (Post 5 of 6 on AI Governance)

Mass_Driver May 29, 2025, 9:42 PM
54 points
3 comments 16 min read LW link

Alignment Proposal: Adversarially Robust Augmentation and Distillation

May 25, 2025, 12:58 PM
54 points
45 comments 13 min read LW link

The Need for Political Advertising (Post 2 of 6 on AI Governance)

Mass_Driver May 21, 2025, 12:44 AM
54 points
2 comments 13 min read LW link

Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice

Neel Nanda May 26, 2025, 5:38 PM
53 points
12 comments 21 min read LW link
(www.neelnanda.io)

PSA: Before May 21 is a good time to sign up for cryonics

AlexMennen May 4, 2025, 4:10 AM
53 points
0 comments 1 min read LW link

LessWrong Feed [new, now in beta]

Ruby May 28, 2025, 7:01 PM
53 points
20 comments 8 min read LW link

Cheaters Gonna Cheat Cheat Cheat Cheat Cheat

Zvi May 9, 2025, 2:30 PM
52 points
4 comments 22 min read LW link
(thezvi.wordpress.com)

Management is the Near Future

jefftk May 17, 2025, 2:50 AM
52 points
10 comments 2 min read LW link
(www.jefftk.com)

Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)

Mass_Driver May 28, 2025, 1:19 AM
51 points
18 comments 32 min read LW link

America Makes AI Chip Diffusion Deal with UAE and KSA

Zvi May 19, 2025, 7:10 PM
51 points
7 comments 27 min read LW link
(thezvi.wordpress.com)

Reward button alignment

Steven Byrnes May 22, 2025, 5:36 PM
50 points
15 comments 12 min read LW link

Can We Naturalize Moral Epistemology?

tylermjohn 21 May 2025 14:25 UTC
50 points
22 comments 6 min read LW link

Google Logo Ligature Bug

jefftk 18 May 2025 2:40 UTC
49 points
7 comments 1 min read LW link
(www.jefftk.com)

Google I/O Day

Zvi 21 May 2025 22:00 UTC
49 points
0 comments 20 min read LW link
(thezvi.wordpress.com)

Problems with instruction-following as an alignment target

Seth Herd 15 May 2025 15:41 UTC
48 points
14 comments 10 min read LW link

AI #116: If Anyone Builds It, Everyone Dies

Zvi 15 May 2025 15:10 UTC
47 points
5 comments 42 min read LW link
(thezvi.wordpress.com)

Re SMTM: negative feedback on negative feedback

Steven Byrnes 14 May 2025 19:50 UTC
46 points
1 comment 22 min read LW link

D&D.Sci: The Choosing Ones

abstractapplic 17 May 2025 15:26 UTC
46 points
16 comments 1 min read LW link

Overview: AI Safety Outreach Grassroots Orgs

4 May 2025 17:39 UTC
46 points
8 comments 2 min read LW link