If I Were Emperor of New AI Safety Researcher Training...

Lorxus, 20 May 2026 23:10 UTC
21 points
3 comments, 8 min read, LW link
(tiled-with-pentagons.blogspot.com)

theory uplift differentially benefits safety & is underleveraged

yudhister, 20 May 2026 21:43 UTC
133 points
14 comments, 1 min read, LW link

Singular Learning Theory Comprehensive − 1

Agastya Agrawal, 20 May 2026 20:00 UTC
35 points
1 comment, 12 min read, LW link

Sparse Efficiency vs. Superposition: The Interpretability Tradeoff

hillz, 20 May 2026 19:14 UTC
8 points
0 comments, 1 min read, LW link

The Case for Evaluating Model Behaviors

jsteinhardt, 20 May 2026 18:42 UTC
40 points
3 comments, 3 min read, LW link

Toward Interoperability of Minimal Programs

johnswentworth, 20 May 2026 18:37 UTC
67 points
13 comments, 3 min read, LW link

Fundamental Uncertainty $2,000 Essay Contest

Gordon Seidoh Worley, 20 May 2026 15:20 UTC
25 points
4 comments, 5 min read, LW link
(www.uncertainupdates.com)

Synthetic Persona Pretraining: Alignment from Token Zero

20 May 2026 14:16 UTC
112 points
26 comments, 17 min read, LW link

Give my children minds

momom2, 20 May 2026 14:14 UTC
7 points
1 comment, 1 min read, LW link

Check out my technological uplifting, civilization-building, and science in a magic world fiction!

Jens Brandt, 20 May 2026 12:30 UTC
6 points
0 comments, 1 min read, LW link

Power-seeking agents will likely be developed

Alec Harris, 20 May 2026 9:26 UTC
42 points
0 comments, 4 min read, LW link

Apply now to Human-Aligned AI Summer School 2026

20 May 2026 8:44 UTC
13 points
0 comments, 1 min read, LW link
(humanaligned.ai)

From 8B to Frontier: How System Prompts Control Whether AI Agents Blackmail, Leak, and Kill

Chijioke Ugwuanyi, 20 May 2026 8:28 UTC
15 points
2 comments, 19 min read, LW link

If AI is normal technology, history is not reassuring.

Davidmanheim, 20 May 2026 7:21 UTC
59 points
28 comments, 6 min read, LW link

Pythagorean addition

kqr, 20 May 2026 7:13 UTC
32 points
4 comments, 3 min read, LW link
(entropicthoughts.com)

So you don’t want everybody to die

Rattengift, 20 May 2026 5:10 UTC
−20 points
10 comments, 6 min read, LW link

Temporal Proportional Representation

thomascolthurst, 20 May 2026 1:39 UTC
10 points
9 comments, 3 min read, LW link

Conclave 1492

Vaniver, 19 May 2026 23:44 UTC
72 points
7 comments, 1 min read, LW link

Childhood And Education #19: Letting Kids Be Kids #2

Zvi, 19 May 2026 22:20 UTC
21 points
1 comment, 12 min read, LW link
(thezvi.wordpress.com)

Implications Of Predicting The Next Token

jdp, 19 May 2026 22:17 UTC
108 points
6 comments, 31 min read, LW link
(minihf.com)

Which goals actually motivate deceptive alignment?

19 May 2026 21:53 UTC
25 points
0 comments, 10 min read, LW link

Housing Roundup #15: The War Against Renters

Zvi, 19 May 2026 21:40 UTC
19 points
1 comment, 14 min read, LW link
(thezvi.wordpress.com)

Leaving DCA to the North on Foot

jefftk, 19 May 2026 20:30 UTC
19 points
0 comments, 1 min read, LW link
(www.jefftk.com)

A Visual Guide to Natural Latents

Alfred Harwood, 19 May 2026 19:10 UTC
56 points
0 comments, 18 min read, LW link

Humans are not automatically strategic — “inner work” edition

Chris Lakin, 19 May 2026 18:37 UTC
36 points
0 comments, 1 min read, LW link

[Webinar]: How close is AI to taking my job? (And what the benchmarks aren’t telling us)

Schizoid Rentoid, 19 May 2026 17:43 UTC
2 points
0 comments, 1 min read, LW link

We Need to Get Serious about Uplift Studies

19 May 2026 17:21 UTC
23 points
0 comments, 5 min read, LW link

Brain Structure and IQ: How Myelin Elevates Intelligence

Shiva's Right Foot, 19 May 2026 14:13 UTC
57 points
7 comments, 12 min read, LW link

Sealing Conditional Misalignment in Inoculation Prompting with Consistency Training

19 May 2026 13:55 UTC
44 points
7 comments, 6 min read, LW link

Let’s have more partial insiders.

Cleo Nardo, 19 May 2026 7:24 UTC
15 points
0 comments, 2 min read, LW link

Roadmap through AI safety programs for early-career technical researchers

Mikhail Mironov, 19 May 2026 3:45 UTC
17 points
5 comments, 5 min read, LW link

When Fluency Is Free

mcawesome, 19 May 2026 3:05 UTC
7 points
2 comments, 1 min read, LW link

The anthropic argument against the existence of God.

usrnmtaken, 19 May 2026 3:05 UTC
−10 points
1 comment, 6 min read, LW link

Should Rationalists Looksmaxx?

albertcai, 19 May 2026 3:03 UTC
9 points
2 comments, 6 min read, LW link
(albertjcai.substack.com)

AI emotions and aligned behavior

lisunshiny, 19 May 2026 3:02 UTC
9 points
0 comments, 5 min read, LW link
(liannsun.com)

Tracking Difficulty with Feature Portfolios

19 May 2026 2:25 UTC
22 points
0 comments, 5 min read, LW link

Outsiders should focus on specs/constitutions (among other things)

Cleo Nardo, 19 May 2026 1:04 UTC
4 points
5 comments, 2 min read, LW link

Logical Share Splitting for Intuitionists

DaemonicSigil, 19 May 2026 0:42 UTC
19 points
9 comments, 5 min read, LW link
(notoneunusualthing.substack.com)

Coordinal: A Postmortem.

Ronak_Mehta, 18 May 2026 20:43 UTC
37 points
3 comments, 4 min read, LW link
(ronakrm.github.io)

Noticing Confusion: A practice in staying curious

vmehra, 18 May 2026 19:31 UTC
10 points
1 comment, 6 min read, LW link

Dating Roundup #12: Sex and Violence

Zvi, 18 May 2026 19:20 UTC
28 points
1 comment, 27 min read, LW link
(thezvi.wordpress.com)

Negation Neglect: When models fail to learn negations in training

18 May 2026 18:37 UTC
119 points
37 comments, 8 min read, LW link

So are you some kind of communist?

jchan, 18 May 2026 15:53 UTC
5 points
1 comment, 3 min read, LW link

Thoughts on interviewing candidates for AI safety fellowships

beyarkay (Boyd Kane), 18 May 2026 15:28 UTC
36 points
4 comments, 7 min read, LW link
(boydkane.com)

PauseAI Munich Local Group Kickoff

mofeien, 18 May 2026 15:13 UTC
3 points
0 comments, 1 min read, LW link

Classifier Context Rot: Monitor Performance Degrades with Context Length

18 May 2026 14:05 UTC
54 points
1 comment, 4 min read, LW link

How useful is cross-domain generalization for training LLM monitors?

18 May 2026 13:52 UTC
21 points
0 comments, 4 min read, LW link

Jhana Quick Start Guide

Zmavli Caimle, 18 May 2026 8:51 UTC
15 points
3 comments, 11 min read, LW link

Links #1: 2026/05 Part 1

papetoast, 18 May 2026 5:04 UTC
10 points
0 comments, 18 min read, LW link

why pollen allergies?

bhauth, 18 May 2026 4:44 UTC
33 points
6 comments, 6 min read, LW link
(www.bhauth.com)