How’s it going? Reinforcement learning in language models recruits a functional welfare axis

andyqhan · 30 May 2026 23:14 UTC
28 points
1 comment · 5 min read · LW link

AI is a Meteor. Don’t Be a Dinosaur.

Boaz Barak · 30 May 2026 19:50 UTC
−2 points
7 comments · 1 min read · LW link

An attempted synthesis on probabilities and infinities

David Matolcsi · 30 May 2026 19:24 UTC
10 points
0 comments · 27 min read · LW link

Comment on “Banning Said Achmiz”

Zack_M_Davis · 30 May 2026 17:33 UTC
66 points
90 comments · 50 min read · LW link

A Formula for Fun

Ihor Kendiukhov · 30 May 2026 13:01 UTC
11 points
3 comments · 8 min read · LW link

Open Thread Summer 2026

habryka · 30 May 2026 5:00 UTC
28 points
11 comments · 1 min read · LW link

Announcing: Iliad’s Fall 2026 Programs

30 May 2026 4:37 UTC
64 points
7 comments · 1 min read · LW link

Bloomberg terminals for the rest of us

aiechrl · 30 May 2026 3:13 UTC
34 points
0 comments · 20 min read · LW link

AI as Biology’s Digital Microscope

Darin Tsui · 30 May 2026 3:11 UTC
10 points
0 comments · 3 min read · LW link

Ablating Induction Heads Leads to an increase in Local Repetition

Arjun Rao · 30 May 2026 3:11 UTC
8 points
0 comments · 5 min read · LW link

System Prompts vs. Partner Adaptation in LLMs (or, when LLMs know you’re an adult but keep talking like you’re seven)

hi_im_yasha · 30 May 2026 3:07 UTC
4 points
0 comments · 7 min read · LW link

Belief manifolds, and how to steer along them

Will Mayner · 30 May 2026 3:05 UTC
8 points
0 comments · 16 min read · LW link
(willmayner.com)

New RFP on extreme power concentration

bengs · 30 May 2026 3:04 UTC
9 points
0 comments · 1 min read · LW link

What If We Will Stop Destroying People Because Medicine Is Not Ready Yet?

Andrey Panferov · 30 May 2026 3:02 UTC
1 point
2 comments · 6 min read · LW link

Why tuning fails: The AI has no self

Michael Trifonov · 30 May 2026 3:01 UTC
6 points
2 comments · 12 min read · LW link

Wall-Mounted Far-UVC

jefftk · 30 May 2026 2:20 UTC
18 points
2 comments · 1 min read · LW link
(www.jefftk.com)

A new approach to interpretability: round-trip neural network compilation-decompilation

Emma Leonhart · 29 May 2026 22:23 UTC
9 points
0 comments · 3 min read · LW link

Claude Opus 4.8: The System Card

Zvi · 29 May 2026 20:50 UTC
64 points
1 comment · 23 min read · LW link
(thezvi.wordpress.com)

Testing Gemini models for scheming tendencies

29 May 2026 19:24 UTC
47 points
8 comments · 6 min read · LW link
(deepmindsafetyresearch.medium.com)

How much should we worry about secretly loyal AIs?

Dave Banerjee · 29 May 2026 19:14 UTC
13 points
1 comment · 13 min read · LW link
(www.the-substrate.net)

Data you could have observed but didn’t

Gretta Duleba · 29 May 2026 18:20 UTC
66 points
3 comments · 1 min read · LW link

Is Progress Inevitable?

frmsaul · 29 May 2026 17:40 UTC
0 points
5 comments · 4 min read · LW link

Retrying vs Resampling in AI Control

29 May 2026 17:02 UTC
67 points
4 comments · 9 min read · LW link
(blog.redwoodresearch.org)

When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

29 May 2026 15:53 UTC
36 points
3 comments · 4 min read · LW link

It takes a village to support a marriage

Shoshannah Tekofsky · 29 May 2026 15:16 UTC
21 points
5 comments · 2 min read · LW link
(shoshanigans.substack.com)

AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles

Max Tegmark · 29 May 2026 15:07 UTC
40 points
13 comments · 7 min read · LW link

Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs

Elliott Thornley (EJT) · 29 May 2026 14:50 UTC
12 points
4 comments · 3 min read · LW link

Hannibal Mistral: the Mistral family has a problem with persona-conditioned elicitation

vigji · 29 May 2026 12:16 UTC
21 points
0 comments · 7 min read · LW link

Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour

29 May 2026 9:56 UTC
67 points
0 comments · 7 min read · LW link

Relational Consciousness and AGI.

PaddyC · 29 May 2026 6:49 UTC
−11 points
8 comments · 1 min read · LW link

The Vidhaven Challenge

Taylor G. Lunt · 29 May 2026 4:22 UTC
7 points
0 comments · 3 min read · LW link

Trees are mostly made of air and a generalizable lesson for AI safety

Zephaniah Roe · 29 May 2026 4:08 UTC
169 points
28 comments · 4 min read · LW link

My boring diet

Telemea · 29 May 2026 0:29 UTC
1 point
0 comments · 5 min read · LW link

How a failed experiment broke (and fixed) my view on feature labels

enricobottazzi · 29 May 2026 0:24 UTC
17 points
2 comments · 10 min read · LW link

Suggestions for improving debate protocols in AI safety

tr5tn · 29 May 2026 0:23 UTC
13 points
7 comments · 5 min read · LW link

Small Decisions That Quietly Shape My Day

rororerere6655 · 29 May 2026 0:04 UTC
21 points
3 comments · 1 min read · LW link

A Call for Better Type Hints in AI Safety Tooling

Koby Lewis · 28 May 2026 23:04 UTC
13 points
2 comments · 4 min read · LW link
(kobylewis.net)

Claude… doesn’t know who you are?

Smaug123 · 28 May 2026 22:54 UTC
59 points
23 comments · 1 min read · LW link

Lizards and Less Wrong Jargon—A Brief Critique of Convention

DanielW · 28 May 2026 22:18 UTC
28 points
8 comments · 4 min read · LW link

Mnemonic portraits for 19,023 human genes

Brinedew · 28 May 2026 22:16 UTC
340 points
28 comments · 15 min read · LW link

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

28 May 2026 21:26 UTC
8 points
13 comments · 2 min read · LW link

Use Decision Theory To Fix Your Bad Habits

enterthewoods · 28 May 2026 19:31 UTC
8 points
5 comments · 2 min read · LW link

Do Models Lie More to Other Models?

keith_wynroe · 28 May 2026 19:28 UTC
13 points
0 comments · 6 min read · LW link

We Should Study the Analogy Between Inoculation Prompting Non-Robustness, Negation Neglect, and Backdoor Non-Robustness

Vladimir Ivanov · 28 May 2026 19:17 UTC
17 points
3 comments · 4 min read · LW link

Some Dating Stories

johnswentworth · 28 May 2026 18:57 UTC
−6 points
38 comments · 11 min read · LW link

Does Claude care about others the same way humans do?

Simon Lermen · 28 May 2026 18:41 UTC
28 points
24 comments · 4 min read · LW link

Trans-Humeanism. The Problem of Induction Revisited

mfatt · 28 May 2026 18:10 UTC
0 points
0 comments · 2 min read · LW link

Advice for making robust-to-training model organisms

28 May 2026 17:26 UTC
37 points
8 comments · 12 min read · LW link
(blog.redwoodresearch.org)

The Patron Saint of Empiricism

Gram Stone · 28 May 2026 17:03 UTC
2 points
0 comments · 8 min read · LW link

Advice for budding research managers/coaches after 6 months at MATS

TheManxLoiner · 28 May 2026 16:25 UTC
12 points
0 comments · 3 min read · LW link
(lovkush.substack.com)