Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70 13 Jun 2026 20:54 UTC
−1 points
4 comments, 3 min read

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa 13 Jun 2026 20:38 UTC
8 points
0 comments, 8 min read

What is a game?

Isaac Newton 13 Jun 2026 19:51 UTC
2 points
2 comments, 8 min read
(archimedeanmonoid.substack.com)

American Government Takes Down Claude Fable

Zvi 13 Jun 2026 19:40 UTC
111 points
13 comments, 20 min read
(thezvi.wordpress.com)

Not telling is lying

Fernand0 13 Jun 2026 18:12 UTC
10 points
16 comments, 3 min read

A simple argument for trying less hard

Elias Schmied 13 Jun 2026 18:12 UTC
13 points
3 comments, 3 min read

How might continual learning affect safety and alignment?

13 Jun 2026 17:34 UTC
59 points
2 comments, 16 min read

Presentfulness: Lucidity, Osmosis, and Dissociation

Astrid Callender 13 Jun 2026 17:21 UTC
4 points
2 comments, 5 min read

How to Suffer Less

Gordon Seidoh Worley 13 Jun 2026 17:10 UTC
19 points
4 comments, 6 min read
(www.uncertainupdates.com)

Somewhat Contra Ted Chiang on AI Consciousness

ThomasJ 13 Jun 2026 16:49 UTC
8 points
0 comments, 10 min read

The term “AGI” is almost useless at this point [Linkpost]

Noosphere89 13 Jun 2026 16:15 UTC
30 points
1 comment, 5 min read
(helentoner.substack.com)

SFT Drives Gemini’s Safety Properties

13 Jun 2026 15:31 UTC
69 points
3 comments, 1 min read

Why not take the AI fight to the ground?

less_raichu 13 Jun 2026 15:04 UTC
8 points
5 comments, 1 min read

AML for AI as a verification mechanism

MarkelKori 13 Jun 2026 11:59 UTC
9 points
2 comments, 2 min read

Pulling hedonic utilitarianism out of ethical emotivism

Bill Jackson 13 Jun 2026 11:50 UTC
6 points
2 comments, 6 min read
(billjackson7.substack.com)

Tequila Sunset at the Hog’s Head (A Scene)

Ben Pace 13 Jun 2026 6:53 UTC
22 points
1 comment, 5 min read

US government directive to suspend access to Fable 5 and Mythos 5

Capybasilisk 13 Jun 2026 1:16 UTC
67 points
15 comments, 1 min read
(www.anthropic.com)

Do we learn less from our decisions than we think we do?

QuietCalibration 13 Jun 2026 1:05 UTC
5 points
0 comments, 1 min read

Exploration of a DNA Sequencing Basecaller using Activation Patching

Madeleine L 13 Jun 2026 0:58 UTC
3 points
0 comments, 6 min read

Sandy Blvd as an example of complexity

Adam Zerner 13 Jun 2026 0:28 UTC
20 points
0 comments, 2 min read

Short Timelines Favor Control, Long Timelines Favor Infrastructure Security

Jannis 13 Jun 2026 0:12 UTC
7 points
0 comments, 3 min read

Cat allergies & Cavities

Etha 13 Jun 2026 0:11 UTC
6 points
1 comment, 2 min read

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

CandidLind 12 Jun 2026 23:20 UTC
8 points
0 comments, 27 min read

A Generated Web

Klemen 12 Jun 2026 23:09 UTC
3 points
0 comments, 3 min read

The Quest To Find The Next Big Communicators In AI Safety

Akshyae Singh 12 Jun 2026 20:17 UTC
17 points
3 comments, 6 min read

Updates on performative misalignment

12 Jun 2026 20:15 UTC
16 points
0 comments, 12 min read

Statistical Physics for Ambitious Interpretability: A Workshop Retrospective

12 Jun 2026 20:01 UTC
4 points
0 comments, 6 min read

Calibrating Activation Vectors using Norm

Kamesh R 12 Jun 2026 19:59 UTC
1 point
0 comments, 3 min read

Claude Fable 5 and Mythos 5: The System Card

Zvi 12 Jun 2026 18:50 UTC
48 points
1 comment, 29 min read
(thezvi.wordpress.com)

What’s Continual Learning, and Why Might We Expect To See It In Advanced LLM Agents?

12 Jun 2026 18:43 UTC
28 points
2 comments, 17 min read

Implications of Continual Learning for LLM Agents: Introduction

12 Jun 2026 18:36 UTC
46 points
0 comments, 6 min read

Surplus: for massive public good

Austin Chen 12 Jun 2026 18:10 UTC
11 points
0 comments, 4 min read
(surplus.dev)

Reward Hacking at the 1937 World’s Fair

frmsaul 12 Jun 2026 17:47 UTC
36 points
14 comments, 3 min read

Bunk in AF

Fernand0 12 Jun 2026 17:41 UTC
6 points
0 comments, 1 min read

Building and evaluating model diffing agents

12 Jun 2026 17:14 UTC
61 points
2 comments, 12 min read

Rational Animations is a 501(c)(3) nonprofit and is looking for board members

Writer 12 Jun 2026 16:47 UTC
7 points
0 comments, 2 min read

“AF needs empirical grounding” is a meaningless valley of compromise

Fernand0 12 Jun 2026 16:37 UTC
9 points
3 comments, 1 min read

How bad would it be if GPS satellites were shot down?

Jackson Wagner 12 Jun 2026 16:34 UTC
19 points
0 comments, 21 min read

Sympathy for both sides of the egregious misalignment debate

Steven Byrnes 12 Jun 2026 16:26 UTC
197 points
26 comments, 4 min read

The Uncertainty That Matters Isn’t Fundamental

jimmy 12 Jun 2026 16:23 UTC
30 points
1 comment, 13 min read

Citations Needed: Magic Encyclopedias to Save the World

Oliver Sourbut 12 Jun 2026 15:35 UTC
40 points
3 comments, 5 min read
(www.oliversourbut.net)

If you, a human, can imagine red and green being swapped, you are probably conscious

vals tutor 12 Jun 2026 13:28 UTC
4 points
19 comments, 7 min read

Simulating Simulators

kromem 12 Jun 2026 12:56 UTC
43 points
2 comments, 15 min read

Learning to spend money

Yair Halberstadt 12 Jun 2026 6:56 UTC
19 points
1 comment, 2 min read

Parkinson’s Heuristic: The Only Time To Do Anything

Ben Pace 12 Jun 2026 6:55 UTC
117 points
8 comments, 5 min read

PSA: Almost nobody is directly working on superintelligent alignment

12 Jun 2026 5:17 UTC
230 points
41 comments, 1 min read

Honey is Good

G Wood 12 Jun 2026 4:07 UTC
9 points
4 comments, 3 min read

The Aestheticising Vice by Paul Seabright

Linch 12 Jun 2026 2:20 UTC
25 points
2 comments, 2 min read

Celene’s thoughts on consciousness

ToasterLightning 12 Jun 2026 0:55 UTC
46 points
34 comments, 18 min read
(terminuspoint.substack.com)

Construct validity of Claude Opus 4.8’s System Card – A commentary

Maria Federica Martino Lena 11 Jun 2026 23:33 UTC
8 points
0 comments, 16 min read