Does preservation make sense before we know how to revive?

Aurelia 15 Jun 2026 23:40 UTC
83 points
2 comments 25 min read LW link

Finding pi and G in Mathland

Fernand0 15 Jun 2026 19:18 UTC
2 points
8 comments 2 min read LW link

How Matryoshka Sparse AutoEncoders Recover Feature Hierarchies That Vanilla SAEs Lose

baimamboukar 15 Jun 2026 18:50 UTC
11 points
1 comment 6 min read LW link

In open RLVR, “improvement” depends on the instrument — a small GRPO testbed separating what training optimizes, measures, and teaches

JulesRoussel01 15 Jun 2026 18:50 UTC
7 points
0 comments 20 min read LW link

Can the Safety Tax Be Highly Concentrated?

ozziegooen 15 Jun 2026 18:48 UTC
6 points
2 comments 2 min read LW link

A frontier AI company should shut down

MichaelDickens 15 Jun 2026 16:56 UTC
135 points
37 comments 2 min read LW link

The Once And Future Fable #2

Zvi 15 Jun 2026 16:00 UTC
72 points
8 comments 23 min read LW link
(thezvi.wordpress.com)

$10,000 bounty for theorem refutation

Bruce Middleton 15 Jun 2026 13:36 UTC
−52 points
31 comments 1 min read LW link

Links #3: 2026/06 Part 1

papetoast 15 Jun 2026 12:53 UTC
9 points
0 comments 27 min read LW link

How reality turns to slop

julius vidal 15 Jun 2026 10:42 UTC
10 points
3 comments 4 min read LW link

On Responsibility and Death: Can We See Reality for What It Is or Will It Break Us

Dawn Drescher 15 Jun 2026 10:14 UTC
8 points
0 comments 3 min read LW link
(impartial-priorities.org)

VFUSE: Virulent Feature Understanding With Sparse AutoEncoders

michaelwaves 15 Jun 2026 5:06 UTC
13 points
0 comments 2 min read LW link

The Power to Punish

Ben Pace 15 Jun 2026 2:22 UTC
27 points
9 comments 5 min read LW link

Do k-Sparse Autoencoders Reveal Thinking Patterns? Interpretable Features in a Small Reasoning Model

Artt 15 Jun 2026 1:51 UTC
8 points
2 comments 9 min read LW link
(artcore.pages.dev)

You need to know about the Baruch Plan

aggliu 15 Jun 2026 1:21 UTC
29 points
1 comment 3 min read LW link
(signoregalilei.com)

Exploring Known Unknowns in the AI Regulatory Landscape

NelsonDP 14 Jun 2026 22:36 UTC
6 points
0 comments 22 min read LW link
(open.substack.com)

Attack of the Killer Differential Equations

Fernand0 14 Jun 2026 22:20 UTC
11 points
0 comments 2 min read LW link

I built a public arena where people attack a “pro-human” steering direction

sohampadia10@gmail.com 14 Jun 2026 21:26 UTC
1 point
0 comments 9 min read LW link
(sohampadianeu-steering-arena.hf.space)

Why Do Naive SFT Filters For Safety Properties Fail?

14 Jun 2026 19:45 UTC
50 points
7 comments 10 min read LW link

Why I think a global AI pause (almost) certainly won’t happen

Expertium 14 Jun 2026 19:20 UTC
23 points
0 comments 2 min read LW link

Gradual disempowerment at the scale of one user

ppal 14 Jun 2026 18:01 UTC
6 points
0 comments 4 min read LW link

How does congressmember use AI?

Ilyass Mofaddel 14 Jun 2026 18:00 UTC
10 points
2 comments 4 min read LW link

The Posture of Thought

dongerous 14 Jun 2026 18:00 UTC
13 points
0 comments 5 min read LW link

The Dual-Use Gap

Yogesh Prabhu 14 Jun 2026 17:43 UTC
5 points
2 comments 4 min read LW link
(yogesh.bearblog.dev)

Can a stronger model fake being a weaker one? Mostly not

Rob Kopel 14 Jun 2026 17:30 UTC
10 points
1 comment 7 min read LW link
(www.robkopel.me)

The 1890 Census as a fun cluster

Fernand0 14 Jun 2026 15:41 UTC
0 points
3 comments 1 min read LW link

The Hidden Structures of Problems

spencerg 14 Jun 2026 13:51 UTC
91 points
9 comments 3 min read LW link
(www.spencergreenberg.com)

Agent Identity Standardisation Efforts

tr5tn 14 Jun 2026 11:30 UTC
2 points
0 comments 2 min read LW link

Wikipedia’s national flavors—French

Fernand0 14 Jun 2026 10:29 UTC
11 points
1 comment 2 min read LW link

Low-temperature bunk

Fernand0 14 Jun 2026 7:59 UTC
0 points
0 comments 1 min read LW link

I Bet Abliteration’s Cost Was Sloppy Implementation. I Was Wrong

christian-mc 14 Jun 2026 6:03 UTC
6 points
0 comments 6 min read LW link

Don’t just aim for Frontier Labs

emile delcourt 14 Jun 2026 4:41 UTC
4 points
0 comments 28 min read LW link

Paying Kids To Do Schoolwork

Jake Grover 14 Jun 2026 3:15 UTC
5 points
5 comments 2 min read LW link
(helixishere.substack.com)

Speeding Up JumpReLU SAE Inference with Custom Triton Kernels (2–14× on Real SAEs)

Daniel Tiourine 14 Jun 2026 3:15 UTC
9 points
0 comments 15 min read LW link

Impressions at the Extremity of Civilization

Ben Pace 14 Jun 2026 2:33 UTC
40 points
2 comments 8 min read LW link

Our Work is Low Skill Expression

cantsaymuch 14 Jun 2026 0:12 UTC
9 points
0 comments 4 min read LW link

Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70 13 Jun 2026 20:54 UTC
−1 points
4 comments 3 min read LW link

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa 13 Jun 2026 20:38 UTC
8 points
0 comments 8 min read LW link

What is a game?

Isaac Newton 13 Jun 2026 19:51 UTC
2 points
2 comments 8 min read LW link
(archimedeanmonoid.substack.com)

American Government Takes Down Claude Fable

Zvi 13 Jun 2026 19:40 UTC
112 points
13 comments 20 min read LW link
(thezvi.wordpress.com)

Not telling is lying

Fernand0 13 Jun 2026 18:12 UTC
10 points
16 comments 3 min read LW link

A simple argument for trying less hard

Elias Schmied 13 Jun 2026 18:12 UTC
13 points
3 comments 3 min read LW link

How might continual learning affect safety and alignment?

13 Jun 2026 17:34 UTC
59 points
2 comments 16 min read LW link

Presentfulness: Lucidity, Osmosis, and Dissociation

Astrid Callender 13 Jun 2026 17:21 UTC
4 points
2 comments 5 min read LW link

How to Suffer Less

Gordon Seidoh Worley 13 Jun 2026 17:10 UTC
19 points
4 comments 6 min read LW link
(www.uncertainupdates.com)

Somewhat Contra Ted Chiang on AI Consciousness

ThomasJ 13 Jun 2026 16:49 UTC
8 points
0 comments 10 min read LW link

The term “AGI” is almost useless at this point [Linkpost]

Noosphere89 13 Jun 2026 16:15 UTC
30 points
1 comment 5 min read LW link
(helentoner.substack.com)

SFT Drives Gemini’s Safety Properties

13 Jun 2026 15:31 UTC
69 points
3 comments 1 min read LW link

Why not take the AI fight to the ground?

less_raichu 13 Jun 2026 15:04 UTC
8 points
5 comments 1 min read LW link

AML for AI as a verification mechanism

MarkelKori 13 Jun 2026 11:59 UTC
9 points
2 comments 2 min read LW link