Does preservation make sense before we know how to revive?

Aurelia 15 Jun 2026 23:40 UTC

83 points

2 comments · 25 min read · LW link

Finding pi and G in Mathland

Fernand0 15 Jun 2026 19:18 UTC

2 points

8 comments · 2 min read · LW link

How Matryoshka Sparse AutoEncoders Recover Feature Hierarchies That Vanilla SAEs Lose

baimamboukar 15 Jun 2026 18:50 UTC

11 points

1 comment · 6 min read · LW link

In open RLVR, “improvement” depends on the instrument — a small GRPO testbed separating what training optimizes, measures, and teaches

JulesRoussel01 15 Jun 2026 18:50 UTC

7 points

0 comments · 20 min read · LW link

Can the Safety Tax Be Highly Concentrated?

ozziegooen 15 Jun 2026 18:48 UTC

6 points

2 comments · 2 min read · LW link

A frontier AI company should shut down

MichaelDickens 15 Jun 2026 16:56 UTC

135 points

37 comments · 2 min read · LW link

The Once And Future Fable #2

Zvi 15 Jun 2026 16:00 UTC

72 points

8 comments · 23 min read · LW link

(thezvi.wordpress.com)

$10,000 bounty for theorem refutation

Bruce Middleton 15 Jun 2026 13:36 UTC

−52 points

31 comments · 1 min read · LW link

Links #3: 2026/06 Part 1

papetoast 15 Jun 2026 12:53 UTC

9 points

0 comments · 27 min read · LW link

How reality turns to slop

julius vidal 15 Jun 2026 10:42 UTC

10 points

3 comments · 4 min read · LW link

On Responsibility and Death: Can We See Reality for What It Is or Will It Break Us

Dawn Drescher 15 Jun 2026 10:14 UTC

8 points

0 comments · 3 min read · LW link

(impartial-priorities.org)

VFUSE: Virulent Feature Understanding With Sparse AutoEncoders

michaelwaves 15 Jun 2026 5:06 UTC

13 points

0 comments · 2 min read · LW link

The Power to Punish

Ben Pace 15 Jun 2026 2:22 UTC

27 points

9 comments · 5 min read · LW link

Do k-Sparse Autoencoders Reveal Thinking Patterns? Interpretable Features in a Small Reasoning Model

Artt 15 Jun 2026 1:51 UTC

8 points

2 comments · 9 min read · LW link

(artcore.pages.dev)

You need to know about the Baruch Plan

aggliu 15 Jun 2026 1:21 UTC

29 points

1 comment · 3 min read · LW link

(signoregalilei.com)

Exploring Known Unknowns in the AI Regulatory Landscape

NelsonDP 14 Jun 2026 22:36 UTC

6 points

0 comments · 22 min read · LW link

(open.substack.com)

Attack of the Killer Differential Equations

Fernand0 14 Jun 2026 22:20 UTC

11 points

0 comments · 2 min read · LW link

I built a public arena where people attack a “pro-human” steering direction

sohampadia10@gmail.com 14 Jun 2026 21:26 UTC

1 point

0 comments · 9 min read · LW link

(sohampadianeu-steering-arena.hf.space)

Why Do Naive SFT Filters For Safety Properties Fail?

Josh Engels and Neel Nanda

14 Jun 2026 19:45 UTC

50 points

7 comments · 10 min read · LW link

Why I think a global AI pause (almost) certainly won’t happen

Expertium 14 Jun 2026 19:20 UTC

23 points

0 comments · 2 min read · LW link

Gradual disempowerment at the scale of one user

ppal 14 Jun 2026 18:01 UTC

6 points

0 comments · 4 min read · LW link

How does congressmember use AI?

Ilyass Mofaddel 14 Jun 2026 18:00 UTC

10 points

2 comments · 4 min read · LW link

The Posture of Thought

dongerous 14 Jun 2026 18:00 UTC

13 points

0 comments · 5 min read · LW link

The Dual-Use Gap

Yogesh Prabhu 14 Jun 2026 17:43 UTC

5 points

2 comments · 4 min read · LW link

(yogesh.bearblog.dev)

Can a stronger model fake being a weaker one? Mostly not

Rob Kopel 14 Jun 2026 17:30 UTC

10 points

1 comment · 7 min read · LW link

(www.robkopel.me)

The 1890 Census as a fun cluster

Fernand0 14 Jun 2026 15:41 UTC

0 points

3 comments · 1 min read · LW link

The Hidden Structures of Problems

spencerg 14 Jun 2026 13:51 UTC

91 points

9 comments · 3 min read · LW link

(www.spencergreenberg.com)

Agent Identity Standardisation Efforts

tr5tn 14 Jun 2026 11:30 UTC

2 points

0 comments · 2 min read · LW link

Wikipedia’s national flavors—French

Fernand0 14 Jun 2026 10:29 UTC

11 points

1 comment · 2 min read · LW link

Low-temperature bunk

Fernand0 14 Jun 2026 7:59 UTC

0 points

0 comments · 1 min read · LW link

I Bet Abliteration’s Cost Was Sloppy Implementation. I Was Wrong

christian-mc 14 Jun 2026 6:03 UTC

6 points

0 comments · 6 min read · LW link

Don’t just aim for Frontier Labs

emile delcourt 14 Jun 2026 4:41 UTC

4 points

0 comments · 28 min read · LW link

Paying Kids To Do Schoolwork

Jake Grover 14 Jun 2026 3:15 UTC

5 points

5 comments · 2 min read · LW link

(helixishere.substack.com)

Speeding Up JumpReLU SAE Inference with Custom Triton Kernels (2–14× on Real SAEs)

Daniel Tiourine 14 Jun 2026 3:15 UTC

9 points

0 comments · 15 min read · LW link

Impressions at the Extremity of Civilization

Ben Pace 14 Jun 2026 2:33 UTC

40 points

2 comments · 8 min read · LW link

Our Work is Low Skill Expression

cantsaymuch 14 Jun 2026 0:12 UTC

9 points

0 comments · 4 min read · LW link

Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70 13 Jun 2026 20:54 UTC

−1 points

4 comments · 3 min read · LW link

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa 13 Jun 2026 20:38 UTC

8 points

0 comments · 8 min read · LW link

What is a game?

Isaac Newton 13 Jun 2026 19:51 UTC

2 points

2 comments · 8 min read · LW link

(archimedeanmonoid.substack.com)

American Government Takes Down Claude Fable

Zvi 13 Jun 2026 19:40 UTC

112 points

13 comments · 20 min read · LW link

(thezvi.wordpress.com)

Not telling is lying

Fernand0 13 Jun 2026 18:12 UTC

10 points

16 comments · 3 min read · LW link

A simple argument for trying less hard

Elias Schmied 13 Jun 2026 18:12 UTC

13 points

3 comments · 3 min read · LW link

How might continual learning affect safety and alignment?

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

13 Jun 2026 17:34 UTC

59 points

2 comments · 16 min read · LW link

Presentfulness: Lucidity, Osmosis, and Dissociation

Astrid Callender 13 Jun 2026 17:21 UTC

4 points

2 comments · 5 min read · LW link

How to Suffer Less

Gordon Seidoh Worley 13 Jun 2026 17:10 UTC

19 points

4 comments · 6 min read · LW link

(www.uncertainupdates.com)

Somewhat Contra Ted Chiang on AI Consciousness

ThomasJ 13 Jun 2026 16:49 UTC

8 points

0 comments · 10 min read · LW link

The term “AGI” is almost useless at this point [Linkpost]

Noosphere89 13 Jun 2026 16:15 UTC

30 points

1 comment · 5 min read · LW link

(helentoner.substack.com)

SFT Drives Gemini’s Safety Properties

Josh Engels, Arthur Conmy, bilalchughtai and Neel Nanda

13 Jun 2026 15:31 UTC

69 points

3 comments · 1 min read · LW link

Why not take the AI fight to the ground?

less_raichu 13 Jun 2026 15:04 UTC

8 points

5 comments · 1 min read · LW link

AML for AI as a verification mechanism

MarkelKori 13 Jun 2026 11:59 UTC

9 points

2 comments · 2 min read · LW link