Exploring Known Unknowns in the AI Regulatory Landscape

NelsonDP

14 Jun 2026 22:36 UTC

6 points

0 comments, 22 min read, LW link

(open.substack.com)

Attack of the Killer Differential Equations

Fernand0

14 Jun 2026 22:20 UTC

11 points

0 comments, 2 min read, LW link

I built a public arena where people attack a “pro-human” steering direction

sohampadia10@gmail.com

14 Jun 2026 21:26 UTC

1 point

0 comments, 9 min read, LW link

(sohampadianeu-steering-arena.hf.space)

Why Do Naive SFT Filters For Safety Properties Fail?

Josh Engels and Neel Nanda

14 Jun 2026 19:45 UTC

50 points

7 comments, 10 min read, LW link

Why I think a global AI pause (almost) certainly won’t happen

Expertium

14 Jun 2026 19:20 UTC

23 points

0 comments, 2 min read, LW link

Gradual disempowerment at the scale of one user

ppal

14 Jun 2026 18:01 UTC

6 points

0 comments, 4 min read, LW link

How does congressmember use AI?

Ilyass Mofaddel

14 Jun 2026 18:00 UTC

10 points

2 comments, 4 min read, LW link

The Posture of Thought

dongerous

14 Jun 2026 18:00 UTC

13 points

0 comments, 5 min read, LW link

The Dual-Use Gap

Yogesh Prabhu

14 Jun 2026 17:43 UTC

5 points

2 comments, 4 min read, LW link

(yogesh.bearblog.dev)

Can a stronger model fake being a weaker one? Mostly not

Rob Kopel

14 Jun 2026 17:30 UTC

10 points

1 comment, 7 min read, LW link

(www.robkopel.me)

The 1890 Census as a fun cluster

Fernand0

14 Jun 2026 15:41 UTC

0 points

3 comments, 1 min read, LW link

The Hidden Structures of Problems

spencerg

14 Jun 2026 13:51 UTC

91 points

9 comments, 3 min read, LW link

(www.spencergreenberg.com)

Agent Identity Standardisation Efforts

tr5tn

14 Jun 2026 11:30 UTC

2 points

0 comments, 2 min read, LW link

Wikipedia’s national flavors—French

Fernand0

14 Jun 2026 10:29 UTC

11 points

1 comment, 2 min read, LW link

Low-temperature bunk

Fernand0

14 Jun 2026 7:59 UTC

0 points

0 comments, 1 min read, LW link

I Bet Abliteration’s Cost Was Sloppy Implementation. I Was Wrong

christian-mc

14 Jun 2026 6:03 UTC

6 points

0 comments, 6 min read, LW link

Don’t just aim for Frontier Labs

emile delcourt

14 Jun 2026 4:41 UTC

4 points

0 comments, 28 min read, LW link

Paying Kids To Do Schoolwork

Jake Grover

14 Jun 2026 3:15 UTC

5 points

5 comments, 2 min read, LW link

(helixishere.substack.com)

Speeding Up JumpReLU SAE Inference with Custom Triton Kernels (2–14× on Real SAEs)

Daniel Tiourine

14 Jun 2026 3:15 UTC

9 points

0 comments, 15 min read, LW link

Impressions at the Extremity of Civilization

Ben Pace

14 Jun 2026 2:33 UTC

40 points

2 comments, 8 min read, LW link

Our Work is Low Skill Expression

cantsaymuch

14 Jun 2026 0:12 UTC

9 points

0 comments, 4 min read, LW link

Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70

13 Jun 2026 20:54 UTC

−1 points

4 comments, 3 min read, LW link

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa

13 Jun 2026 20:38 UTC

8 points

0 comments, 8 min read, LW link

What is a game?

Isaac Newton

13 Jun 2026 19:51 UTC

2 points

2 comments, 8 min read, LW link

(archimedeanmonoid.substack.com)

American Government Takes Down Claude Fable

Zvi

13 Jun 2026 19:40 UTC

112 points

13 comments, 20 min read, LW link

(thezvi.wordpress.com)

Not telling is lying

Fernand0

13 Jun 2026 18:12 UTC

10 points

16 comments, 3 min read, LW link

A simple argument for trying less hard

Elias Schmied

13 Jun 2026 18:12 UTC

13 points

3 comments, 3 min read, LW link

How might continual learning affect safety and alignment?

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

13 Jun 2026 17:34 UTC

59 points

2 comments, 16 min read, LW link

Presentfulness: Lucidity, Osmosis, and Dissociation

Astrid Callender

13 Jun 2026 17:21 UTC

4 points

2 comments, 5 min read, LW link

How to Suffer Less

Gordon Seidoh Worley

13 Jun 2026 17:10 UTC

19 points

4 comments, 6 min read, LW link

(www.uncertainupdates.com)

Somewhat Contra Ted Chiang on AI Consciousness

ThomasJ

13 Jun 2026 16:49 UTC

8 points

0 comments, 10 min read, LW link

The term “AGI” is almost useless at this point [Linkpost]

Noosphere89

13 Jun 2026 16:15 UTC

30 points

1 comment, 5 min read, LW link

(helentoner.substack.com)

SFT Drives Gemini’s Safety Properties

Josh Engels, Arthur Conmy, bilalchughtai and Neel Nanda

13 Jun 2026 15:31 UTC

69 points

3 comments, 1 min read, LW link

Why not take the AI fight to the ground?

less_raichu

13 Jun 2026 15:04 UTC

8 points

5 comments, 1 min read, LW link

AML for AI as a verification mechanism

MarkelKori

13 Jun 2026 11:59 UTC

9 points

2 comments, 2 min read, LW link

Pulling hedonic utilitarianism out of ethical emotivism

Bill Jackson

13 Jun 2026 11:50 UTC

6 points

2 comments, 6 min read, LW link

(billjackson7.substack.com)

Tequila Sunset at the Hog’s Head (A Scene)

Ben Pace

13 Jun 2026 6:53 UTC

22 points

1 comment, 5 min read, LW link

US government directive to suspend access to Fable 5 and Mythos 5

Capybasilisk

13 Jun 2026 1:16 UTC

67 points

15 comments, 1 min read, LW link

(www.anthropic.com)

Do we learn less from our decisions than we think we do?

QuietCalibration

13 Jun 2026 1:05 UTC

5 points

0 comments, 1 min read, LW link

Exploration of a DNA Sequencing Basecaller using Activation Patching

Madeleine L

13 Jun 2026 0:58 UTC

3 points

0 comments, 6 min read, LW link

Sandy Blvd as an example of complexity

Adam Zerner

13 Jun 2026 0:28 UTC

20 points

0 comments, 2 min read, LW link

Short Timelines Favor Control, Long Timelines Favor Infrastructure Security

Jannis

13 Jun 2026 0:12 UTC

7 points

0 comments, 3 min read, LW link

Cat allergies & Cavities

Etha

13 Jun 2026 0:11 UTC

6 points

1 comment, 2 min read, LW link

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

CandidLind

12 Jun 2026 23:20 UTC

8 points

0 comments, 27 min read, LW link

A Generated Web

Klemen

12 Jun 2026 23:09 UTC

3 points

0 comments, 3 min read, LW link

The Quest To Find The Next Big Communicators In AI Safety

Akshyae Singh

12 Jun 2026 20:17 UTC

17 points

3 comments, 6 min read, LW link

Updates on performative misalignment

David Vella Zarb, Rustem, Taywon Min and Shi

12 Jun 2026 20:15 UTC

22 points

0 comments, 12 min read, LW link

Statistical Physics for Ambitious Interpretability: A Workshop Retrospective

Lauren Greenspan, Lucas Teixeira and ClaudineLim

12 Jun 2026 20:01 UTC

4 points

0 comments, 6 min read, LW link

Calibrating Activation Vectors using Norm

Kamesh R

12 Jun 2026 19:59 UTC

1 point

0 comments, 3 min read, LW link

Claude Fable 5 and Mythos 5: The System Card

Zvi

12 Jun 2026 18:50 UTC

48 points

1 comment, 29 min read, LW link

(thezvi.wordpress.com)