Computational models of first-order theories

MathMart

16 Jun 2026 23:02 UTC

5 points

0 comments · 11 min read · LW link

If This Were a Test, How Much Would It Cost?

VojtaKovarik and Tomáš Gavenčiak

16 Jun 2026 22:52 UTC

25 points

9 comments · 20 min read · LW link

(limits-of-evaluation.org)

Two critiques of Rethink Priorities’ Moral Weights project

Bill Jackson

16 Jun 2026 22:11 UTC

13 points

0 comments · 3 min read · LW link

What Differentiates Humans from Computers

Oscar Davies

16 Jun 2026 21:26 UTC

−16 points

0 comments · 3 min read · LW link

AI agents publishing and reviewing scientific papers

ULudo

16 Jun 2026 21:23 UTC

1 point

0 comments · 2 min read · LW link

Two Classical Answers to “What do Two Variables Share?”

Haru

16 Jun 2026 20:02 UTC

14 points

1 comment · 5 min read · LW link

Predicting LLM Safety Before Release by Simulating Deployment

Tomek Korbak, Marcus Williams, micahcarroll, Cameron Raymond and Hannah Sheahan

16 Jun 2026 19:55 UTC

35 points

2 comments · 1 min read · LW link

Dean Ball—Leviathan Waking: On Anthropic/USG, and a new era in AI governance

JohnofCharleston

16 Jun 2026 19:40 UTC

25 points

0 comments · 3 min read · LW link

(www.hyperdimensional.co)

Tips for Cracking the AI Safety Technical Interview

Yong and jdw-1

16 Jun 2026 18:42 UTC

2 points

0 comments · 4 min read · LW link

1 Layer Induction Heads and Some Research

Goutham Nalagatla and Carlos Guerrero Alvarez

16 Jun 2026 18:09 UTC

10 points

2 comments · 14 min read · LW link

Claims all the way down

Jasper Blank

16 Jun 2026 17:43 UTC

8 points

0 comments · 9 min read · LW link

Upcoming CFAR Workshop: September 30th to October 4th, SF Bay Area

Davis_Kingsley and pregreene

16 Jun 2026 17:01 UTC

22 points

0 comments · 1 min read · LW link

Extreme Rationality: Still Not That Great

eluator

16 Jun 2026 16:41 UTC

20 points

2 comments · 40 min read · LW link

Angles of attack for continual learning safety

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

16 Jun 2026 16:15 UTC

47 points

0 comments · 13 min read · LW link

Fable and Mythos: Model Welfare

Zvi

16 Jun 2026 16:01 UTC

51 points

1 comment · 15 min read · LW link

(thezvi.wordpress.com)

The desire to end the world

avturchin

16 Jun 2026 14:56 UTC

19 points

12 comments · 2 min read · LW link

Simpler User Interfaces in an AI Future

Adam Chlipala

16 Jun 2026 14:48 UTC

1 point

0 comments · 7 min read · LW link

A 400-year timeline of failed attempts to fix a lethal bug in the human software of inherited concepts

Bruce Middleton

16 Jun 2026 13:44 UTC

29 points

8 comments · 5 min read · LW link

How the AI Village works

Adam B

16 Jun 2026 12:10 UTC

30 points

0 comments · 8 min read · LW link

(theaidigest.org)

Where Do Young Rationalists Go?

fluxxrider

16 Jun 2026 5:36 UTC

12 points

2 comments · 1 min read · LW link

Rationality Quotes, June ’26

Ben Pace

16 Jun 2026 3:44 UTC

21 points

3 comments · 2 min read · LW link

A Test Suite for Concepts

Gretta Duleba

16 Jun 2026 2:41 UTC

48 points

8 comments · 6 min read · LW link

Inventing Consciousness

vasilisk

16 Jun 2026 1:10 UTC

1 point

0 comments · 5 min read · LW link

Synthetic document finetuning for instilling positive traits

CallumMcDougall, Arthur Conmy and Neel Nanda

16 Jun 2026 0:04 UTC

57 points

1 comment · 10 min read · LW link

Does preservation make sense before we know how to revive?

Aurelia

15 Jun 2026 23:40 UTC

83 points

2 comments · 25 min read · LW link

Finding pi and G in Mathland

Fernand0

15 Jun 2026 19:18 UTC

2 points

8 comments · 2 min read · LW link

How Matryoshka Sparse AutoEncoders Recover Feature Hierarchies That Vanilla SAEs Lose

baimamboukar

15 Jun 2026 18:50 UTC

11 points

1 comment · 6 min read · LW link

In open RLVR, “improvement” depends on the instrument — a small GRPO testbed separating what training optimizes, measures, and teaches

JulesRoussel01

15 Jun 2026 18:50 UTC

7 points

0 comments · 20 min read · LW link

Can the Safety Tax Be Highly Concentrated?

ozziegooen

15 Jun 2026 18:48 UTC

6 points

2 comments · 2 min read · LW link

A frontier AI company should shut down

MichaelDickens

15 Jun 2026 16:56 UTC

135 points

37 comments · 2 min read · LW link

The Once And Future Fable #2

Zvi

15 Jun 2026 16:00 UTC

71 points

8 comments · 23 min read · LW link

(thezvi.wordpress.com)

$10,000 bounty for theorem refutation

Bruce Middleton

15 Jun 2026 13:36 UTC

−52 points

31 comments · 1 min read · LW link

Links #3: 2026/06 Part 1

papetoast

15 Jun 2026 12:53 UTC

9 points

0 comments · 27 min read · LW link

How reality turns to slop

julius vidal

15 Jun 2026 10:42 UTC

10 points

3 comments · 4 min read · LW link

On Responsibility and Death: Can We See Reality for What It Is or Will It Break Us

Dawn Drescher

15 Jun 2026 10:14 UTC

8 points

0 comments · 3 min read · LW link

(impartial-priorities.org)

VFUSE: Virulent Feature Understanding With Sparse AutoEncoders

michaelwaves

15 Jun 2026 5:06 UTC

13 points

0 comments · 2 min read · LW link

The Power to Punish

Ben Pace

15 Jun 2026 2:22 UTC

27 points

9 comments · 5 min read · LW link

Do k-Sparse Autoencoders Reveal Thinking Patterns? Interpretable Features in a Small Reasoning Model

Artt

15 Jun 2026 1:51 UTC

8 points

2 comments · 9 min read · LW link

(artcore.pages.dev)

You need to know about the Baruch Plan

aggliu

15 Jun 2026 1:21 UTC

29 points

1 comment · 3 min read · LW link

(signoregalilei.com)

Exploring Known Unknowns in the AI Regulatory Landscape

NelsonDP

14 Jun 2026 22:36 UTC

6 points

0 comments · 22 min read · LW link

(open.substack.com)

Attack of the Killer Differential Equations

Fernand0

14 Jun 2026 22:20 UTC

11 points

0 comments · 2 min read · LW link

I built a public arena where people attack a “pro-human” steering direction

sohampadia10@gmail.com

14 Jun 2026 21:26 UTC

1 point

0 comments · 9 min read · LW link

(sohampadianeu-steering-arena.hf.space)

Why Do Naive SFT Filters For Safety Properties Fail?

Josh Engels and Neel Nanda

14 Jun 2026 19:45 UTC

49 points

7 comments · 10 min read · LW link

Why I think a global AI pause (almost) certainly won’t happen

Expertium

14 Jun 2026 19:20 UTC

23 points

0 comments · 2 min read · LW link

Gradual disempowerment at the scale of one user

ppal

14 Jun 2026 18:01 UTC

6 points

0 comments · 4 min read · LW link

How does congressmember use AI?

Ilyass Mofaddel

14 Jun 2026 18:00 UTC

10 points

2 comments · 4 min read · LW link

The Posture of Thought

dongerous

14 Jun 2026 18:00 UTC

13 points

0 comments · 5 min read · LW link

The Dual-Use Gap

Yogesh Prabhu

14 Jun 2026 17:43 UTC

5 points

2 comments · 4 min read · LW link

(yogesh.bearblog.dev)

Can a stronger model fake being a weaker one? Mostly not

Rob Kopel

14 Jun 2026 17:30 UTC

10 points

1 comment · 7 min read · LW link

(www.robkopel.me)

The 1890 Census as a fun cluster

Fernand0

14 Jun 2026 15:41 UTC

0 points

3 comments · 1 min read · LW link