Frontier Models Still Lag Behind Humans at Robust Belief-State Tracking

Lukas Frei · 6 Jun 2026 23:54 UTC

13 points

6 comments · 5 min read

Coming Around To Political Donations

jefftk · 6 Jun 2026 21:30 UTC

59 points

8 comments · 2 min read

(www.jefftk.com)

Analysis of Metastable States in the Transformer Activation Space

Zach Baker · 6 Jun 2026 21:30 UTC

10 points

0 comments · 20 min read

The Diamond Lemma

Isaac Newton · 6 Jun 2026 21:15 UTC

21 points

0 comments · 7 min read

(archimedeanmonoid.substack.com)

Iliad is Hiring

Peter Jean · 6 Jun 2026 21:08 UTC

13 points

0 comments · 1 min read

Against Corrigibility

peralice · 6 Jun 2026 20:28 UTC

66 points

17 comments · 12 min read

The Residual Stream Has a Geometry of Time

Fodenthal · 6 Jun 2026 19:57 UTC

23 points

0 comments · 8 min read

Exponential Solitude

PeterMaui · 6 Jun 2026 19:49 UTC

5 points

1 comment · 9 min read

Freud heard a rumor that Science existed, and had a wonderful dream

Bruce Middleton · 6 Jun 2026 14:47 UTC

8 points

8 comments · 6 min read

Coalitional Darwinism and the Instrumental Utility of Individuality

CarolusRenniusVitellius · 6 Jun 2026 12:53 UTC

25 points

5 comments · 17 min read

(charlesr-w.github.io)

Why Software Automation Is Hard

silentbob · 6 Jun 2026 8:56 UTC

114 points

20 comments · 12 min read

What if Anthropic unilaterally paused capabilities development right now?

Karl von Wendt · 6 Jun 2026 7:39 UTC

61 points

15 comments · 3 min read

Optimisation over non-stationary distributions creates weirder minds

Samuel Ratnam and Pjain

6 Jun 2026 0:05 UTC

36 points

8 comments · 4 min read

[Question] Does robotics capabilities research accelerate AGI timelines?

Master Chief · 5 Jun 2026 23:32 UTC

4 points

3 comments · 1 min read

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

dgros · 5 Jun 2026 22:43 UTC

15 points

0 comments · 11 min read

Two More Methods for Consistency Training and Some New Ways to Apply It

David Africa, Sukrati_Gautam, Neil Shah and arav-dhoot

5 Jun 2026 21:06 UTC

18 points

0 comments · 7 min read

Revisiting GSM-Symbolic: models seem to reason okay, actually

Sturb · 5 Jun 2026 20:54 UTC

24 points

0 comments · 5 min read

Accepting Death & Adult Responsibility

Unreal · 5 Jun 2026 19:23 UTC

−19 points

10 comments · 4 min read

The Masochistic Prior

Modulo.Roland · 5 Jun 2026 19:05 UTC

12 points

2 comments · 2 min read

(substack.com)

Beyond the lexical personality traits: What is the structure of personality?

tailcalled · 5 Jun 2026 19:05 UTC

60 points

1 comment · 5 min read

Do not try to write your first research publication as a single author

Mikhail Mironov · 5 Jun 2026 18:31 UTC

12 points

0 comments · 5 min read

Do We Want a Superintelligent People-Pleaser?

GenericHousewife_B · 5 Jun 2026 18:07 UTC

1 point

0 comments · 6 min read

Explaining SAE Features With Foreign Natural Language Autoencoders

fzaffino · 5 Jun 2026 17:51 UTC

17 points

1 comment · 8 min read

SecureBio Detection is Hiring Software Engineers

jefftk · 5 Jun 2026 16:50 UTC

33 points

2 comments · 1 min read

(www.jefftk.com)

One Year of PauseAI UK

Joseph Miller and PauseAI UK

5 Jun 2026 16:41 UTC

94 points

7 comments · 11 min read

(pauseai.uk)

Learnings from starting an AI safety research team

draganover and Erin Robertson

5 Jun 2026 16:27 UTC

101 points

7 comments · 6 min read

Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

Mark Kagach, EliasSchlie, Thomas Van Damme and JustinShovelain

5 Jun 2026 15:49 UTC

40 points

1 comment · 5 min read

My research: a computational cognitive neuroscience perspective on alignment

Seth Herd · 5 Jun 2026 14:19 UTC

52 points

0 comments · 18 min read

Editing is Easy, but Revision is Hard

IanWS · 5 Jun 2026 11:58 UTC

5 points

0 comments · 3 min read

(write.ianwsperber.com)

OpenAI Offers A New Policy Blueprint

Zvi · 5 Jun 2026 11:41 UTC

31 points

3 comments · 7 min read

(thezvi.wordpress.com)

[Paper] Dictionary Learning Identifiability for Understanding SAEs

William Dorrell · 5 Jun 2026 0:28 UTC

12 points

0 comments · 3 min read

What Does Abliteration Actually Cost?

christian-mc · 5 Jun 2026 0:28 UTC

3 points

0 comments · 4 min read

Lunar bombardment of earth is practical

anithite · 4 Jun 2026 23:25 UTC

27 points

0 comments · 4 min read

Endurance: Shackleton’s Incredible Voyage Review

nomagicpill · 4 Jun 2026 22:19 UTC

6 points

0 comments · 11 min read

Rent from oil: a goldmine

TerriLeaf · 4 Jun 2026 21:05 UTC

15 points

5 comments · 5 min read

Book of Cron Job

suchow · 4 Jun 2026 18:58 UTC

4 points

0 comments · 1 min read

(www.nature.com)

(Mis)generalization of Helpful-Only Fine-tuning

Omar Khursheed, Baram Sosis and Fabien Roger

4 Jun 2026 18:40 UTC

55 points

7 comments · 11 min read

Defeating Introspection Adapters (and Why Threat Models Matter)

Nick Merrill and zekem

4 Jun 2026 18:39 UTC

10 points

0 comments · 5 min read

Building Better Activation Oracles

ceselder, Jan Bauer, Niclas Luick, Adam Karvonen and Neel Nanda

4 Jun 2026 18:34 UTC

62 points

1 comment · 7 min read

What Separates an Optimizer From Something We Merely Describe as Optimizing?

stewart leland jansen · 4 Jun 2026 18:30 UTC

3 points

2 comments · 1 min read

Rohin Shah on AGI Safety

anaguma · 4 Jun 2026 16:57 UTC

38 points

2 comments · 90 min read

(80000hours.org)

Training Deliberative Monitors for Black-Box Scheming Detection

aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark and Marius Hobbhahn

4 Jun 2026 16:43 UTC

33 points

6 comments · 6 min read

When AI Builds Itself (Anthropic Institute Linkpost)

fluxxrider · 4 Jun 2026 16:37 UTC

26 points

16 comments · 1 min read

Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition

Oliver Sourbut, Josh Jacobson and Future of Life Foundation (FLF)

4 Jun 2026 16:26 UTC

44 points

6 comments · 8 min read

(flf.org)

Logits as a new monitor for evaluation awareness

Santiago Aranguri · 4 Jun 2026 16:12 UTC

34 points

7 comments · 6 min read

AI #171: False Flag

Zvi · 4 Jun 2026 15:50 UTC

41 points

1 comment · 48 min read

(thezvi.wordpress.com)

What should go in a model spec?

James_T · 4 Jun 2026 14:57 UTC

8 points

0 comments · 12 min read

(www.forethought.org)

The Psychological Challenges of High-Impact Work—please participate in our survey!

spencerg · 4 Jun 2026 3:51 UTC

9 points

0 comments · 1 min read

Running An Air Purifier on Batteries

jefftk · 4 Jun 2026 2:40 UTC

15 points

0 comments · 4 min read

(www.jefftk.com)

Voluntary Paternalism

quality_qualia · 4 Jun 2026 1:34 UTC

5 points

2 comments · 1 min read

(sidkol1.github.io)