Lunar bombardment of earth is practical

anithite 4 Jun 2026 23:25 UTC

27 points

0 comments, 4 min read, LW link

Endurance: Shackleton’s Incredible Voyage Review

nomagicpill 4 Jun 2026 22:19 UTC

6 points

0 comments, 11 min read, LW link

Rent from oil: a goldmine

TerriLeaf 4 Jun 2026 21:05 UTC

15 points

5 comments, 5 min read, LW link

Book of Cron Job

suchow 4 Jun 2026 18:58 UTC

4 points

0 comments, 1 min read, LW link

(www.nature.com)

(Mis)generalization of Helpful-Only Fine-tuning

Omar Khursheed, Baram Sosis and Fabien Roger

4 Jun 2026 18:40 UTC

55 points

7 comments, 11 min read, LW link

Defeating Introspection Adapters (and Why Threat Models Matter)

Nick Merrill and zekem

4 Jun 2026 18:39 UTC

10 points

0 comments, 5 min read, LW link

Building Better Activation Oracles

ceselder, Jan Bauer, Niclas Luick, Adam Karvonen and Neel Nanda

4 Jun 2026 18:34 UTC

62 points

1 comment, 7 min read, LW link

What Separates an Optimizer From Something We Merely Describe as Optimizing?

stewart leland jansen 4 Jun 2026 18:30 UTC

3 points

2 comments, 1 min read, LW link

Rohin Shah on AGI Safety

anaguma 4 Jun 2026 16:57 UTC

38 points

2 comments, 90 min read, LW link

(80000hours.org)

Training Deliberative Monitors for Black-Box Scheming Detection

aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark and Marius Hobbhahn

4 Jun 2026 16:43 UTC

33 points

6 comments, 6 min read, LW link

When AI Builds Itself (Anthropic Institute Linkpost)

fluxxrider 4 Jun 2026 16:37 UTC

26 points

16 comments, 1 min read, LW link

Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition

Oliver Sourbut, Josh Jacobson and Future of Life Foundation (FLF)

4 Jun 2026 16:26 UTC

44 points

6 comments, 8 min read, LW link

(flf.org)

Logits as a new monitor for evaluation awareness

Santiago Aranguri 4 Jun 2026 16:12 UTC

34 points

7 comments, 6 min read, LW link

AI #171: False Flag

Zvi 4 Jun 2026 15:50 UTC

41 points

1 comment, 48 min read, LW link

(thezvi.wordpress.com)

What should go in a model spec?

James_T 4 Jun 2026 14:57 UTC

8 points

0 comments, 12 min read, LW link

(www.forethought.org)

The Psychological Challenges of High-Impact Work—please participate in our survey!

spencerg 4 Jun 2026 3:51 UTC

9 points

0 comments, 1 min read, LW link

Running An Air Purifier on Batteries

jefftk 4 Jun 2026 2:40 UTC

15 points

0 comments, 4 min read, LW link

(www.jefftk.com)

Voluntary Paternalism

quality_qualia 4 Jun 2026 1:34 UTC

5 points

2 comments, 1 min read, LW link

(sidkol1.github.io)

Sixteen schemes for AI safety

Austin Chen 3 Jun 2026 21:50 UTC

32 points

4 comments, 8 min read, LW link

(manifund.substack.com)

Aligning Superintelligent Humans

Elliot Callender 3 Jun 2026 20:39 UTC

17 points

2 comments, 3 min read, LW link

A Pipeline for Generating Synthetic Sabotage Trajectories to Red-Team Monitors

Myles H and Tyler Tracy

3 Jun 2026 20:33 UTC

9 points

0 comments, 12 min read, LW link

Beyond Hardcoded Evolutionary Psychology

Elliot Callender 3 Jun 2026 20:26 UTC

27 points

10 comments, 6 min read, LW link

Trump Signs Executive Order For AI Testing Prior To Frontier Model Releases

Zvi 3 Jun 2026 16:30 UTC

51 points

1 comment, 13 min read, LW link

(thezvi.wordpress.com)

Thoughts on ‘Learning Mechanics’

criticalpoints 3 Jun 2026 15:36 UTC

12 points

0 comments, 10 min read, LW link

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

Elliott Thornley (EJT), carissacullen, christosi, alexr, LAThomson and Harry Garland

3 Jun 2026 14:24 UTC

20 points

3 comments, 19 min read, LW link

(arxiv.org)

Society Explained: a tool for efficiently exploring >100 theories of society

spencerg 3 Jun 2026 14:08 UTC

48 points

5 comments, 1 min read, LW link

Don’t Edit Your Ideas Before Having Them

Hide 3 Jun 2026 8:09 UTC

35 points

4 comments, 3 min read, LW link

(hidefromit.substack.com)

China won’t win the AI race but would it be much worse if it did?

Chastity Ruth 3 Jun 2026 5:46 UTC

71 points

18 comments, 13 min read, LW link

Bear spray expiry dates: good news, and staggering peer-reviewed pseudoscience

Bruce Middleton 3 Jun 2026 3:25 UTC

23 points

1 comment, 4 min read, LW link

Abstraction Boundaries and Bubbles of Legibility

Adam Chlipala 2 Jun 2026 23:54 UTC

1 point

0 comments, 9 min read, LW link

Should AI Safety Researchers Experiment with Automated Research

Ephraiem Sarabamoun 2 Jun 2026 23:18 UTC

1 point

0 comments, 1 min read, LW link

My favorite depiction of utopia

Caleb Biddulph 2 Jun 2026 23:15 UTC

189 points

20 comments, 33 min read, LW link

(docs.google.com)

The Origin of Uncertainty

Gordon Seidoh Worley 2 Jun 2026 18:20 UTC

13 points

2 comments, 2 min read, LW link

(www.uncertainupdates.com)

LURE: Alignment Evaluations to Reduce Evaluation Awareness

Igor Ivanov and David Africa

2 Jun 2026 18:20 UTC

26 points

5 comments, 5 min read, LW link

Why Even Experts Don’t Know What to Do About AI Risk

Luc Brinkman and plex

2 Jun 2026 17:31 UTC

78 points

22 comments, 2 min read, LW link

Where does the race to automate AI research end?

Simon Lermen 2 Jun 2026 17:21 UTC

16 points

0 comments, 1 min read, LW link

(simonlermen.substack.com)

A Town Without Children

SeñorDingDong 2 Jun 2026 16:35 UTC

35 points

7 comments, 4 min read, LW link

Announcing the ARC White-Box Estimation Challenge

Jacob_Hilton, paulfchristiano and Wilson Wu

2 Jun 2026 16:20 UTC

165 points

15 comments, 3 min read, LW link

(www.alignment.org)

Agent Foundations Reminds Me of Continental Philosophy

IanWS 2 Jun 2026 14:34 UTC

106 points

15 comments, 5 min read, LW link

(write.ianwsperber.com)

Claude Opus 4.8: Capabilities and Reactions

Zvi 2 Jun 2026 14:10 UTC

38 points

2 comments, 31 min read, LW link

(thezvi.wordpress.com)

Why we’re launching the Frontier Biodefense Fellowship

Tobias H 2 Jun 2026 9:06 UTC

8 points

0 comments, 4 min read, LW link

Wood Screws and the Methods of Rationality

quanticle 2 Jun 2026 7:49 UTC

12 points

7 comments, 4 min read, LW link

Taking the Training Wheels Off: Aligning LLMs without Personas

Matthew Khoriaty 2 Jun 2026 6:29 UTC

23 points

16 comments, 3 min read, LW link

Compute Verification on Short Timelines

skunnavakkam 2 Jun 2026 3:31 UTC

13 points

0 comments, 2 min read, LW link

Testing Best-Effort Solar

jefftk 2 Jun 2026 3:00 UTC

16 points

0 comments, 2 min read, LW link

(www.jefftk.com)

May 2026 Links

nomagicpill 2 Jun 2026 1:42 UTC

8 points

0 comments, 4 min read, LW link

% Bureaucracy

PossiblyElaine 2 Jun 2026 0:36 UTC

11 points

1 comment, 5 min read, LW link

(possiblyelaine.substack.com)

Tech I’m skeptical of and why

harsimony 1 Jun 2026 22:54 UTC

46 points

24 comments, 24 min read, LW link

(splittinginfinity.substack.com)

Critique of current AI safety bug bounty programs

clickyquack 1 Jun 2026 21:26 UTC

7 points

0 comments, 7 min read, LW link

[Linkpost] Prefixing names with ‘secure_’ makes agents write more secure code

Jack 1 Jun 2026 21:20 UTC

14 points

1 comment, 1 min read, LW link

(antimemeticai.com)