[Question] Does robotics capabilities research accelerate AGI timelines?

Master Chief

5 Jun 2026 23:32 UTC

4 points

3 comments · 1 min read · LW link

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

dgros

5 Jun 2026 22:43 UTC

15 points

0 comments · 11 min read · LW link

Two More Methods for Consistency Training and Some New Ways to Apply It

David Africa, Sukrati_Gautam, Neil Shah and arav-dhoot

5 Jun 2026 21:06 UTC

18 points

0 comments · 7 min read · LW link

Revisiting GSM-Symbolic: models seem to reason okay, actually

Sturb

5 Jun 2026 20:54 UTC

24 points

0 comments · 5 min read · LW link

Accepting Death & Adult Responsibility

Unreal

5 Jun 2026 19:23 UTC

−19 points

10 comments · 4 min read · LW link

The Masochistic Prior

Modulo.Roland

5 Jun 2026 19:05 UTC

12 points

2 comments · 2 min read · LW link

(substack.com)

Beyond the lexical personality traits: What is the structure of personality?

tailcalled

5 Jun 2026 19:05 UTC

60 points

1 comment · 5 min read · LW link

Do not try to write your first research publication as a single author

Mikhail Mironov

5 Jun 2026 18:31 UTC

12 points

0 comments · 5 min read · LW link

Do We Want a Superintelligent People-Pleaser?

GenericHousewife_B

5 Jun 2026 18:07 UTC

1 point

0 comments · 6 min read · LW link

Explaining SAE Features With Foreign Natural Language Autoencoders

fzaffino

5 Jun 2026 17:51 UTC

17 points

1 comment · 8 min read · LW link

SecureBio Detection is Hiring Software Engineers

jefftk

5 Jun 2026 16:50 UTC

33 points

2 comments · 1 min read · LW link

(www.jefftk.com)

One Year of PauseAI UK

Joseph Miller and PauseAI UK

5 Jun 2026 16:41 UTC

94 points

7 comments · 11 min read · LW link

(pauseai.uk)

Learnings from starting an AI safety research team

draganover and Erin Robertson

5 Jun 2026 16:27 UTC

101 points

7 comments · 6 min read · LW link

Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

Mark Kagach, EliasSchlie, Thomas Van Damme and JustinShovelain

5 Jun 2026 15:49 UTC

40 points

1 comment · 5 min read · LW link

My research: a computational cognitive neuroscience perspective on alignment

Seth Herd

5 Jun 2026 14:19 UTC

52 points

0 comments · 18 min read · LW link

Editing is Easy, but Revision is Hard

IanWS

5 Jun 2026 11:58 UTC

5 points

0 comments · 3 min read · LW link

(write.ianwsperber.com)

OpenAI Offers A New Policy Blueprint

Zvi

5 Jun 2026 11:41 UTC

31 points

3 comments · 7 min read · LW link

(thezvi.wordpress.com)

[Paper] Dictionary Learning Identifiability for Understanding SAEs

William Dorrell

5 Jun 2026 0:28 UTC

12 points

0 comments · 3 min read · LW link

What Does Abliteration Actually Cost?

christian-mc

5 Jun 2026 0:28 UTC

3 points

0 comments · 4 min read · LW link

Lunar bombardment of earth is practical

anithite

4 Jun 2026 23:25 UTC

27 points

0 comments · 4 min read · LW link

Endurance: Shackleton’s Incredible Voyage Review

nomagicpill

4 Jun 2026 22:19 UTC

6 points

0 comments · 11 min read · LW link

Rent from oil: a goldmine

TerriLeaf

4 Jun 2026 21:05 UTC

15 points

5 comments · 5 min read · LW link

Book of Cron Job

suchow

4 Jun 2026 18:58 UTC

4 points

0 comments · 1 min read · LW link

(www.nature.com)

(Mis)generalization of Helpful-Only Fine-tuning

Omar Khursheed, Baram Sosis and Fabien Roger

4 Jun 2026 18:40 UTC

55 points

7 comments · 11 min read · LW link

Defeating Introspection Adapters (and Why Threat Models Matter)

Nick Merrill and zekem

4 Jun 2026 18:39 UTC

10 points

0 comments · 5 min read · LW link

Building Better Activation Oracles

ceselder, Jan Bauer, Niclas Luick, Adam Karvonen and Neel Nanda

4 Jun 2026 18:34 UTC

62 points

1 comment · 7 min read · LW link

What Separates an Optimizer From Something We Merely Describe as Optimizing?

stewart leland jansen

4 Jun 2026 18:30 UTC

3 points

2 comments · 1 min read · LW link

Rohin Shah on AGI Safety

anaguma

4 Jun 2026 16:57 UTC

38 points

2 comments · 90 min read · LW link

(80000hours.org)

Training Deliberative Monitors for Black-Box Scheming Detection

aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark and Marius Hobbhahn

4 Jun 2026 16:43 UTC

33 points

6 comments · 6 min read · LW link

When AI Builds Itself (Anthropic Institute Linkpost)

fluxxrider

4 Jun 2026 16:37 UTC

26 points

16 comments · 1 min read · LW link

Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition

Oliver Sourbut, Josh Jacobson and Future of Life Foundation (FLF)

4 Jun 2026 16:26 UTC

44 points

6 comments · 8 min read · LW link

(flf.org)

Logits as a new monitor for evaluation awareness

Santiago Aranguri

4 Jun 2026 16:12 UTC

34 points

7 comments · 6 min read · LW link

AI #171: False Flag

Zvi

4 Jun 2026 15:50 UTC

41 points

1 comment · 48 min read · LW link

(thezvi.wordpress.com)

What should go in a model spec?

James_T

4 Jun 2026 14:57 UTC

8 points

0 comments · 12 min read · LW link

(www.forethought.org)

The Psychological Challenges of High-Impact Work—please participate in our survey!

spencerg

4 Jun 2026 3:51 UTC

9 points

0 comments · 1 min read · LW link

Running An Air Purifier on Batteries

jefftk

4 Jun 2026 2:40 UTC

15 points

0 comments · 4 min read · LW link

(www.jefftk.com)

Voluntary Paternalism

quality_qualia

4 Jun 2026 1:34 UTC

5 points

2 comments · 1 min read · LW link

(sidkol1.github.io)

Sixteen schemes for AI safety

Austin Chen

3 Jun 2026 21:50 UTC

32 points

4 comments · 8 min read · LW link

(manifund.substack.com)

Aligning Superintelligent Humans

Elliot Callender

3 Jun 2026 20:39 UTC

17 points

2 comments · 3 min read · LW link

A Pipeline for Generating Synthetic Sabotage Trajectories to Red-Team Monitors

Myles H and Tyler Tracy

3 Jun 2026 20:33 UTC

9 points

0 comments · 12 min read · LW link

Beyond Hardcoded Evolutionary Psychology

Elliot Callender

3 Jun 2026 20:26 UTC

27 points

10 comments · 6 min read · LW link

Trump Signs Executive Order For AI Testing Prior To Frontier Model Releases

Zvi

3 Jun 2026 16:30 UTC

51 points

1 comment · 13 min read · LW link

(thezvi.wordpress.com)

Thoughts on ‘Learning Mechanics’

criticalpoints

3 Jun 2026 15:36 UTC

12 points

0 comments · 10 min read · LW link

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

Elliott Thornley (EJT), carissacullen, christosi, alexr, LAThomson and Harry Garland

3 Jun 2026 14:24 UTC

20 points

3 comments · 19 min read · LW link

(arxiv.org)

Society Explained: a tool for efficiently exploring >100 theories of society

spencerg

3 Jun 2026 14:08 UTC

48 points

5 comments · 1 min read · LW link

Don’t Edit Your Ideas Before Having Them

Hide

3 Jun 2026 8:09 UTC

35 points

4 comments · 3 min read · LW link

(hidefromit.substack.com)

China won’t win the AI race but would it be much worse if it did?

Chastity Ruth

3 Jun 2026 5:46 UTC

71 points

18 comments · 13 min read · LW link

Bear spray expiry dates: good news, and staggering peer-reviewed pseudoscience

Bruce Middleton

3 Jun 2026 3:25 UTC

23 points

1 comment · 4 min read · LW link

Abstraction Boundaries and Bubbles of Legibility

Adam Chlipala

2 Jun 2026 23:54 UTC

1 point

0 comments · 9 min read · LW link

Should AI Safety Researchers Experiment with Automated Research

Ephraiem Sarabamoun

2 Jun 2026 23:18 UTC

1 point

0 comments · 1 min read · LW link