Thoughts on Claude Fable’s silent safeguards

Andy Arditi

10 Jun 2026 23:35 UTC

51 points

20 comments · 10 min read · LW link

Notes on Algorithms

Menotim

10 Jun 2026 23:28 UTC

7 points

0 comments · 25 min read · LW link

[Question] Fuel Crisis: Situation Modeling Thread

Nicholas Kross

10 Jun 2026 21:59 UTC

8 points

7 comments · 1 min read · LW link

[Question] Fuel Crisis: Justified Practical Advice Thread

Nicholas Kross

10 Jun 2026 21:59 UTC

14 points

0 comments · 1 min read · LW link

Solsong Chord Updates

jefftk

10 Jun 2026 21:00 UTC

10 points

0 comments · 1 min read · LW link

(www.jefftk.com)

Dario Amodei—Policy on the AI Exponential

DW11

10 Jun 2026 20:56 UTC

22 points

0 comments · 1 min read · LW link

Anthropic did not call for a pause on AI

Andrea_Miotti and Gabriel Alfour

10 Jun 2026 20:02 UTC

80 points

5 comments · 5 min read · LW link

(controlai.news)

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary and Shubhorup Biswas

10 Jun 2026 17:58 UTC

240 points

20 comments · 4 min read · LW link

These Three Thaumata

chaosmage

10 Jun 2026 16:42 UTC

11 points

0 comments · 1 min read · LW link

Sequent: scale and automation for higher confidence in alignment

Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi and Stan van Wingerden

10 Jun 2026 15:37 UTC

276 points

2 comments · 11 min read · LW link

(sequent.org)

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

RobinHa

10 Jun 2026 15:21 UTC

65 points

5 comments · 9 min read · LW link

(robinhaselhorst.com)

I Started an AI Safety Research Org and Think These 7 Things Matter

Alfie Lamerton

10 Jun 2026 14:54 UTC

20 points

0 comments · 5 min read · LW link

Phonies

IanWS

10 Jun 2026 14:17 UTC

10 points

0 comments · 2 min read · LW link

(write.ianwsperber.com)

Machinic Psychopharmacology: Do LLMs Self-Medicate?

Sid Black and Joseph Bloom

10 Jun 2026 14:15 UTC

124 points

11 comments · 23 min read · LW link

I didn’t see any METR graph extrapolations so here.

Vermillion

10 Jun 2026 12:50 UTC

15 points

2 comments · 1 min read · LW link

ML4Good Summer 2026 Bootcamps - Applications Open!

Jack_S

10 Jun 2026 11:07 UTC

3 points

0 comments · 2 min read · LW link

Tracing Eval-Awareness Emergence Through Training of OLMo 3

Ram Bharadwaj and RobertKirk

10 Jun 2026 10:13 UTC

43 points

6 comments · 6 min read · LW link

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

Alex Amadori

10 Jun 2026 9:44 UTC

73 points

26 comments · 16 min read · LW link

(alexamadori.substack.com)

Three types of model organism

Francis Rhys Ward

10 Jun 2026 8:50 UTC

45 points

7 comments · 2 min read · LW link

Even “illegible” Mythos reasoning traces seem pretty legible

faul_sname

10 Jun 2026 8:49 UTC

157 points

23 comments · 2 min read · LW link

MythOS—The Rise of AGI

Byron Lee

10 Jun 2026 6:06 UTC

−19 points

0 comments · 4 min read · LW link

Under Violet

Hide

10 Jun 2026 1:30 UTC

4 points

0 comments · 10 min read · LW link

(hidefromit.substack.com)

LessOnline 2026

nomagicpill

9 Jun 2026 23:24 UTC

3 points

0 comments · 5 min read · LW link

(nomagicpill.substack.com)

“Programmer Science Fiction: My case for a new sub-genre”, Sam T. Oates 2026

gwern

9 Jun 2026 23:23 UTC

47 points

10 comments · 1 min read · LW link

(stoates.substack.com)

The Disutility of FDT: on Utility Functions and Voting, Insights from Behavioral Economics and Decision Theory

DanielW

9 Jun 2026 23:13 UTC

5 points

3 comments · 8 min read · LW link

Three Labs With a Plan and A Memorandum

Zvi

9 Jun 2026 22:40 UTC

45 points

0 comments · 12 min read · LW link

(thezvi.wordpress.com)

Harmfulness Directions in OLMo

Daniele Pace, Bryan Maruyama and LorenzoPacchiardi

9 Jun 2026 22:31 UTC

20 points

0 comments · 11 min read · LW link

“Self-Control” Is A (Neurological) Type Error

Elliot Callender

9 Jun 2026 21:34 UTC

−6 points

0 comments · 1 min read · LW link

Towards a Formal Scientific Epistemology

Richard_Ngo

9 Jun 2026 20:31 UTC

75 points

9 comments · 7 min read · LW link

(www.mindthefuture.info)

Some Interesting Papers on RLVR

CarolusRenniusVitellius

9 Jun 2026 19:00 UTC

20 points

5 comments · 4 min read · LW link

A Mike’s-Eye View of ARC’s Research

Mikewins

9 Jun 2026 18:30 UTC

64 points

1 comment · 11 min read · LW link

(www.alignment.org)

An LLM Flagged My Paper About LLMs Flagging Things.

Failfinder70

9 Jun 2026 18:00 UTC

5 points

0 comments · 2 min read · LW link

The Skeptic, the Bayesian, Empiricism and Claims to Know:

DanielW

9 Jun 2026 17:52 UTC

4 points

4 comments · 4 min read · LW link

Claude Fable 5 and Mythos 5 [Linkpost]

fluxxrider

9 Jun 2026 17:19 UTC

42 points

10 comments · 1 min read · LW link

5 Things I Learned About People From Doing Stand-Up Comedy

Luise Woehlke

9 Jun 2026 15:52 UTC

−4 points

5 comments · 2 min read · LW link

(open.substack.com)

The Machines Lack Honour

Raymond Douglas

9 Jun 2026 15:30 UTC

169 points

21 comments · 12 min read · LW link

High Dynamic Range DIY Air Testing

jefftk

9 Jun 2026 15:00 UTC

13 points

0 comments · 4 min read · LW link

(www.jefftk.com)

AI Super PAC tracker

Mikhail Samin

9 Jun 2026 14:57 UTC

26 points

0 comments · 1 min read · LW link

(electhumans.com)

[Linkpost] Evals for “SPI-incompatible” behavior & reasoning: Guide to initial research

Anthony DiGiovanni

9 Jun 2026 13:44 UTC

23 points

0 comments · 1 min read · LW link

(docs.google.com)

Subversion-Resistance for Free from Formal Verification

Adam Chlipala

9 Jun 2026 12:01 UTC

7 points

0 comments · 7 min read · LW link

LLMs and almost good code

kqr

9 Jun 2026 7:21 UTC

33 points

9 comments · 3 min read · LW link

(entropicthoughts.com)

On Slop

Jan

9 Jun 2026 1:08 UTC

32 points

4 comments · 7 min read · LW link

(universalprior.substack.com)

How to build a cancer vaccine, and whether they will work this time

Abhishaike Mahajan

8 Jun 2026 20:45 UTC

58 points

9 comments · 25 min read · LW link

(www.owlposting.com)

Efficient tradeoffs and the safety-usefulness tradeoff model

Buck

8 Jun 2026 20:28 UTC

42 points

1 comment · 8 min read · LW link

Accelerated Skill Learning via Dream Engineering and Biofeedback

Elliot Callender

8 Jun 2026 20:08 UTC

5 points

2 comments · 3 min read · LW link

How valuable are weak AI safety regulations?

MichaelDickens

8 Jun 2026 18:24 UTC

28 points

0 comments · 6 min read · LW link

How to reduce capability degradation from off-model SFT

Dylan Xu, SebastianP and Alek Westover

8 Jun 2026 16:24 UTC

21 points

0 comments · 3 min read · LW link

The Next Swan: Frank Ramsey, Variable Hypotheticals, and the Bet on Induction

Ramseyian

8 Jun 2026 12:01 UTC

4 points

0 comments · 18 min read · LW link

Coverage-driven alignment—What ‘Teaching Claude Why’ can borrow from AV verification

Yoav Hollander

8 Jun 2026 11:42 UTC

16 points

4 comments · 14 min read · LW link

(blog.foretellix.com)

Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment

Sayhan Yalvaçer

8 Jun 2026 7:06 UTC

96 points

8 comments · 3 min read · LW link