Fictional Thinking and Real Thinking

johnswentworth

17 Jun 2025 19:13 UTC

58 points

11 comments · 4 min read · LW link

The Curious Case of the bos_token

larry-dial

17 Jun 2025 19:00 UTC

26 points

4 comments · 10 min read · LW link

AISN #57: The RAISE Act

Corin Katzke and Dan H

17 Jun 2025 18:02 UTC

6 points

0 comments · 3 min read · LW link

(newsletter.safe.ai)

AI Safety at the Frontier: Paper Highlights, May ’25

gasteigerjo

17 Jun 2025 17:16 UTC

6 points

0 comments · 8 min read · LW link

(aisafetyfrontier.substack.com)

[Linkpost] The lethal trifecta for AI agents: private data, untrusted content, and external communication

Gunnar_Zarncke

17 Jun 2025 16:09 UTC

13 points

3 comments · 1 min read · LW link

(simonwillison.net)

Agentic Interpretability: A Strategy Against Gradual Disempowerment

beenkim and Neel Nanda

17 Jun 2025 14:52 UTC

17 points

6 comments · 2 min read · LW link

Prover-Estimator Debate: A New Scalable Oversight Protocol

Jonah Brown-Cohen and Geoffrey Irving

17 Jun 2025 13:53 UTC

89 points

19 comments · 5 min read · LW link

o3 Turns Pro

Zvi

17 Jun 2025 13:50 UTC

30 points

1 comment · 14 min read · LW link

(thezvi.wordpress.com)

Watch R1 “think” with animated chains of thought

future_detective

17 Jun 2025 10:38 UTC

4 points

0 comments · 1 min read · LW link

(github.com)

Serving LLM on Huawei CloudMatrix

sanxiyn

17 Jun 2025 5:59 UTC

25 points

7 comments · 1 min read · LW link

(arxiv.org)

Personal agents

Roman Leventov

17 Jun 2025 2:05 UTC

9 points

1 comment · 7 min read · LW link

I made a card game to reduce cognitive biases and logical fallacies but I’m not sure what DV to test in a study on its effectiveness.

Brad Dunn

17 Jun 2025 1:02 UTC

50 points

15 comments · 5 min read · LW link

Notes on Meetup Ideas

Commander Zander

17 Jun 2025 0:11 UTC

12 points

4 comments · 2 min read · LW link

Darkness Meditation—for NZ Winter Solstice 2025

joshuamerriam

16 Jun 2025 23:58 UTC

2 points

0 comments · 4 min read · LW link

[Question] Are superhuman savants real?

Bunthut

16 Jun 2025 22:02 UTC

15 points

4 comments · 1 min read · LW link

Ok, AI Can Write Pretty Good Fiction Now

JustisMills

16 Jun 2025 21:13 UTC

59 points

34 comments · 6 min read · LW link

(justismills.substack.com)

Subjective experience is most likely physical

martinkunev

16 Jun 2025 20:54 UTC

5 points

3 comments · 4 min read · LW link

VLMs can Aggregate Scattered Training Patches

LINGJIE CHEN

16 Jun 2025 18:25 UTC

2 points

0 comments · 4 min read · LW link

Setpoint = The experience we attend to

jimmy

16 Jun 2025 17:34 UTC

22 points

0 comments · 7 min read · LW link

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

James Chua and Owain_Evans

16 Jun 2025 16:43 UTC

69 points

2 comments · 8 min read · LW link

How LLM Beliefs Change During Chain-of-Thought Reasoning

Filip Sondej, Petr Kašpárek, alex-kazda and Tomáš Gavenčiak

16 Jun 2025 16:18 UTC

32 points

3 comments · 5 min read · LW link

Convergent Linear Representations of Emergent Misalignment

Anna Soligo, Edward Turner, Senthooran Rajamanoharan and Neel Nanda

16 Jun 2025 15:47 UTC

77 points

1 comment · 8 min read · LW link

Model Organisms for Emergent Misalignment

Anna Soligo, Edward Turner, Mia Taylor, Senthooran Rajamanoharan and Neel Nanda

16 Jun 2025 15:46 UTC

120 points

19 comments · 5 min read · LW link

Coaching AI: A Relational Approach to AI Safety

Priyanka Bharadwaj

16 Jun 2025 15:33 UTC

12 points

0 comments · 5 min read · LW link

Memories of the Neutral Zone

Jordan Rubin

16 Jun 2025 15:33 UTC

7 points

0 comments · 3 min read · LW link

(jordanmrubin.substack.com)

Do LLMs Comply Differently During Tests? Is This a Hidden Variable in Safety Evaluation? And Can We Steer That?

Sahar Abdelnabi

16 Jun 2025 13:52 UTC

18 points

1 comment · 6 min read · LW link

RTFB: The RAISE Act

Zvi

16 Jun 2025 12:50 UTC

99 points

8 comments · 8 min read · LW link

(thezvi.wordpress.com)

[Question] Galaxy-Brain Hobo Antibiotics?

Lorec

16 Jun 2025 12:43 UTC

3 points

9 comments · 4 min read · LW link

The EU commission seeks expert advisers on AI

PabloAMC

16 Jun 2025 12:28 UTC

7 points

0 comments · 1 min read · LW link

Double Crux: Master the art of productive disagreement

marta_k

16 Jun 2025 11:15 UTC

2 points

0 comments · 1 min read · LW link

From Paperclips to Bombs: The Evolution of AI Risk Discourse on LessWrong

David Harket

16 Jun 2025 5:16 UTC

3 points

0 comments · 24 min read · LW link

Donutting is bad

Jarrah

16 Jun 2025 4:12 UTC

20 points

4 comments · 1 min read · LW link

Futarchy using a sealed-bid auction to avoid liquidity problems

Christopher King

16 Jun 2025 1:34 UTC

21 points

6 comments · 8 min read · LW link

Memory Decoding Journal Club: Neocortical synaptic engrams for remote contextual memories

Devin Ward

15 Jun 2025 23:22 UTC

1 point

0 comments · 1 min read · LW link

Every Major LLM Endorses Newcomb One-Boxing

Jack Thompson

15 Jun 2025 20:44 UTC

20 points

13 comments · 1 min read · LW link

(jacktlab.substack.com)

Can We Change the Goals of a Toy RL Agent?

tuphs and Adrià Garriga-alonso

15 Jun 2025 20:34 UTC

20 points

0 comments · 9 min read · LW link

Some reprogenetics-related projects you could help with

TsviBT

15 Jun 2025 20:25 UTC

80 points

1 comment · 4 min read · LW link

Risk Tokens: Economic Security in AI Safety

mhdempsey

15 Jun 2025 19:25 UTC

1 point

0 comments · 6 min read · LW link

(www.michaeldempsey.me)

Aligned monetization of modern dating

kwang

15 Jun 2025 16:01 UTC

0 points

0 comments · 3 min read · LW link

(kevw.substack.com)

Intelligence Is Not Magic, But Your Threshold For “Magic” Is Pretty Low

Expertium

15 Jun 2025 15:23 UTC

226 points

27 comments · 1 min read · LW link

Estrogen: A trip report

cube_flipper

15 Jun 2025 13:15 UTC

166 points

42 comments · 27 min read · LW link

(smoothbrains.net)

[Question] Do multimodal LLMs (like 4o) use OCR under the hood to read dense text in images?

2PuNCheeZ

15 Jun 2025 11:20 UTC

4 points

1 comment · 1 min read · LW link

Book review: Air-borne by Carl Zimmer

eukaryote

15 Jun 2025 5:49 UTC

34 points

0 comments · 11 min read · LW link

(eukaryotewritesblog.com)

My favorite Soviet songs

Nina Panickssery

15 Jun 2025 2:48 UTC

22 points

1 comment · 5 min read · LW link

(ninapanickssery.substack.com)

Side quests in curriculum learning and regularization

Sandy Fraser

15 Jun 2025 2:03 UTC

6 points

0 comments · 10 min read · LW link

AXRP Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval

DanielFilan

15 Jun 2025 1:20 UTC

12 points

0 comments · 56 min read · LW link

Jailbreaking Claude 4 and Other Frontier Language Models

James Sullivan

15 Jun 2025 0:31 UTC

1 point

0 comments · 3 min read · LW link

(open.substack.com)

Endometriosis is an incredibly interesting disease

Abhishaike Mahajan

14 Jun 2025 22:14 UTC

167 points

5 comments · 16 min read · LW link

(www.owlposting.com)

Field Notes from Shipping Real Code with Claude

creatorrr

14 Jun 2025 16:36 UTC

22 points

0 comments · 12 min read · LW link

(diwank.space)

Training Superior Sparse Autoencoders for Instruct Models

Haoran Ye

14 Jun 2025 16:35 UTC

4 points

0 comments · 7 min read · LW link