Stars are a rounding error

Algon 9 Oct 2025 23:35 UTC

79 points

20 comments 3 min read LW link

Towards a Typology of Strange LLM Chains-of-Thought

1a3orn 9 Oct 2025 22:02 UTC

312 points

29 comments 9 min read LW link

Training Qwen-1.5B with a CoT legibility penalty

Fabien Roger 9 Oct 2025 21:33 UTC

68 points

7 comments 4 min read LW link

Interview with a drone expert on the future of AI warfare

NunoSempere and rai sur

9 Oct 2025 20:16 UTC

33 points

0 comments 25 min read LW link

(blog.sentinel-team.org)

Investigating Neural Scaling Laws Emerging from Deep Data Structure

Nathaniel Mitrani and Ari Brill

9 Oct 2025 20:11 UTC

6 points

0 comments 8 min read LW link

I take antidepressants. You’re welcome

Elizabeth 9 Oct 2025 19:30 UTC

284 points

11 comments 3 min read LW link

(acesounderglass.com)

Training fails to elicit subtle reasoning in current language models

mishajw, Fabien Roger, Hoagy, gasteigerjo, Joe Benton and Vlad Mikulik

9 Oct 2025 19:04 UTC

49 points

3 comments 4 min read LW link

(alignment.anthropic.com)

Realistic Reward Hacking Induces Different and Deeper Misalignment

Jozdien 9 Oct 2025 18:45 UTC

146 points

2 comments 23 min read LW link

Why am I not currently starting a religion around AI or similar topics?

samuelshadrach 9 Oct 2025 18:31 UTC

8 points

2 comments 18 min read LW link

(samuelshadrach.com)

The Underexplored Prospects of Benevolent Superintelligences—PART 1: THE WISE, THE GOOD, THE POWERFUL

Jesper L. 9 Oct 2025 17:49 UTC

3 points

7 comments 25 min read LW link

“Yes, and—” Requires the Possibility of “No, Because—”

Zack_M_Davis 9 Oct 2025 17:39 UTC

42 points

4 comments 3 min read LW link

(zackmdavis.net)

Four Questions to Refine Your Policy Proposal

Mass_Driver 9 Oct 2025 16:30 UTC

11 points

2 comments 6 min read LW link

A Snippet On The Epistemically Hygienic Containment Of Faith-In-Reason-Itself

JenniferRM 9 Oct 2025 16:19 UTC

10 points

0 comments 1 min read LW link

Alignment progress doesn’t compensate for higher capabilities

Joe Rogero 9 Oct 2025 16:06 UTC

4 points

0 comments 6 min read LW link

The Thinking Machines Tinker API is good news for AI control and security

Buck 9 Oct 2025 15:22 UTC

92 points

10 comments 6 min read LW link

Biouploading: Preserving My Living Neurons and Connectome as a Spatially Distributed Mesh

avturchin 9 Oct 2025 15:19 UTC

16 points

0 comments 3 min read LW link

self reflections of a striver

thiccythot 9 Oct 2025 14:59 UTC

18 points

0 comments 8 min read LW link

Hospitalization: A Review

Logan Riggs 9 Oct 2025 14:36 UTC

380 points

21 comments 9 min read LW link

AI #137: An OpenAI App For That

Zvi 9 Oct 2025 14:00 UTC

32 points

4 comments 57 min read LW link

(thezvi.wordpress.com)

CRC Follow-up Report v1.0 — OpenAI Feedback Integration Edition

Seira 9 Oct 2025 6:12 UTC

−4 points

2 comments 2 min read LW link

[Question] Are We Leaving Literature To The Psychotic?

Yitz 9 Oct 2025 6:09 UTC

13 points

4 comments 1 min read LW link

Lessons from the Mountains

Philipreal 9 Oct 2025 4:10 UTC

15 points

2 comments 3 min read LW link

Probabilistic Societies

Benjamin_Sturisky 9 Oct 2025 4:08 UTC

0 points

0 comments 3 min read LW link

Inverting the Most Forbidden Technique: What happens when we train LLMs to lie detectably?

Peter Jordan 9 Oct 2025 0:43 UTC

21 points

4 comments 4 min read LW link

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

Sam Marks, Nevan Wichers, Daniel Tan, Aram Ebtekar, Jozdien, David Africa, Alex Mallen and Fabien Roger

8 Oct 2025 22:02 UTC

177 points

37 comments 2 min read LW link

NEPA, Permitting and Energy Roundup #2

Zvi 8 Oct 2025 20:20 UTC

27 points

1 comment 28 min read LW link

(thezvi.wordpress.com)

What shapes does reasoning take but circular?

Algon 8 Oct 2025 20:18 UTC

9 points

2 comments 2 min read LW link

The Oracle’s Gift

Karthik Tadepalli 8 Oct 2025 20:13 UTC

5 points

1 comment 3 min read LW link

Thinking Mathematically—Convergent Sequences

Yair Halberstadt 8 Oct 2025 19:44 UTC

18 points

5 comments 4 min read LW link

The Relationship Between Social Punishment and Shared Maps

Zack_M_Davis 8 Oct 2025 19:38 UTC

64 points

14 comments 4 min read LW link

(zackmdavis.net)

IABIED: Paradigm Confusion and Overconfidence

PeterMcCluskey 8 Oct 2025 19:19 UTC

12 points

14 comments 11 min read LW link

(bayesianinvestor.com)

The Wise Baboon of Loyalty

Zander_Drax 8 Oct 2025 18:48 UTC

13 points

0 comments 4 min read LW link

Spooky Collusion at a Distance with Superrational AI

bira 8 Oct 2025 18:13 UTC

79 points

9 comments 6 min read LW link

The Architecture of the Narcissistic False Self

Dawn Drescher 8 Oct 2025 17:39 UTC

4 points

0 comments 12 min read LW link

(impartial-priorities.org)

Reflections on The Curve 2025

Gordon Seidoh Worley 8 Oct 2025 17:20 UTC

18 points

0 comments 2 min read LW link

(www.uncertainupdates.com)

Plans A, B, C, and D for misalignment risk

ryan_greenblatt 8 Oct 2025 17:18 UTC

139 points

78 comments 6 min read LW link

Halfhaven Digest #1

Taylor G. Lunt 8 Oct 2025 14:24 UTC

15 points

0 comments 3 min read LW link

Three Paths Through Manifold

Aleph Head, Ashe Vazquez Nuñez and Yulia

8 Oct 2025 13:48 UTC

10 points

1 comment 17 min read LW link

(open.substack.com)

The “cool idea” bias

James Diacoumis 8 Oct 2025 12:29 UTC

18 points

2 comments 3 min read LW link

(jamesdiacoumis.substack.com)

Irresponsible Companies Can Be Made of Responsible Employees

VojtaKovarik 8 Oct 2025 11:47 UTC

80 points

16 comments 5 min read LW link

Heaven, Hell, and Mechanics

Chris Scammell 8 Oct 2025 11:05 UTC

46 points

5 comments 3 min read LW link

10 Ways to Waste a Decade

Taylor G. Lunt 8 Oct 2025 2:51 UTC

13 points

4 comments 5 min read LW link

You Should Get a Reusable Mask

jefftk 8 Oct 2025 2:40 UTC

104 points

32 comments 1 min read LW link

(www.jefftk.com)

Replacing RL w/ Parameter-based Evolutionary Strategies

Logan Riggs 8 Oct 2025 1:02 UTC

64 points

5 comments 3 min read LW link

Intent alignment seems incoherent

Joe Rogero 7 Oct 2025 23:01 UTC

24 points

2 comments 6 min read LW link

Petri: An open-source auditing tool to accelerate AI safety research

Sam Marks 7 Oct 2025 20:39 UTC

77 points

0 comments 1 min read LW link

(alignment.anthropic.com)

Bending The Curve

Zvi 7 Oct 2025 20:00 UTC

91 points

12 comments 21 min read LW link

(thezvi.wordpress.com)

Kairos is hiring: Founding Generalist & SPAR Contractor

agucova 7 Oct 2025 18:43 UTC

8 points

0 comments 4 min read LW link

Messy on Purpose: Part 2 of A Conservative Vision for the Future

Davidmanheim and Ram Rachum

7 Oct 2025 17:00 UTC

17 points

3 comments 12 min read LW link

Going Phoneless

Rob Ennals 7 Oct 2025 16:40 UTC

18 points

5 comments 5 min read LW link

(messyprogress.substack.com)