How to never make a bad decision

Wes R 28 Dec 2025 23:21 UTC

−4 points

0 comments 3 min read LW link

Research agenda for training aligned AIs using concave utility functions following the principles of homeostasis and diminishing returns

Roland Pihlakas 28 Dec 2025 21:53 UTC

14 points

0 comments 8 min read LW link

Training Matching Pursuit SAEs on LLMs

chanind 28 Dec 2025 18:57 UTC

19 points

2 comments 7 min read LW link

Do LLMs Condition Safety Behaviour on Dialect? Preliminary Evidence

Aakash Rana 28 Dec 2025 18:21 UTC

7 points

2 comments 5 min read LW link

Meditations on Suffering

MeditationsOnShrimp 28 Dec 2025 17:39 UTC

−1 points

0 comments 2 min read LW link

November 2025 Links

nomagicpill 28 Dec 2025 15:51 UTC

19 points

2 comments 7 min read LW link

(nomagicpill.substack.com)

Reviews I: Everyone’s Responsibility

nomagicpill 28 Dec 2025 15:48 UTC

2 points

0 comments 4 min read LW link

(nomagicpill.substack.com)

Introspection via localization

Victor Godet 28 Dec 2025 14:26 UTC

36 points

8 comments 3 min read LW link

Crystals in NNs: Technical Companion Piece

Jonas Hallgren 28 Dec 2025 10:44 UTC

24 points

5 comments 15 min read LW link

Have You Tried Thinking About It As Crystals?

Jonas Hallgren 28 Dec 2025 10:44 UTC

77 points

12 comments 10 min read LW link

Alignment Is Not One Problem: A 3D Map of AI Risk

Anurag 28 Dec 2025 8:44 UTC

3 points

0 comments 14 min read LW link

Orpheus’ Basilisk

pulwat 28 Dec 2025 0:43 UTC

22 points

1 comment 2 min read LW link

A Conflict Between AI Alignment and Philosophical Competence

Wei Dai 27 Dec 2025 21:32 UTC

70 points

14 comments 2 min read LW link

Glucose Supplementation for Sustained Stimulant Cognition

Johannes C. Mayer 27 Dec 2025 19:58 UTC

34 points

13 comments 1 min read LW link

A Brief Proof That You Are Every Conscious Thing

Jason R 27 Dec 2025 17:16 UTC

−16 points

15 comments 3 min read LW link

Introducing the XLab AI Security Guide

zroe1, jcksanderson and Julian H

27 Dec 2025 16:50 UTC

19 points

1 comment 5 min read LW link

Shared Houses Illegal?

jefftk 27 Dec 2025 15:10 UTC

56 points

3 comments 2 min read LW link

(www.jefftk.com)

Enhance Funding Applications: Share Utility Function Over Money (+Tool)

plex 27 Dec 2025 13:02 UTC

35 points

1 comment 1 min read LW link

Jailbreaks Peak Early, Then Drop: Layer Trajectories in Llama-3.1-70B

James Hoffend 27 Dec 2025 12:39 UTC

13 points

0 comments 8 min read LW link

Are We In A Coding Overhang?

Michaël Trazzi 27 Dec 2025 8:16 UTC

110 points

14 comments 3 min read LW link

Moving Goalposts: Modern Transformer Based Agents Have Been Weak ASI For A Bit Now

JenniferRM 27 Dec 2025 7:32 UTC

69 points

39 comments 8 min read LW link

Uploaded Human Intelligence

Byron Lee 27 Dec 2025 5:28 UTC

8 points

0 comments 5 min read LW link

Wanted: Advice for College Students on Weathering the Storm

kudos3l 27 Dec 2025 5:27 UTC

20 points

5 comments 3 min read LW link

Thoughts on epistemic virtue in science

foodforthought 27 Dec 2025 1:06 UTC

12 points

1 comment 4 min read LW link

Burnout, depression, and AI safety: some concrete mental health strategies

KatWoods 26 Dec 2025 19:52 UTC

45 points

2 comments 4 min read LW link

The moral critic of the AI industry—a Q&A with Holly Elmore

Mordechai Rorvig 26 Dec 2025 17:49 UTC

8 points

0 comments 2 min read LW link

(www.foommagazine.org)

Apply for Alignment Mentorship from TurnTrout and Alex Cloud

TurnTrout and cloud

26 Dec 2025 17:20 UTC

42 points

0 comments 2 min read LW link

(turntrout.com)

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt 26 Dec 2025 16:37 UTC

215 points

18 comments 3 min read LW link

Whole Brain Emulation as an Anchor for AI Welfare

Sturb 26 Dec 2025 14:45 UTC

52 points

13 comments 6 min read LW link

Childhood and Education #16: Letting Kids Be Kids

Zvi 26 Dec 2025 13:50 UTC

56 points

3 comments 18 min read LW link

(thezvi.wordpress.com)

Regression by Composition

Anders_H 26 Dec 2025 12:18 UTC

13 points

0 comments 1 min read LW link

(rss.org.uk)

Unknown Knowns: Five Ideas You Can’t Unsee

Linch 25 Dec 2025 23:28 UTC

75 points

37 comments 6 min read LW link

(linch.substack.com)

There’s Room in the Manger

Celer 25 Dec 2025 18:00 UTC

20 points

0 comments 2 min read LW link

(keller.substack.com)

Call for Science of Eval Awareness (+ Research Directions)

Igor Ivanov 25 Dec 2025 17:26 UTC

31 points

24 comments 5 min read LW link

AI #148: Christmas Break

Zvi 25 Dec 2025 14:00 UTC

31 points

4 comments 39 min read LW link

(thezvi.wordpress.com)

Clipboard Normalization

jefftk 25 Dec 2025 13:50 UTC

105 points

9 comments 1 min read LW link

(www.jefftk.com)

The Intelligence Axis: A Functional Typology

Anurag 25 Dec 2025 12:18 UTC

3 points

0 comments 5 min read LW link

Honorable AI

Kaarel 24 Dec 2025 21:20 UTC

42 points

23 comments 41 min read LW link

Catch-Up Algorithmic Progress Might Actually be 60× per Year

Aaron_Scher 24 Dec 2025 21:03 UTC

94 points

16 comments 10 min read LW link

The Ones who Feed their Children

xhnk7jwvqj-max 24 Dec 2025 19:15 UTC

22 points

2 comments 3 min read LW link

[Book Review] “Reality+” by David Chalmers

lsdev 24 Dec 2025 19:14 UTC

4 points

0 comments 2 min read LW link

Kids and Space

jefftk 24 Dec 2025 15:30 UTC

75 points

5 comments 3 min read LW link

(www.jefftk.com)

Zvi’s 2025 In Movies

Zvi 24 Dec 2025 13:30 UTC

28 points

1 comment 11 min read LW link

(thezvi.wordpress.com)

Methodological considerations in making malign initializations for control research

Alek Westover, Vivek Hebbar and Julian Stastny

24 Dec 2025 1:18 UTC

16 points

0 comments 13 min read LW link

Immunodeficiency to Parasitic AI

Andrii Shportko 24 Dec 2025 0:17 UTC

4 points

1 comment 2 min read LW link

An introduction to modular induction and some attempts to solve it

Thomas Kehrenberg 23 Dec 2025 22:35 UTC

12 points

1 comment 18 min read LW link

Rules clarification for the Write like lsusr competition

Isusr 23 Dec 2025 21:12 UTC

8 points

2 comments 2 min read LW link

Human Values

Maitreya 23 Dec 2025 21:08 UTC

32 points

1 comment 3 min read LW link

Alignment Fellowship

rich_anon 23 Dec 2025 20:29 UTC

58 points

14 comments 1 min read LW link

Iterative Matrix Steering: Forcing LLMs to “Rationalize” Hallucinations via Subspace Alignment

Artem Herasymenko 23 Dec 2025 20:13 UTC

10 points

2 comments 4 min read LW link