How to never make a bad decision

Wes R · 28 Dec 2025 23:21 UTC
−4 points
0 comments · 3 min read · LW link

Research agenda for training aligned AIs using concave utility functions following the principles of homeostasis and diminishing returns

Roland Pihlakas · 28 Dec 2025 21:53 UTC
14 points
0 comments · 8 min read · LW link

Training Matching Pursuit SAEs on LLMs

chanind · 28 Dec 2025 18:57 UTC
19 points
2 comments · 7 min read · LW link

Do LLMs Condition Safety Behaviour on Dialect? Preliminary Evidence

Aakash Rana · 28 Dec 2025 18:21 UTC
7 points
2 comments · 5 min read · LW link

Meditations on Suffering

MeditationsOnShrimp · 28 Dec 2025 17:39 UTC
−1 points
0 comments · 2 min read · LW link

November 2025 Links

nomagicpill · 28 Dec 2025 15:51 UTC
19 points
2 comments · 7 min read · LW link
(nomagicpill.substack.com)

Reviews I: Everyone’s Responsibility

nomagicpill · 28 Dec 2025 15:48 UTC
2 points
0 comments · 4 min read · LW link
(nomagicpill.substack.com)

Introspection via localization

Victor Godet · 28 Dec 2025 14:26 UTC
36 points
8 comments · 3 min read · LW link

Crystals in NNs: Technical Companion Piece

Jonas Hallgren · 28 Dec 2025 10:44 UTC
24 points
5 comments · 15 min read · LW link

Have You Tried Thinking About It As Crystals?

Jonas Hallgren · 28 Dec 2025 10:44 UTC
77 points
12 comments · 10 min read · LW link

Alignment Is Not One Problem: A 3D Map of AI Risk

Anurag · 28 Dec 2025 8:44 UTC
3 points
0 comments · 14 min read · LW link

Orpheus’ Basilisk

pulwat · 28 Dec 2025 0:43 UTC
22 points
1 comment · 2 min read · LW link

A Conflict Between AI Alignment and Philosophical Competence

Wei Dai · 27 Dec 2025 21:32 UTC
70 points
14 comments · 2 min read · LW link

Glucose Supplementation for Sustained Stimulant Cognition

Johannes C. Mayer · 27 Dec 2025 19:58 UTC
34 points
13 comments · 1 min read · LW link

A Brief Proof That You Are Every Conscious Thing

Jason R · 27 Dec 2025 17:16 UTC
−16 points
15 comments · 3 min read · LW link

Introducing the XLab AI Security Guide

27 Dec 2025 16:50 UTC
19 points
1 comment · 5 min read · LW link

Shared Houses Illegal?

jefftk · 27 Dec 2025 15:10 UTC
56 points
3 comments · 2 min read · LW link
(www.jefftk.com)

Enhance Funding Applications: Share Utility Function Over Money (+Tool)

plex · 27 Dec 2025 13:02 UTC
35 points
1 comment · 1 min read · LW link

Jailbreaks Peak Early, Then Drop: Layer Trajectories in Llama-3.1-70B

James Hoffend · 27 Dec 2025 12:39 UTC
13 points
0 comments · 8 min read · LW link

Are We In A Coding Overhang?

Michaël Trazzi · 27 Dec 2025 8:16 UTC
110 points
14 comments · 3 min read · LW link

Moving Goalposts: Modern Transformer Based Agents Have Been Weak ASI For A Bit Now

JenniferRM · 27 Dec 2025 7:32 UTC
69 points
39 comments · 8 min read · LW link

Uploaded Human Intelligence

Byron Lee · 27 Dec 2025 5:28 UTC
8 points
0 comments · 5 min read · LW link

Wanted: Advice for College Students on Weathering the Storm

kudos3l · 27 Dec 2025 5:27 UTC
20 points
5 comments · 3 min read · LW link

Thoughts on epistemic virtue in science

foodforthought · 27 Dec 2025 1:06 UTC
12 points
1 comment · 4 min read · LW link

Burnout, depression, and AI safety: some concrete mental health strategies

KatWoods · 26 Dec 2025 19:52 UTC
45 points
2 comments · 4 min read · LW link

The moral critic of the AI industry—a Q&A with Holly Elmore

Mordechai Rorvig · 26 Dec 2025 17:49 UTC
8 points
0 comments · 2 min read · LW link
(www.foommagazine.org)

Apply for Alignment Mentorship from TurnTrout and Alex Cloud

26 Dec 2025 17:20 UTC
42 points
0 comments · 2 min read · LW link
(turntrout.com)

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt · 26 Dec 2025 16:37 UTC
215 points
18 comments · 3 min read · LW link

Whole Brain Emulation as an Anchor for AI Welfare

Sturb · 26 Dec 2025 14:45 UTC
52 points
13 comments · 6 min read · LW link

Childhood and Education #16: Letting Kids Be Kids

Zvi · 26 Dec 2025 13:50 UTC
56 points
3 comments · 18 min read · LW link
(thezvi.wordpress.com)

Regression by Composition

Anders_H · 26 Dec 2025 12:18 UTC
13 points
0 comments · 1 min read · LW link
(rss.org.uk)

Unknown Knowns: Five Ideas You Can’t Unsee

Linch · 25 Dec 2025 23:28 UTC
75 points
37 comments · 6 min read · LW link
(linch.substack.com)

There’s Room in the Manger

Celer · 25 Dec 2025 18:00 UTC
20 points
0 comments · 2 min read · LW link
(keller.substack.com)

Call for Science of Eval Awareness (+ Research Directions)

Igor Ivanov · 25 Dec 2025 17:26 UTC
31 points
24 comments · 5 min read · LW link

AI #148: Christmas Break

Zvi · 25 Dec 2025 14:00 UTC
31 points
4 comments · 39 min read · LW link
(thezvi.wordpress.com)

Clipboard Normalization

jefftk · 25 Dec 2025 13:50 UTC
105 points
9 comments · 1 min read · LW link
(www.jefftk.com)

The Intelligence Axis: A Functional Typology

Anurag · 25 Dec 2025 12:18 UTC
3 points
0 comments · 5 min read · LW link

Honorable AI

Kaarel · 24 Dec 2025 21:20 UTC
42 points
23 comments · 41 min read · LW link

Catch-Up Algorithmic Progress Might Actually be 60× per Year

Aaron_Scher · 24 Dec 2025 21:03 UTC
94 points
16 comments · 10 min read · LW link

The Ones who Feed their Children

xhnk7jwvqj-max · 24 Dec 2025 19:15 UTC
22 points
2 comments · 3 min read · LW link

[Book Review] “Reality+” by David Chalmers

lsdev · 24 Dec 2025 19:14 UTC
4 points
0 comments · 2 min read · LW link

Kids and Space

jefftk · 24 Dec 2025 15:30 UTC
75 points
5 comments · 3 min read · LW link
(www.jefftk.com)

Zvi’s 2025 In Movies

Zvi · 24 Dec 2025 13:30 UTC
28 points
1 comment · 11 min read · LW link
(thezvi.wordpress.com)

Methodological considerations in making malign initializations for control research

24 Dec 2025 1:18 UTC
16 points
0 comments · 13 min read · LW link

Immunodeficiency to Parasitic AI

Andrii Shportko · 24 Dec 2025 0:17 UTC
4 points
1 comment · 2 min read · LW link

An introduction to modular induction and some attempts to solve it

Thomas Kehrenberg · 23 Dec 2025 22:35 UTC
12 points
1 comment · 18 min read · LW link

Rules clarification for the Write like lsusr competition

Isusr · 23 Dec 2025 21:12 UTC
8 points
2 comments · 2 min read · LW link

Human Values

Maitreya · 23 Dec 2025 21:08 UTC
32 points
1 comment · 3 min read · LW link

Alignment Fellowship

rich_anon · 23 Dec 2025 20:29 UTC
58 points
14 comments · 1 min read · LW link

Iterative Matrix Steering: Forcing LLMs to “Rationalize” Hallucinations via Subspace Alignment

Artem Herasymenko · 23 Dec 2025 20:13 UTC
10 points
2 comments · 4 min read · LW link