Towards a Categorization of Adlerian Excuses

romeostevensit · 8 Dec 2025 23:22 UTC
89 points
11 comments · 6 min read

A Falsifiable Causal Argument for Substrate Independence

rife · 8 Dec 2025 22:47 UTC
10 points
0 comments · 5 min read

Prompting Models to Obfuscate Their CoT

8 Dec 2025 21:00 UTC
15 points
4 comments · 7 min read

Gödel’s Ontological Proof

GenericModel · 8 Dec 2025 20:49 UTC
19 points
74 comments · 13 min read
(enrichedjamsham.substack.com)

High-level approaches to rigor in interpretability

David Scott Krueger (formerly: capybaralet) · 8 Dec 2025 20:46 UTC
24 points
0 comments · 1 min read

If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training

Timothy Danforth · 8 Dec 2025 20:38 UTC
1 point
0 comments · 4 min read

Human Dignity: a review

owencb · 8 Dec 2025 20:37 UTC
32 points
0 comments · 7 min read
(strangecities.substack.com)

A few quick thoughts on measuring disempowerment

David Scott Krueger (formerly: capybaralet) · 8 Dec 2025 20:03 UTC
29 points
3 comments · 1 min read

How Stealth Works

Linch · 8 Dec 2025 19:46 UTC
48 points
5 comments · 3 min read
(linch.substack.com)

Reward Function Design: a starter pack

Steven Byrnes · 8 Dec 2025 19:15 UTC
80 points
12 comments · 16 min read

We need a field of Reward Function Design

Steven Byrnes · 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read

When circular reasoning is logical evidence

ConformalInfinity · 8 Dec 2025 19:09 UTC
6 points
7 comments · 2 min read

I have hope

TristanTrim · 8 Dec 2025 18:20 UTC
12 points
0 comments · 2 min read

The Possibility of an Ongoing Moral Catastrophe

Bentham's Bulldog · 8 Dec 2025 16:40 UTC
10 points
6 comments · 4 min read

Building an AI Oracle

Gordon Seidoh Worley · 8 Dec 2025 16:10 UTC
16 points
0 comments · 6 min read
(www.uncertainupdates.com)

[Paper] Does Self-Evaluation Enable Wireheading in Language Models?

David Africa · 8 Dec 2025 16:03 UTC
25 points
2 comments · 2 min read

Algorithmic thermodynamics and three types of optimization

8 Dec 2025 15:40 UTC
11 points
0 comments · 12 min read

Little Echo

Zvi · 8 Dec 2025 15:30 UTC
160 points
15 comments · 2 min read
(thezvi.wordpress.com)

From Barriers to Alignment to the First Formal Corrigibility Guarantees

Aran Nayebi · 8 Dec 2025 12:31 UTC
61 points
11 comments · 11 min read

Scaling what used not to scale

Alexandre Variengien · 8 Dec 2025 8:40 UTC
11 points
0 comments · 12 min read
(alexandrevariengien.com)

The effectiveness of systematic thinking

Alexandre Variengien · 8 Dec 2025 8:38 UTC
12 points
0 comments · 6 min read
(alexandrevariengien.com)

I said hello and greeted 1,000 people at 5am this morning

Declan Molony · 8 Dec 2025 3:35 UTC
128 points
7 comments · 2 min read

Your Digital Footprint Could Make You Unemployable

Declan Molony · 7 Dec 2025 23:50 UTC
38 points
13 comments · 3 min read

2025 Unofficial LessWrong Census/Survey

Screwtape · 7 Dec 2025 22:08 UTC
69 points
33 comments · 1 min read

AI in 2025: gestalt

technicalities · 7 Dec 2025 21:25 UTC
248 points
44 comments · 20 min read

Thinking in Predictions

Julius · 7 Dec 2025 21:11 UTC
20 points
0 comments · 8 min read
(thegreymatter.substack.com)

[Linkpost] Theory and AI Alignment (Scott Aaronson)

Oliver Daniels · 7 Dec 2025 19:17 UTC
15 points
1 comment · 3 min read
(scottaaronson.blog)

About Natural & Synthetic Beings (Interactive Typology)

Anurag · 7 Dec 2025 16:59 UTC
2 points
2 comments · 3 min read

Lawyers are uniquely well-placed to resist AI job automation

beyarkay · 7 Dec 2025 16:28 UTC
18 points
18 comments · 2 min read
(boydkane.com)

[Question] Have there been any rational analyses of mind-body techniques for chronic pain/illness?

Liface · 7 Dec 2025 16:13 UTC
4 points
5 comments · 1 min read

How a bug of AI hardware may become a feature for AI governance

Naci Cankaya · 7 Dec 2025 14:55 UTC
9 points
0 comments · 1 min read
(nacicankaya.substack.com)

Karlsruhe—If Anyone Builds It, Everyone Dies

wilm · 7 Dec 2025 14:49 UTC
2 points
0 comments · 1 min read

Eliezer’s Unteachable Methods of Sanity

Eliezer Yudkowsky · 7 Dec 2025 2:46 UTC
491 points
147 comments · 10 min read

Ordering Pizza Ahead While Driving

jefftk · 7 Dec 2025 2:01 UTC
22 points
0 comments · 1 min read
(www.jefftk.com)

Existential despair, with hope

foodforthought · 6 Dec 2025 20:48 UTC
10 points
0 comments · 1 min read

I Need Your Help

Jaivardhan Nawani · 6 Dec 2025 18:48 UTC
8 points
1 comment · 1 min read

Crazy ideas in AI Safety part 1: Easy Measurable Communication

Valentin2026 · 6 Dec 2025 17:59 UTC
7 points
0 comments · 2 min read

The corrigibility basin of attraction is a misleading gloss

Jeremy Gillen · 6 Dec 2025 15:38 UTC
92 points
37 comments · 18 min read

LW Transcendence

Annabelle · 6 Dec 2025 6:53 UTC
9 points
0 comments · 2 min read

The Adequacy of Class Separation

milanrosko · 6 Dec 2025 6:10 UTC
4 points
0 comments · 5 min read

Answering a child’s questions

Alex_Altair · 6 Dec 2025 3:52 UTC
39 points
0 comments · 6 min read

AI Mood Ring: A Window Into LLM Emotions

michaelwaves · 6 Dec 2025 2:56 UTC
7 points
0 comments · 2 min read

Critical Meditation Theory

lsusr · 6 Dec 2025 2:24 UTC
57 points
11 comments · 2 min read

Tools, Agents, and Sycophantic Things

Eleni Angelou · 6 Dec 2025 1:50 UTC
25 points
0 comments · 4 min read

What Happens When You Train Models on False Facts?

David Vella Zarb · 6 Dec 2025 1:39 UTC
16 points
2 comments · 7 min read

why america can’t build ships

bhauth · 6 Dec 2025 0:35 UTC
92 points
18 comments · 6 min read
(www.bhauth.com)

An Ambitious Vision for Interpretability

leogao · 5 Dec 2025 22:57 UTC
168 points
7 comments · 4 min read

Reasons to care about Canary Strings

Alice Blair · 5 Dec 2025 21:41 UTC
27 points
3 comments · 2 min read

An AI-2027-like analysis of humans’ goals and ethics with conservative results

StanislavKrym · 5 Dec 2025 21:37 UTC
6 points
0 comments · 4 min read

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 3: Resolution

5 Dec 2025 18:58 UTC
10 points
0 comments · 9 min read