Towards a Categorization of Adlerian Excuses

romeostevensit · 8 Dec 2025 23:22 UTC
89 points
11 comments · 6 min read

A Falsifiable Causal Argument for Substrate Independence

rife · 8 Dec 2025 22:47 UTC
10 points
0 comments · 5 min read

Prompting Models to Obfuscate Their CoT

8 Dec 2025 21:00 UTC
15 points
4 comments · 7 min read

Gödel’s Ontological Proof

GenericModel · 8 Dec 2025 20:49 UTC
19 points
74 comments · 13 min read
(enrichedjamsham.substack.com)

High-level approaches to rigor in interpretability

David Scott Krueger (formerly: capybaralet) · 8 Dec 2025 20:46 UTC
24 points
0 comments · 1 min read

If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training

Timothy Danforth · 8 Dec 2025 20:38 UTC
1 point
0 comments · 4 min read

Human Dignity: a review

owencb · 8 Dec 2025 20:37 UTC
32 points
0 comments · 7 min read
(strangecities.substack.com)

A few quick thoughts on measuring disempowerment

David Scott Krueger (formerly: capybaralet) · 8 Dec 2025 20:03 UTC
29 points
3 comments · 1 min read

How Stealth Works

Linch · 8 Dec 2025 19:46 UTC
48 points
5 comments · 3 min read
(linch.substack.com)

Reward Function Design: a starter pack

Steven Byrnes · 8 Dec 2025 19:15 UTC
80 points
12 comments · 16 min read

We need a field of Reward Function Design

Steven Byrnes · 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read

When circular reasoning is logical evidence

ConformalInfinity · 8 Dec 2025 19:09 UTC
6 points
7 comments · 2 min read

I have hope

TristanTrim · 8 Dec 2025 18:20 UTC
12 points
0 comments · 2 min read

The Possibility of an Ongoing Moral Catastrophe

Bentham's Bulldog · 8 Dec 2025 16:40 UTC
10 points
6 comments · 4 min read

Building an AI Oracle

Gordon Seidoh Worley · 8 Dec 2025 16:10 UTC
16 points
0 comments · 6 min read
(www.uncertainupdates.com)

[Paper] Does Self-Evaluation Enable Wireheading in Language Models?

David Africa · 8 Dec 2025 16:03 UTC
25 points
2 comments · 2 min read

Algorithmic thermodynamics and three types of optimization

8 Dec 2025 15:40 UTC
11 points
0 comments · 12 min read

Little Echo

Zvi · 8 Dec 2025 15:30 UTC
160 points
15 comments · 2 min read
(thezvi.wordpress.com)

From Barriers to Alignment to the First Formal Corrigibility Guarantees

Aran Nayebi · 8 Dec 2025 12:31 UTC
61 points
11 comments · 11 min read

Scaling what used not to scale

Alexandre Variengien · 8 Dec 2025 8:40 UTC
11 points
0 comments · 12 min read
(alexandrevariengien.com)

The effectiveness of systematic thinking

Alexandre Variengien · 8 Dec 2025 8:38 UTC
12 points
0 comments · 6 min read
(alexandrevariengien.com)

I said hello and greeted 1,000 people at 5am this morning

Declan Molony · 8 Dec 2025 3:35 UTC
128 points
7 comments · 2 min read

Your Digital Footprint Could Make You Unemployable

Declan Molony · 7 Dec 2025 23:50 UTC
38 points
13 comments · 3 min read

2025 Unofficial LessWrong Census/Survey

Screwtape · 7 Dec 2025 22:08 UTC
69 points
33 comments · 1 min read

AI in 2025: gestalt

technicalities · 7 Dec 2025 21:25 UTC
248 points
44 comments · 20 min read

Thinking in Predictions

Julius · 7 Dec 2025 21:11 UTC
20 points
0 comments · 8 min read
(thegreymatter.substack.com)

[Linkpost] Theory and AI Alignment (Scott Aaronson)

Oliver Daniels · 7 Dec 2025 19:17 UTC
15 points
1 comment · 3 min read
(scottaaronson.blog)

About Natural & Synthetic Beings (Interactive Typology)

Anurag · 7 Dec 2025 16:59 UTC
2 points
2 comments · 3 min read

Lawyers are uniquely well-placed to resist AI job automation

beyarkay · 7 Dec 2025 16:28 UTC
18 points
18 comments · 2 min read
(boydkane.com)

[Question] Have there been any rational analyses of mind-body techniques for chronic pain/illness?

Liface · 7 Dec 2025 16:13 UTC
4 points
5 comments · 1 min read

How a bug of AI hardware may become a feature for AI governance

Naci Cankaya · 7 Dec 2025 14:55 UTC
9 points
0 comments · 1 min read
(nacicankaya.substack.com)

Karlsruhe—If Anyone Builds It, Everyone Dies

wilm · 7 Dec 2025 14:49 UTC
2 points
0 comments · 1 min read

Eliezer’s Unteachable Methods of Sanity

Eliezer Yudkowsky · 7 Dec 2025 2:46 UTC
491 points
147 comments · 10 min read

Ordering Pizza Ahead While Driving

jefftk · 7 Dec 2025 2:01 UTC
22 points
0 comments · 1 min read
(www.jefftk.com)

Existential despair, with hope

foodforthought · 6 Dec 2025 20:48 UTC
10 points
0 comments · 1 min read

I Need Your Help

Jaivardhan Nawani · 6 Dec 2025 18:48 UTC
8 points
1 comment · 1 min read

Crazy ideas in AI Safety part 1: Easy Measurable Communication

Valentin2026 · 6 Dec 2025 17:59 UTC
7 points
0 comments · 2 min read

The corrigibility basin of attraction is a misleading gloss

Jeremy Gillen · 6 Dec 2025 15:38 UTC
92 points
37 comments · 18 min read

LW Transcendence

Annabelle · 6 Dec 2025 6:53 UTC
9 points
0 comments · 2 min read

The Adequacy of Class Separation

milanrosko · 6 Dec 2025 6:10 UTC
4 points
0 comments · 5 min read

Answering a child’s questions

Alex_Altair · 6 Dec 2025 3:52 UTC
39 points
0 comments · 6 min read

AI Mood Ring: A Window Into LLM Emotions

michaelwaves · 6 Dec 2025 2:56 UTC
7 points
0 comments · 2 min read

Critical Meditation Theory

lsusr · 6 Dec 2025 2:24 UTC
57 points
11 comments · 2 min read

Tools, Agents, and Sycophantic Things

Eleni Angelou · 6 Dec 2025 1:50 UTC
25 points
0 comments · 4 min read

What Happens When You Train Models on False Facts?

David Vella Zarb · 6 Dec 2025 1:39 UTC
16 points
2 comments · 7 min read

why america can’t build ships

bhauth · 6 Dec 2025 0:35 UTC
92 points
18 comments · 6 min read
(www.bhauth.com)

An Ambitious Vision for Interpretability

leogao · 5 Dec 2025 22:57 UTC
168 points
7 comments · 4 min read

Reasons to care about Canary Strings

Alice Blair · 5 Dec 2025 21:41 UTC
27 points
3 comments · 2 min read

An AI-2027-like analysis of humans’ goals and ethics with conservative results

StanislavKrym · 5 Dec 2025 21:37 UTC
6 points
0 comments · 4 min read

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 3: Resolution

5 Dec 2025 18:58 UTC
10 points
0 comments · 9 min read