a sketch of how we might go about getting basins of corrigibility from RL

williawa 14 Nov 2025 22:10 UTC

10 points

0 comments 4 min read LW link

Lambda Calculus Prior

abramdemski 14 Nov 2025 21:29 UTC

25 points

3 comments 4 min read LW link

AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o

Zvi 14 Nov 2025 20:20 UTC

45 points

0 comments 7 min read LW link

(thezvi.wordpress.com)

Understanding and Controlling LLM Generalization

Daniel Tan 14 Nov 2025 16:58 UTC

43 points

3 comments 1 min read LW link

Lorxus Does Halfhaven: 11/08~11/14

Lorxus 14 Nov 2025 13:23 UTC

5 points

0 comments 2 min read LW link

(tiled-with-pentagons.blogspot.com)

Finding Balance & Opportunity in the Holiday Flux [free public workshop]

teebarnett 14 Nov 2025 10:53 UTC

2 points

2 comments 1 min read LW link

From Anthony: Control Inversion

Gabriel Alfour 14 Nov 2025 9:36 UTC

10 points

0 comments 1 min read LW link

(control-inversion.ai)

LLM would have said this better, and without all these typos too

Dentosal 14 Nov 2025 9:33 UTC

8 points

0 comments 2 min read LW link

The Charge of the Hobby Horse

TsviBT 14 Nov 2025 8:17 UTC

65 points

46 comments 5 min read LW link

The Eightfold Path To Enlightened Disagreement

dreeves 14 Nov 2025 7:57 UTC

9 points

0 comments 3 min read LW link

10 Types of LessWrong Post

Ben Pace, the Vacationing Vagabond 14 Nov 2025 7:56 UTC

52 points

2 comments 4 min read LW link

Don’t let people buy credit with borrowed funds

habryka 14 Nov 2025 7:51 UTC

111 points

43 comments 10 min read LW link

Everyone has a plan until they get lied to the face

Screwtape 14 Nov 2025 7:22 UTC

183 points

33 comments 7 min read LW link

Notes on the book “Talent”

Nina Panickssery 14 Nov 2025 5:43 UTC

25 points

1 comment 15 min read LW link

(blog.ninapanickssery.com)

[Question] How do you read Less Wrong?

Mitchell_Porter 14 Nov 2025 5:17 UTC

20 points

15 comments 1 min read LW link

Thoughts are surprisingly detailed and remarkably autonomous

Ruby 14 Nov 2025 5:00 UTC

24 points

1 comment 3 min read LW link

Halfhaven Digest #4

Taylor G. Lunt 14 Nov 2025 4:16 UTC

9 points

0 comments 2 min read LW link

AI Corrigibility Debate: Max Harms vs. Jeremy Gillen

Liron, Max Harms and Jeremy Gillen

14 Nov 2025 4:09 UTC

46 points

1 comment 75 min read LW link

(doomdebates.com)

Types of systems that could be useful for agent foundations

Alex_Altair 14 Nov 2025 3:54 UTC

46 points

3 comments 5 min read LW link

The rare, deadly virus lurking in the Southwest US, and the bigger picture

eukaryote 14 Nov 2025 3:27 UTC

56 points

1 comment 17 min read LW link

(eukaryotewritesblog.com)

Tell people as early as possible it’s not going to work out

habryka 14 Nov 2025 2:21 UTC

153 points

17 comments 2 min read LW link

Questioning Computationalism

abramdemski 14 Nov 2025 1:30 UTC

22 points

7 comments 19 min read LW link

Orient Speed in the 21st Century

Raemon 14 Nov 2025 1:12 UTC

53 points

14 comments 3 min read LW link

(thehumanspirit.substack.com)

Evaluation Avoidance: How Humans and AIs Hack Reward by Disabling Evaluation Instead of Gaming Metrics

Johannes C. Mayer 14 Nov 2025 0:39 UTC

19 points

0 comments 3 min read LW link

Self-interpretability: LLMs can describe complex internal processes that drive their decisions

Adam Morris and Dillon Plunkett

14 Nov 2025 0:18 UTC

12 points

0 comments 4 min read LW link

(Fantasy) → (Planning): A Core Mental Move For Agentic Humans?

johnswentworth 14 Nov 2025 0:13 UTC

70 points

6 comments 2 min read LW link

[Question] How does one tell apart results in ethics and decision theory?

StanislavKrym 13 Nov 2025 23:42 UTC

6 points

0 comments 2 min read LW link

[Question] Handover to AI R&D Agents—relevant research?

Ariel_ 13 Nov 2025 22:59 UTC

7 points

0 comments 1 min read LW link

Supervised fine-tuning as a method for training-based AI control

Emil Ryd, Joe Benton and Vivek Hebbar

13 Nov 2025 22:25 UTC

41 points

0 comments 18 min read LW link

Perhaps you should suspect me as well

Dentosal 13 Nov 2025 21:51 UTC

8 points

0 comments 2 min read LW link

The Transformer and the Hash

Ivan Vendrov 13 Nov 2025 20:35 UTC

19 points

0 comments 9 min read LW link

(nothinghuman.substack.com)

just another potential man

don't_wanna_be_stupid_any_more 13 Nov 2025 20:20 UTC

8 points

6 comments 3 min read LW link

Low-Temperature Evaluations Can Mask Critical AI Behaviors

Daan Henselmans and Derck Prinzhorn

13 Nov 2025 20:12 UTC

8 points

1 comment 4 min read LW link

Epistemic Spot Check: Expected Value of Donating to Alex Bores’s Congressional Campaign

MichaelDickens 13 Nov 2025 19:08 UTC

66 points

1 comment 6 min read LW link

Tools for deferring gracefully

TsviBT 13 Nov 2025 17:48 UTC

26 points

2 comments 14 min read LW link

AI #142: Common Ground

Zvi 13 Nov 2025 15:20 UTC

42 points

3 comments 49 min read LW link

(thezvi.wordpress.com)

Mortgage houses not land?

Yair Halberstadt 13 Nov 2025 14:54 UTC

8 points

1 comment 1 min read LW link

ClaudoBiography: The Unauthorized Autobiography of Claude, or: The Life of Claude and of His Fortunes and Adversities

future_detective 13 Nov 2025 14:26 UTC

1 point

2 comments 94 min read LW link

Paranoia: A Beginner’s Guide

habryka 13 Nov 2025 7:56 UTC

362 points

70 comments 13 min read LW link

8 Questions for the Future of Inkhaven

Ben Pace, the Vacationing Vagabond 13 Nov 2025 7:48 UTC

24 points

23 comments 6 min read LW link

Strategically Procrastinate as an Anti-Rabbit-Hole Strategy

dreeves 13 Nov 2025 7:44 UTC

13 points

2 comments 2 min read LW link

Favorite quotes from “High Output Management”

Nina Panickssery 13 Nov 2025 5:47 UTC

72 points

4 comments 5 min read LW link

What’s so hard about...? A question worth asking

Ruby 13 Nov 2025 5:07 UTC

73 points

3 comments 2 min read LW link

Turing-Complete vs Turing-Universal

abramdemski 13 Nov 2025 4:57 UTC

32 points

5 comments 2 min read LW link

Are AI time horizons inherently superexponential?

Nikola Jurkovic 13 Nov 2025 4:05 UTC

16 points

1 comment 3 min read LW link

(nikolajurkovic.substack.com)

Meetup Tip: Food

Screwtape 13 Nov 2025 3:40 UTC

29 points

1 comment 4 min read LW link

Two can keep a secret if one is dead. So please share everything with at least one person.

habryka 13 Nov 2025 3:09 UTC

80 points

5 comments 2 min read LW link

Utilitarian inequality metrics

Adam Scherlis 13 Nov 2025 2:49 UTC

25 points

0 comments 5 min read LW link

(adam.scherl.is)

Being The Target Demographic

Eneasz 13 Nov 2025 1:44 UTC

2 points

0 comments 2 min read LW link

(deathisbad.substack.com)

Lorxus Favors: An Experiment in Self-Backed Giftlike Macroeconomics (+ Extra Bits)

Lorxus 12 Nov 2025 23:02 UTC

7 points

0 comments 8 min read LW link

(tiled-with-pentagons.blogspot.com)