a sketch of how we might go about getting basins of corrigibility from RL

williawa · 14 Nov 2025 22:10 UTC
10 points
0 comments · 4 min read · LW link

Lambda Calculus Prior

abramdemski · 14 Nov 2025 21:29 UTC
25 points
3 comments · 4 min read · LW link

AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o

Zvi · 14 Nov 2025 20:20 UTC
45 points
0 comments · 7 min read · LW link
(thezvi.wordpress.com)

Understanding and Controlling LLM Generalization

Daniel Tan · 14 Nov 2025 16:58 UTC
43 points
3 comments · 1 min read · LW link

Lorxus Does Halfhaven: 11/08~11/14

Lorxus · 14 Nov 2025 13:23 UTC
5 points
0 comments · 2 min read · LW link
(tiled-with-pentagons.blogspot.com)

Finding Balance & Opportunity in the Holiday Flux [free public workshop]

teebarnett · 14 Nov 2025 10:53 UTC
2 points
2 comments · 1 min read · LW link

From Anthony: Control Inversion

Gabriel Alfour · 14 Nov 2025 9:36 UTC
10 points
0 comments · 1 min read · LW link
(control-inversion.ai)

LLM would have said this better, and without all these typos too

Dentosal · 14 Nov 2025 9:33 UTC
8 points
0 comments · 2 min read · LW link

The Charge of the Hobby Horse

TsviBT · 14 Nov 2025 8:17 UTC
65 points
46 comments · 5 min read · LW link

The Eightfold Path To Enlightened Disagreement

dreeves · 14 Nov 2025 7:57 UTC
9 points
0 comments · 3 min read · LW link

10 Types of LessWrong Post

Ben Pace, the Vacationing Vagabond · 14 Nov 2025 7:56 UTC
52 points
2 comments · 4 min read · LW link

Don’t let people buy credit with borrowed funds

habryka · 14 Nov 2025 7:51 UTC
111 points
43 comments · 10 min read · LW link

Everyone has a plan until they get lied to the face

Screwtape · 14 Nov 2025 7:22 UTC
183 points
33 comments · 7 min read · LW link

Notes on the book “Talent”

Nina Panickssery · 14 Nov 2025 5:43 UTC
25 points
1 comment · 15 min read · LW link
(blog.ninapanickssery.com)

[Question] How do you read Less Wrong?

Mitchell_Porter · 14 Nov 2025 5:17 UTC
20 points
15 comments · 1 min read · LW link

Thoughts are surprisingly detailed and remarkably autonomous

Ruby · 14 Nov 2025 5:00 UTC
24 points
1 comment · 3 min read · LW link

Halfhaven Digest #4

Taylor G. Lunt · 14 Nov 2025 4:16 UTC
9 points
0 comments · 2 min read · LW link

AI Corrigibility Debate: Max Harms vs. Jeremy Gillen

14 Nov 2025 4:09 UTC
46 points
1 comment · 75 min read · LW link
(doomdebates.com)

Types of systems that could be useful for agent foundations

Alex_Altair · 14 Nov 2025 3:54 UTC
46 points
3 comments · 5 min read · LW link

The rare, deadly virus lurking in the Southwest US, and the bigger picture

eukaryote · 14 Nov 2025 3:27 UTC
56 points
1 comment · 17 min read · LW link
(eukaryotewritesblog.com)

Tell people as early as possible it’s not going to work out

habryka · 14 Nov 2025 2:21 UTC
153 points
17 comments · 2 min read · LW link

Questioning Computationalism

abramdemski · 14 Nov 2025 1:30 UTC
22 points
7 comments · 19 min read · LW link

Orient Speed in the 21st Century

Raemon · 14 Nov 2025 1:12 UTC
53 points
14 comments · 3 min read · LW link
(thehumanspirit.substack.com)

Evaluation Avoidance: How Humans and AIs Hack Reward by Disabling Evaluation Instead of Gaming Metrics

Johannes C. Mayer · 14 Nov 2025 0:39 UTC
19 points
0 comments · 3 min read · LW link

Self-interpretability: LLMs can describe complex internal processes that drive their decisions

14 Nov 2025 0:18 UTC
12 points
0 comments · 4 min read · LW link

(Fantasy) → (Planning): A Core Mental Move For Agentic Humans?

johnswentworth · 14 Nov 2025 0:13 UTC
70 points
6 comments · 2 min read · LW link

[Question] How does one tell apart results in ethics and decision theory?

StanislavKrym · 13 Nov 2025 23:42 UTC
6 points
0 comments · 2 min read · LW link

[Question] Handover to AI R&D Agents - relevant research?

Ariel_ · 13 Nov 2025 22:59 UTC
7 points
0 comments · 1 min read · LW link

Supervised fine-tuning as a method for training-based AI control

13 Nov 2025 22:25 UTC
41 points
0 comments · 18 min read · LW link

Perhaps you should suspect me as well

Dentosal · 13 Nov 2025 21:51 UTC
8 points
0 comments · 2 min read · LW link

The Transformer and the Hash

Ivan Vendrov · 13 Nov 2025 20:35 UTC
19 points
0 comments · 9 min read · LW link
(nothinghuman.substack.com)

just another potential man

don't_wanna_be_stupid_any_more · 13 Nov 2025 20:20 UTC
8 points
6 comments · 3 min read · LW link

Low-Temperature Evaluations Can Mask Critical AI Behaviors

13 Nov 2025 20:12 UTC
8 points
1 comment · 4 min read · LW link

Epistemic Spot Check: Expected Value of Donating to Alex Bores’s Congressional Campaign

MichaelDickens · 13 Nov 2025 19:08 UTC
66 points
1 comment · 6 min read · LW link

Tools for deferring gracefully

TsviBT · 13 Nov 2025 17:48 UTC
26 points
2 comments · 14 min read · LW link

AI #142: Common Ground

Zvi · 13 Nov 2025 15:20 UTC
42 points
3 comments · 49 min read · LW link
(thezvi.wordpress.com)

Mortgage houses not land?

Yair Halberstadt · 13 Nov 2025 14:54 UTC
8 points
1 comment · 1 min read · LW link

ClaudoBiography: The Unauthorized Autobiography of Claude, or: The Life of Claude and of His Fortunes and Adversities

future_detective · 13 Nov 2025 14:26 UTC
1 point
2 comments · 94 min read · LW link

Paranoia: A Beginner’s Guide

habryka · 13 Nov 2025 7:56 UTC
362 points
70 comments · 13 min read · LW link

8 Questions for the Future of Inkhaven

Ben Pace, the Vacationing Vagabond · 13 Nov 2025 7:48 UTC
24 points
23 comments · 6 min read · LW link

Strategically Procrastinate as an Anti-Rabbit-Hole Strategy

dreeves · 13 Nov 2025 7:44 UTC
13 points
2 comments · 2 min read · LW link

Favorite quotes from “High Output Management”

Nina Panickssery · 13 Nov 2025 5:47 UTC
72 points
4 comments · 5 min read · LW link

What’s so hard about...? A question worth asking

Ruby · 13 Nov 2025 5:07 UTC
73 points
3 comments · 2 min read · LW link

Turing-Complete vs Turing-Universal

abramdemski · 13 Nov 2025 4:57 UTC
32 points
5 comments · 2 min read · LW link

Are AI time horizons inherently superexponential?

Nikola Jurkovic · 13 Nov 2025 4:05 UTC
16 points
1 comment · 3 min read · LW link
(nikolajurkovic.substack.com)

Meetup Tip: Food

Screwtape · 13 Nov 2025 3:40 UTC
29 points
1 comment · 4 min read · LW link

Two can keep a secret if one is dead. So please share everything with at least one person.

habryka · 13 Nov 2025 3:09 UTC
80 points
5 comments · 2 min read · LW link

Utilitarian inequality metrics

Adam Scherlis · 13 Nov 2025 2:49 UTC
25 points
0 comments · 5 min read · LW link
(adam.scherl.is)

Being The Target Demographic

Eneasz · 13 Nov 2025 1:44 UTC
2 points
0 comments · 2 min read · LW link
(deathisbad.substack.com)

Lorxus Favors: An Experiment in Self-Backed Giftlike Macroeconomics (+ Extra Bits)

Lorxus · 12 Nov 2025 23:02 UTC
7 points
0 comments · 8 min read · LW link
(tiled-with-pentagons.blogspot.com)