Shaping Model Cognition Through Reflective Dialogue—Experiment & Findings

Anurag 3 Dec 2025 23:50 UTC
2 points
0 comments · 4 min read · LW link

Categorizing Selection Effects

romeostevensit 3 Dec 2025 20:32 UTC
44 points
6 comments · 5 min read · LW link

Blog post: how important is the model spec if alignment fails?

Mia Taylor 3 Dec 2025 20:19 UTC
11 points
1 comment · 1 min read · LW link
(newsletter.forethought.org)

[Paper] Difficulties with Evaluating a Deception Detector for AIs

3 Dec 2025 20:07 UTC
30 points
2 comments · 6 min read · LW link
(arxiv.org)

Beating China to ASI

PeterMcCluskey 3 Dec 2025 19:52 UTC
74 points
11 comments · 6 min read · LW link
(bayesianinvestor.com)

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes 3 Dec 2025 18:37 UTC
357 points
89 comments · 17 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 1: Exposition

3 Dec 2025 18:29 UTC
14 points
0 comments · 5 min read · LW link

Embedded Universal Predictive Intelligence

Cole Wyeth 3 Dec 2025 17:23 UTC
79 points
13 comments · 1 min read · LW link
(www.arxiv.org)

Human-AI identity coupling is emergent

soycarts 3 Dec 2025 17:14 UTC
4 points
1 comment · 3 min read · LW link

On Dwarkesh Patel’s Second Interview With Ilya Sutskever

Zvi 3 Dec 2025 16:31 UTC
47 points
4 comments · 21 min read · LW link
(thezvi.wordpress.com)

A Critique of Yudkowsky’s Protein Folding Heuristic

milanrosko 3 Dec 2025 14:59 UTC
11 points
12 comments · 4 min read · LW link

Recollection of a Dinner Party

Srdjan Miletic 3 Dec 2025 14:49 UTC
14 points
0 comments · 6 min read · LW link
(www.dissent.blog)

For­mal­iz­ing New­com­bian Prob­lems with Fuzzy In­fra-Bayesianism

Brittany Gelb3 Dec 2025 14:35 UTC
16 points
0 comments22 min readLW link

Proof Section to Formalizing Newcombian Problems with Fuzzy Infra-Bayesianism

Brittany Gelb 3 Dec 2025 14:34 UTC
12 points
0 comments · 2 min read · LW link

Human art in a post-AI world should be strange

Abhishaike Mahajan 3 Dec 2025 14:27 UTC
48 points
7 comments · 12 min read · LW link

It’s tricky to tell what % of the economy the state controls

Srdjan Miletic 3 Dec 2025 14:02 UTC
7 points
0 comments · 1 min read · LW link
(www.dissent.blog)

I’m Skeptical of and Confused About The Multiplier in Macroeconomics

Srdjan Miletic 3 Dec 2025 14:00 UTC
8 points
0 comments · 3 min read · LW link
(www.dissent.blog)

Relitigating the Race to Build Friendly AI

Wei Dai 3 Dec 2025 11:34 UTC
83 points
43 comments · 3 min read · LW link

Intuition Pump: The AI Society

Jonas Hallgren 3 Dec 2025 9:00 UTC
17 points
0 comments · 5 min read · LW link

GiveCalc: Open-source tool to calculate the true cost of charitable giving

Max Ghenis 2 Dec 2025 23:56 UTC
5 points
1 comment · 2 min read · LW link

Effective Pizzaism

Screwtape 2 Dec 2025 23:50 UTC
45 points
1 comment · 8 min read · LW link

TastyBench: Toward Measuring Research Taste in LLM

2 Dec 2025 23:26 UTC
27 points
2 comments · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights of November 2025

gasteigerjo 2 Dec 2025 21:11 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

Open Thread Winter 2025/26

kave 2 Dec 2025 19:27 UTC
21 points
59 comments · 1 min read · LW link

Practical AI risk II: Training transparency

Gustavo Ramires 2 Dec 2025 19:26 UTC
1 point
0 comments · 1 min read · LW link

Five ways AI can tell you’re testing it

sjadler 2 Dec 2025 17:25 UTC
16 points
0 comments · 15 min read · LW link
(stevenadler.substack.com)

Why Moloch is actually the God of Evolutionary Prisoner’s Dilemmas

Jonah Wilberg 2 Dec 2025 16:54 UTC
32 points
2 comments · 11 min read · LW link

Reward Mismatches in RL Cause Emergent Misalignment

Zvi 2 Dec 2025 16:31 UTC
70 points
1 comment · 7 min read · LW link
(thezvi.wordpress.com)

Sci.STEPS invites mentee applications

Valentin2026 2 Dec 2025 13:33 UTC
7 points
0 comments · 1 min read · LW link

How Claude Opus 4.5 describes its experience of various concepts

Kaj_Sotala 2 Dec 2025 13:05 UTC
16 points
1 comment · 65 min read · LW link

Safety Cases Explained: How to Argue an AI is Safe

JanWehner 2 Dec 2025 11:03 UTC
16 points
2 comments · 9 min read · LW link

The Hidden Asymmetry in Personal Preparedness: Early Costs, Late Losses

Ulrik Horn 2 Dec 2025 10:33 UTC
6 points
5 comments · 15 min read · LW link

Halfhaven Digest 6 + Retrospective

Taylor G. Lunt 2 Dec 2025 5:27 UTC
20 points
2 comments · 3 min read · LW link

Metric-haven (quick stats on how Inkhaven impacted LessWrong)

Ruby 2 Dec 2025 3:31 UTC
26 points
3 comments · 1 min read · LW link

MIRI’s 2025 Fundraiser

alexvermeer 2 Dec 2025 1:53 UTC
176 points
7 comments · 8 min read · LW link

Everyone Can Be High Status In Utopia

Algon 1 Dec 2025 23:43 UTC
12 points
5 comments · 2 min read · LW link

GRPO is terrible

RobinHa 1 Dec 2025 22:54 UTC
4 points
2 comments · 5 min read · LW link
(robinhaselhorst.com)

How to Write Fast, Weird, and Well

Linch 1 Dec 2025 21:30 UTC
44 points
1 comment · 18 min read · LW link
(inchpin.substack.com)

The 2024 LessWrong Review

RobertM 1 Dec 2025 21:06 UTC
63 points
10 comments · 7 min read · LW link

Future Proofing Solstice

Raemon 1 Dec 2025 20:57 UTC
51 points
7 comments · 1 min read · LW link

Why rationalists get depressed

Pjain 1 Dec 2025 20:07 UTC
28 points
0 comments · 17 min read · LW link

Announcing: OpenAI’s Alignment Research Blog

Naomi Bashkansky 1 Dec 2025 19:52 UTC
120 points
11 comments · 1 min read · LW link

AI Mental Health Chatbots for Low-Resource Settings: A Prioritization Framework

Dawn Drescher 1 Dec 2025 17:41 UTC
6 points
0 comments · 16 min read · LW link

Which planet is closest to the Earth, and why is it Mercury?

Menotim 1 Dec 2025 17:16 UTC
27 points
5 comments · 4 min read · LW link

How middle powers may prevent the development of artificial superintelligence

1 Dec 2025 16:48 UTC
127 points
12 comments · 3 min read · LW link
(asi-prevention.com)

Becoming a Chinese Room

Raelifin 1 Dec 2025 16:34 UTC
39 points
3 comments · 6 min read · LW link
(raelifin.substack.com)

Well, Seasons Greatings Everyone! [Short Fiction]

Shiva's Right Foot 1 Dec 2025 16:29 UTC
15 points
0 comments · 3 min read · LW link

23 thoughts on Artificial Intelligence (2025)

Annapurna 1 Dec 2025 16:01 UTC
1 point
0 comments · 5 min read · LW link

Lorxus Does Halfhaven: 11/22~11/28

Lorxus 1 Dec 2025 14:47 UTC
5 points
0 comments · 2 min read · LW link
(tiled-with-pentagons.blogspot.com)

Would ASI development in non-party states undermine a nonproliferation agreement?

Robi Rahman 1 Dec 2025 14:22 UTC
13 points
0 comments · 9 min read · LW link