On the Aesthetic of Wizard Power

Cole Wyeth · 4 Dec 2025 23:18 UTC
30 points
8 comments · 5 min read · LW link

Will misaligned AIs know that they’re misaligned?

Alexa Pan · 4 Dec 2025 21:58 UTC
13 points
5 comments · 9 min read · LW link

An Abstract Arsenal: Future Tokens in Claude Skills

Jordan Rubin · 4 Dec 2025 20:01 UTC
2 points
0 comments · 4 min read · LW link
(jordanmrubin.substack.com)

OC ACXLW Meetup #109 — When the Numbers Stop Meaning Anything: America’s Broken Poverty Line & UCSD’s Grade Mirage, Saturday, December 6, 2025

Michael Michalchik · 4 Dec 2025 19:58 UTC
1 point
0 comments · 2 min read · LW link

Cross Layer Transcoders for the Qwen3 LLM Family

Gunnar Carlsson · 4 Dec 2025 19:11 UTC
26 points
1 comment · 2 min read · LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
189 points
27 comments · 16 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 2: Conflict

mfatt · 4 Dec 2025 18:27 UTC
8 points
0 comments · 9 min read · LW link

Livestream for Bay Secular Solstice

Raemon · 4 Dec 2025 18:18 UTC
24 points
1 comment · 1 min read · LW link

Center on Long-Term Risk: Annual Review & Fundraiser 2025

Tristan Cook · 4 Dec 2025 18:14 UTC
44 points
0 comments · 4 min read · LW link
(longtermrisk.org)

Power Overwhelming: dissecting the $1.5T AI revenue shortfall

ykevinzhang · 4 Dec 2025 17:13 UTC
33 points
3 comments · 11 min read · LW link

on self-knowledge

Vadim Golub · 4 Dec 2025 16:55 UTC
0 points
0 comments · 5 min read · LW link

AI #145: You’ve Got Soul

Zvi · 4 Dec 2025 15:00 UTC
43 points
4 comments · 60 min read · LW link
(thezvi.wordpress.com)

Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not

Josh Snider · 4 Dec 2025 14:31 UTC
44 points
5 comments · 15 min read · LW link

Modelling Trajectories—Interim results

4 Dec 2025 13:34 UTC
11 points
0 comments · 4 min read · LW link

Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm

4 Dec 2025 12:42 UTC
19 points
0 comments · 9 min read · LW link

Help us find founders for new AI safety projects

lukeprog · 4 Dec 2025 12:40 UTC
33 points
1 comment · 1 min read · LW link

[Question] Do we have terminology for “heuristic utilitarianism” as opposed to classical act utilitarianism or formal rule utilitarianism?

SpectrumDT · 4 Dec 2025 12:26 UTC
8 points
8 comments · 1 min read · LW link

What is the most impressive game an LLM can implement from scratch?

lilkim2025 · 4 Dec 2025 3:35 UTC
16 points
0 comments · 4 min read · LW link

Sydney AI Safety Fellowship 2026 (Priority deadline this Sunday)

Chris_Leong · 4 Dec 2025 3:25 UTC
10 points
0 comments · 3 min read · LW link
(sasf26.com)

Epistemology of Romance, Part 2

DaystarEld · 4 Dec 2025 2:53 UTC
44 points
1 comment · 18 min read · LW link

Front-Load Giving Because of Anthropic Donors?

jefftk · 4 Dec 2025 2:30 UTC
84 points
8 comments · 1 min read · LW link
(www.jefftk.com)

Center for Reducing Suffering (CRS) S-Risk Introductory Fellowship applications are open!

Zoé · 4 Dec 2025 1:21 UTC
8 points
0 comments · 1 min read · LW link
(centerforreducingsuffering.org)

An AI Capability Threshold for Funding a UBI (Even If No New Jobs Are Created)

Aran Nayebi · 4 Dec 2025 1:06 UTC
14 points
0 comments · 3 min read · LW link

Shaping Model Cognition Through Reflective Dialogue—Experiment & Findings

Anurag · 3 Dec 2025 23:50 UTC
2 points
0 comments · 4 min read · LW link

Categorizing Selection Effects

romeostevensit · 3 Dec 2025 20:32 UTC
44 points
6 comments · 5 min read · LW link

Blog post: how important is the model spec if alignment fails?

Mia Taylor · 3 Dec 2025 20:19 UTC
11 points
1 comment · 1 min read · LW link
(newsletter.forethought.org)

[Paper] Difficulties with Evaluating a Deception Detector for AIs

3 Dec 2025 20:07 UTC
30 points
2 comments · 6 min read · LW link
(arxiv.org)

Beating China to ASI

PeterMcCluskey · 3 Dec 2025 19:52 UTC
74 points
11 comments · 6 min read · LW link
(bayesianinvestor.com)

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes · 3 Dec 2025 18:37 UTC
357 points
89 comments · 17 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 1: Exposition

3 Dec 2025 18:29 UTC
14 points
0 comments · 5 min read · LW link

Embedded Universal Predictive Intelligence

Cole Wyeth · 3 Dec 2025 17:23 UTC
79 points
13 comments · 1 min read · LW link
(www.arxiv.org)

Human-AI identity coupling is emergent

soycarts · 3 Dec 2025 17:14 UTC
4 points
1 comment · 3 min read · LW link

On Dwarkesh Patel’s Second Interview With Ilya Sutskever

Zvi · 3 Dec 2025 16:31 UTC
47 points
4 comments · 21 min read · LW link
(thezvi.wordpress.com)

A Critique of Yudkowsky’s Protein Folding Heuristic

milanrosko · 3 Dec 2025 14:59 UTC
11 points
12 comments · 4 min read · LW link

Recollection of a Dinner Party

Srdjan Miletic · 3 Dec 2025 14:49 UTC
14 points
0 comments · 6 min read · LW link
(www.dissent.blog)

Formalizing Newcombian Problems with Fuzzy Infra-Bayesianism

Brittany Gelb · 3 Dec 2025 14:35 UTC
16 points
0 comments · 22 min read · LW link

Proof Section to Formalizing Newcombian Problems with Fuzzy Infra-Bayesianism

Brittany Gelb · 3 Dec 2025 14:34 UTC
12 points
0 comments · 2 min read · LW link

Human art in a post-AI world should be strange

Abhishaike Mahajan · 3 Dec 2025 14:27 UTC
48 points
7 comments · 12 min read · LW link

It’s tricky to tell what % of the economy the state controls

Srdjan Miletic · 3 Dec 2025 14:02 UTC
7 points
0 comments · 1 min read · LW link
(www.dissent.blog)

I’m Skeptical of and Confused About The Multiplier in Macroeconomics

Srdjan Miletic · 3 Dec 2025 14:00 UTC
8 points
0 comments · 3 min read · LW link
(www.dissent.blog)

Relitigating the Race to Build Friendly AI

Wei Dai · 3 Dec 2025 11:34 UTC
83 points
43 comments · 3 min read · LW link

Intuition Pump: The AI Society

Jonas Hallgren · 3 Dec 2025 9:00 UTC
17 points
0 comments · 5 min read · LW link

GiveCalc: Open-source tool to calculate the true cost of charitable giving

Max Ghenis · 2 Dec 2025 23:56 UTC
5 points
1 comment · 2 min read · LW link

Effective Pizzaism

Screwtape · 2 Dec 2025 23:50 UTC
45 points
1 comment · 8 min read · LW link

TastyBench: Toward Measuring Research Taste in LLM

2 Dec 2025 23:26 UTC
27 points
2 comments · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights of November 2025

gasteigerjo · 2 Dec 2025 21:11 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

Open Thread Winter 2025/26

kave · 2 Dec 2025 19:27 UTC
21 points
59 comments · 1 min read · LW link

Practical AI risk II: Training transparency

Gustavo Ramires · 2 Dec 2025 19:26 UTC
1 point
0 comments · 1 min read · LW link

Five ways AI can tell you’re testing it

sjadler · 2 Dec 2025 17:25 UTC
16 points
0 comments · 15 min read · LW link
(stevenadler.substack.com)

Why Moloch is actually the God of Evolutionary Prisoner’s Dilemmas

Jonah Wilberg · 2 Dec 2025 16:54 UTC
32 points
2 comments · 11 min read · LW link