Existential despair, with hope

foodforthought · 6 Dec 2025 20:48 UTC
10 points
0 comments · 1 min read · LW link

I Need Your Help

Jaivardhan Nawani · 6 Dec 2025 18:48 UTC
8 points
1 comment · 1 min read · LW link

Crazy ideas in AI Safety part 1: Easy Measurable Communication

Valentin2026 · 6 Dec 2025 17:59 UTC
7 points
0 comments · 2 min read · LW link

The corrigibility basin of attraction is a misleading gloss

Jeremy Gillen · 6 Dec 2025 15:38 UTC
92 points
37 comments · 18 min read · LW link

LW Transcendence

Annabelle · 6 Dec 2025 6:53 UTC
9 points
0 comments · 2 min read · LW link

The Adequacy of Class Separation

milanrosko · 6 Dec 2025 6:10 UTC
4 points
0 comments · 5 min read · LW link

Answering a child’s questions

Alex_Altair · 6 Dec 2025 3:52 UTC
39 points
0 comments · 6 min read · LW link

AI Mood Ring: A Window Into LLM Emotions

michaelwaves · 6 Dec 2025 2:56 UTC
7 points
0 comments · 2 min read · LW link

Critical Meditation Theory

lsusr · 6 Dec 2025 2:24 UTC
57 points
11 comments · 2 min read · LW link

Tools, Agents, and Sycophantic Things

Eleni Angelou · 6 Dec 2025 1:50 UTC
25 points
0 comments · 4 min read · LW link

What Happens When You Train Models on False Facts?

David Vella Zarb · 6 Dec 2025 1:39 UTC
16 points
2 comments · 7 min read · LW link

why america can’t build ships

bhauth · 6 Dec 2025 0:35 UTC
92 points
18 comments · 6 min read · LW link
(www.bhauth.com)

An Ambitious Vision for Interpretability

leogao · 5 Dec 2025 22:57 UTC
168 points
7 comments · 4 min read · LW link

Reasons to care about Canary Strings

Alice Blair · 5 Dec 2025 21:41 UTC
27 points
3 comments · 2 min read · LW link

An AI-2027-like analysis of humans’ goals and ethics with conservative results

StanislavKrym · 5 Dec 2025 21:37 UTC
6 points
0 comments · 4 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 3: Resolution

5 Dec 2025 18:58 UTC
10 points
0 comments · 9 min read · LW link

Announcing: Agent Foundations 2026 at CMU

5 Dec 2025 18:37 UTC
59 points
2 comments · 1 min read · LW link

DeepSeek v3.2 Is Okay And Cheap But Slow

Zvi · 5 Dec 2025 16:30 UTC
33 points
3 comments · 9 min read · LW link
(thezvi.wordpress.com)

Journalist’s inquiry into a core organiser breaking his nonviolence commitment and leaving Stop AI

Remmelt · 5 Dec 2025 15:47 UTC
49 points
1 comment · 4 min read · LW link
(www.theatlantic.com)

Who is AGI for, and who benefits from AGI?

maddi · 5 Dec 2025 15:43 UTC
2 points
8 comments · 4 min read · LW link

Eval-unawareness ≠ Eval-invariance

Mo Baker · 5 Dec 2025 2:51 UTC
26 points
3 comments · 2 min read · LW link

Try Training SAEs with RLAIF

WCargo · 5 Dec 2025 1:10 UTC
5 points
0 comments · 2 min read · LW link

Arch-anarchy, the end of state and digital anarchism

Peter lawless · 5 Dec 2025 0:39 UTC
0 points
0 comments · 2 min read · LW link

On the Aesthetic of Wizard Power

Cole Wyeth · 4 Dec 2025 23:18 UTC
30 points
8 comments · 5 min read · LW link

Will misaligned AIs know that they’re misaligned?

Alexa Pan · 4 Dec 2025 21:58 UTC
13 points
5 comments · 9 min read · LW link

An Abstract Arsenal: Future Tokens in Claude Skills

Jordan Rubin · 4 Dec 2025 20:01 UTC
2 points
0 comments · 4 min read · LW link
(jordanmrubin.substack.com)

OC ACXLW Meetup #109 — When the Numbers Stop Meaning Anything: America’s Broken Poverty Line & UCSD’s Grade Mirage, Saturday, December 6, 2025

Michael Michalchik · 4 Dec 2025 19:58 UTC
1 point
0 comments · 2 min read · LW link

Cross Layer Transcoders for the Qwen3 LLM Family

Gunnar Carlsson · 4 Dec 2025 19:11 UTC
26 points
1 comment · 2 min read · LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
190 points
27 comments · 16 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 2: Conflict

mfatt · 4 Dec 2025 18:27 UTC
8 points
0 comments · 9 min read · LW link

Livestream for Bay Secular Solstice

Raemon · 4 Dec 2025 18:18 UTC
24 points
1 comment · 1 min read · LW link

Center on Long-Term Risk: Annual Review & Fundraiser 2025

Tristan Cook · 4 Dec 2025 18:14 UTC
44 points
0 comments · 4 min read · LW link
(longtermrisk.org)

Power Overwhelming: dissecting the $1.5T AI revenue shortfall

ykevinzhang · 4 Dec 2025 17:13 UTC
33 points
3 comments · 11 min read · LW link

on self-knowledge

Vadim Golub · 4 Dec 2025 16:55 UTC
0 points
0 comments · 5 min read · LW link

AI #145: You’ve Got Soul

Zvi · 4 Dec 2025 15:00 UTC
43 points
4 comments · 60 min read · LW link
(thezvi.wordpress.com)

Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not

Josh Snider · 4 Dec 2025 14:31 UTC
44 points
5 comments · 15 min read · LW link

Modelling Trajectories—Interim results

4 Dec 2025 13:34 UTC
11 points
0 comments · 4 min read · LW link

Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm

4 Dec 2025 12:42 UTC
19 points
0 comments · 9 min read · LW link

Help us find founders for new AI safety projects

lukeprog · 4 Dec 2025 12:40 UTC
33 points
1 comment · 1 min read · LW link

[Question] Do we have terminology for “heuristic utilitarianism” as opposed to classical act utilitarianism or formal rule utilitarianism?

SpectrumDT · 4 Dec 2025 12:26 UTC
8 points
8 comments · 1 min read · LW link

What is the most impressive game an LLM can implement from scratch?

lilkim2025 · 4 Dec 2025 3:35 UTC
16 points
0 comments · 4 min read · LW link

Sydney AI Safety Fellowship 2026 (Priority deadline this Sunday)

Chris_Leong · 4 Dec 2025 3:25 UTC
10 points
0 comments · 3 min read · LW link
(sasf26.com)

Epistemology of Romance, Part 2

DaystarEld · 4 Dec 2025 2:53 UTC
44 points
1 comment · 18 min read · LW link

Front-Load Giving Because of Anthropic Donors?

jefftk · 4 Dec 2025 2:30 UTC
84 points
8 comments · 1 min read · LW link
(www.jefftk.com)

Center for Reducing Suffering (CRS) S-Risk Introductory Fellowship applications are open!

Zoé · 4 Dec 2025 1:21 UTC
8 points
0 comments · 1 min read · LW link
(centerforreducingsuffering.org)

An AI Capability Threshold for Funding a UBI (Even If No New Jobs Are Created)

Aran Nayebi · 4 Dec 2025 1:06 UTC
14 points
0 comments · 3 min read · LW link

Shaping Model Cognition Through Reflective Dialogue—Experiment & Findings

Anurag · 3 Dec 2025 23:50 UTC
2 points
0 comments · 4 min read · LW link

Categorizing Selection Effects

romeostevensit · 3 Dec 2025 20:32 UTC
44 points
6 comments · 5 min read · LW link

Blog post: how important is the model spec if alignment fails?

Mia Taylor · 3 Dec 2025 20:19 UTC
11 points
1 comment · 1 min read · LW link
(newsletter.forethought.org)

[Paper] Difficulties with Evaluating a Deception Detector for AIs

3 Dec 2025 20:07 UTC
30 points
2 comments · 6 min read · LW link
(arxiv.org)