Existential despair, with hope

foodforthought · 6 Dec 2025 20:48 UTC
10 points
0 comments · 1 min read · LW link

I Need Your Help

Jaivardhan Nawani · 6 Dec 2025 18:48 UTC
8 points
1 comment · 1 min read · LW link

Crazy ideas in AI Safety part 1: Easy Measurable Communication

Valentin2026 · 6 Dec 2025 17:59 UTC
7 points
0 comments · 2 min read · LW link

The corrigibility basin of attraction is a misleading gloss

Jeremy Gillen · 6 Dec 2025 15:38 UTC
92 points
37 comments · 18 min read · LW link

LW Transcendence

Annabelle · 6 Dec 2025 6:53 UTC
9 points
0 comments · 2 min read · LW link

The Adequacy of Class Separation

milanrosko · 6 Dec 2025 6:10 UTC
4 points
0 comments · 5 min read · LW link

Answering a child’s questions

Alex_Altair · 6 Dec 2025 3:52 UTC
39 points
0 comments · 6 min read · LW link

AI Mood Ring: A Window Into LLM Emotions

michaelwaves · 6 Dec 2025 2:56 UTC
7 points
0 comments · 2 min read · LW link

Critical Meditation Theory

lsusr · 6 Dec 2025 2:24 UTC
57 points
11 comments · 2 min read · LW link

Tools, Agents, and Sycophantic Things

Eleni Angelou · 6 Dec 2025 1:50 UTC
25 points
0 comments · 4 min read · LW link

What Happens When You Train Models on False Facts?

David Vella Zarb · 6 Dec 2025 1:39 UTC
16 points
2 comments · 7 min read · LW link

why america can’t build ships

bhauth · 6 Dec 2025 0:35 UTC
92 points
18 comments · 6 min read · LW link
(www.bhauth.com)

An Ambitious Vision for Interpretability

leogao · 5 Dec 2025 22:57 UTC
168 points
7 comments · 4 min read · LW link

Reasons to care about Canary Strings

Alice Blair · 5 Dec 2025 21:41 UTC
27 points
3 comments · 2 min read · LW link

An AI-2027-like analysis of humans’ goals and ethics with conservative results

StanislavKrym · 5 Dec 2025 21:37 UTC
6 points
0 comments · 4 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 3: Resolution

5 Dec 2025 18:58 UTC
10 points
0 comments · 9 min read · LW link

Announcing: Agent Foundations 2026 at CMU

5 Dec 2025 18:37 UTC
59 points
2 comments · 1 min read · LW link

DeepSeek v3.2 Is Okay And Cheap But Slow

Zvi · 5 Dec 2025 16:30 UTC
33 points
3 comments · 9 min read · LW link
(thezvi.wordpress.com)

Journalist’s inquiry into a core organiser breaking his nonviolence commitment and leaving Stop AI

Remmelt · 5 Dec 2025 15:47 UTC
49 points
1 comment · 4 min read · LW link
(www.theatlantic.com)

Who is AGI for, and who benefits from AGI?

maddi · 5 Dec 2025 15:43 UTC
2 points
8 comments · 4 min read · LW link

Eval-unawareness ≠ Eval-invariance

Mo Baker · 5 Dec 2025 2:51 UTC
26 points
3 comments · 2 min read · LW link

Try Training SAEs with RLAIF

WCargo · 5 Dec 2025 1:10 UTC
5 points
0 comments · 2 min read · LW link

Arch-anarchy, the end of state and digital anarchism

Peter lawless · 5 Dec 2025 0:39 UTC
0 points
0 comments · 2 min read · LW link

On the Aesthetic of Wizard Power

Cole Wyeth · 4 Dec 2025 23:18 UTC
30 points
8 comments · 5 min read · LW link

Will misaligned AIs know that they’re misaligned?

Alexa Pan · 4 Dec 2025 21:58 UTC
13 points
5 comments · 9 min read · LW link

An Abstract Arsenal: Future Tokens in Claude Skills

Jordan Rubin · 4 Dec 2025 20:01 UTC
2 points
0 comments · 4 min read · LW link
(jordanmrubin.substack.com)

OC ACXLW Meetup #109 — When the Numbers Stop Meaning Anything: America’s Broken Poverty Line & UCSD’s Grade Mirage, Saturday, December 6, 2025

Michael Michalchik · 4 Dec 2025 19:58 UTC
1 point
0 comments · 2 min read · LW link

Cross Layer Transcoders for the Qwen3 LLM Family

Gunnar Carlsson · 4 Dec 2025 19:11 UTC
26 points
1 comment · 2 min read · LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
190 points
27 comments · 16 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 2: Conflict

mfatt · 4 Dec 2025 18:27 UTC
8 points
0 comments · 9 min read · LW link

Livestream for Bay Secular Solstice

Raemon · 4 Dec 2025 18:18 UTC
24 points
1 comment · 1 min read · LW link

Center on Long-Term Risk: Annual Review & Fundraiser 2025

Tristan Cook · 4 Dec 2025 18:14 UTC
44 points
0 comments · 4 min read · LW link
(longtermrisk.org)

Power Overwhelming: dissecting the $1.5T AI revenue shortfall

ykevinzhang · 4 Dec 2025 17:13 UTC
33 points
3 comments · 11 min read · LW link

on self-knowledge

Vadim Golub · 4 Dec 2025 16:55 UTC
0 points
0 comments · 5 min read · LW link

AI #145: You’ve Got Soul

Zvi · 4 Dec 2025 15:00 UTC
43 points
4 comments · 60 min read · LW link
(thezvi.wordpress.com)

Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not

Josh Snider · 4 Dec 2025 14:31 UTC
44 points
5 comments · 15 min read · LW link

Modelling Trajectories—Interim results

4 Dec 2025 13:34 UTC
11 points
0 comments · 4 min read · LW link

Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm

4 Dec 2025 12:42 UTC
19 points
0 comments · 9 min read · LW link

Help us find founders for new AI safety projects

lukeprog · 4 Dec 2025 12:40 UTC
33 points
1 comment · 1 min read · LW link

[Question] Do we have terminology for “heuristic utilitarianism” as opposed to classical act utilitarianism or formal rule utilitarianism?

SpectrumDT · 4 Dec 2025 12:26 UTC
8 points
8 comments · 1 min read · LW link

What is the most impressive game an LLM can implement from scratch?

lilkim2025 · 4 Dec 2025 3:35 UTC
16 points
0 comments · 4 min read · LW link

Sydney AI Safety Fellowship 2026 (Priority deadline this Sunday)

Chris_Leong · 4 Dec 2025 3:25 UTC
10 points
0 comments · 3 min read · LW link
(sasf26.com)

Epistemology of Romance, Part 2

DaystarEld · 4 Dec 2025 2:53 UTC
44 points
1 comment · 18 min read · LW link

Front-Load Giving Because of Anthropic Donors?

jefftk · 4 Dec 2025 2:30 UTC
84 points
8 comments · 1 min read · LW link
(www.jefftk.com)

Center for Reducing Suffering (CRS) S-Risk Introductory Fellowship applications are open!

Zoé · 4 Dec 2025 1:21 UTC
8 points
0 comments · 1 min read · LW link
(centerforreducingsuffering.org)

An AI Capability Threshold for Funding a UBI (Even If No New Jobs Are Created)

Aran Nayebi · 4 Dec 2025 1:06 UTC
14 points
0 comments · 3 min read · LW link

Shaping Model Cognition Through Reflective Dialogue—Experiment & Findings

Anurag · 3 Dec 2025 23:50 UTC
2 points
0 comments · 4 min read · LW link

Categorizing Selection Effects

romeostevensit · 3 Dec 2025 20:32 UTC
44 points
6 comments · 5 min read · LW link

Blog post: how important is the model spec if alignment fails?

Mia Taylor · 3 Dec 2025 20:19 UTC
11 points
1 comment · 1 min read · LW link
(newsletter.forethought.org)

[Paper] Difficulties with Evaluating a Deception Detector for AIs

3 Dec 2025 20:07 UTC
30 points
2 comments · 6 min read · LW link
(arxiv.org)