On the Aesthetic of Wizard Power

Cole Wyeth · 4 Dec 2025 23:18 UTC
30 points
8 comments · 5 min read · LW link

Will misaligned AIs know that they’re misaligned?

Alexa Pan · 4 Dec 2025 21:58 UTC
13 points
5 comments · 9 min read · LW link

An Abstract Arsenal: Future Tokens in Claude Skills

Jordan Rubin · 4 Dec 2025 20:01 UTC
2 points
0 comments · 4 min read · LW link
(jordanmrubin.substack.com)

OC ACXLW Meetup #109 — When the Numbers Stop Meaning Anything: America’s Broken Poverty Line & UCSD’s Grade Mirage, Saturday, December 6, 2025

Michael Michalchik · 4 Dec 2025 19:58 UTC
1 point
0 comments · 2 min read · LW link

Cross Layer Transcoders for the Qwen3 LLM Family

Gunnar Carlsson · 4 Dec 2025 19:11 UTC
26 points
1 comment · 2 min read · LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
189 points
27 comments · 16 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 2: Conflict

mfatt · 4 Dec 2025 18:27 UTC
8 points
0 comments · 9 min read · LW link

Livestream for Bay Secular Solstice

Raemon · 4 Dec 2025 18:18 UTC
24 points
1 comment · 1 min read · LW link

Center on Long-Term Risk: Annual Review & Fundraiser 2025

Tristan Cook · 4 Dec 2025 18:14 UTC
44 points
0 comments · 4 min read · LW link
(longtermrisk.org)

Power Overwhelming: dissecting the $1.5T AI revenue shortfall

ykevinzhang · 4 Dec 2025 17:13 UTC
33 points
3 comments · 11 min read · LW link

on self-knowledge

Vadim Golub · 4 Dec 2025 16:55 UTC
0 points
0 comments · 5 min read · LW link

AI #145: You’ve Got Soul

Zvi · 4 Dec 2025 15:00 UTC
43 points
4 comments · 60 min read · LW link
(thezvi.wordpress.com)

Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not

Josh Snider · 4 Dec 2025 14:31 UTC
44 points
5 comments · 15 min read · LW link

Modelling Trajectories—Interim results

4 Dec 2025 13:34 UTC
11 points
0 comments · 4 min read · LW link

Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm

4 Dec 2025 12:42 UTC
19 points
0 comments · 9 min read · LW link

Help us find founders for new AI safety projects

lukeprog · 4 Dec 2025 12:40 UTC
33 points
1 comment · 1 min read · LW link

[Question] Do we have terminology for “heuristic utilitarianism” as opposed to classical act utilitarianism or formal rule utilitarianism?

SpectrumDT · 4 Dec 2025 12:26 UTC
8 points
8 comments · 1 min read · LW link

What is the most impressive game an LLM can implement from scratch?

lilkim2025 · 4 Dec 2025 3:35 UTC
16 points
0 comments · 4 min read · LW link

Sydney AI Safety Fellowship 2026 (Priority deadline this Sunday)

Chris_Leong · 4 Dec 2025 3:25 UTC
10 points
0 comments · 3 min read · LW link
(sasf26.com)

Epistemology of Romance, Part 2

DaystarEld · 4 Dec 2025 2:53 UTC
44 points
1 comment · 18 min read · LW link

Front-Load Giving Because of Anthropic Donors?

jefftk · 4 Dec 2025 2:30 UTC
84 points
8 comments · 1 min read · LW link
(www.jefftk.com)

Center for Reducing Suffering (CRS) S-Risk Introductory Fellowship applications are open!

Zoé · 4 Dec 2025 1:21 UTC
8 points
0 comments · 1 min read · LW link
(centerforreducingsuffering.org)

An AI Capability Threshold for Funding a UBI (Even If No New Jobs Are Created)

Aran Nayebi · 4 Dec 2025 1:06 UTC
14 points
0 comments · 3 min read · LW link

Shaping Model Cognition Through Reflective Dialogue—Experiment & Findings

Anurag · 3 Dec 2025 23:50 UTC
2 points
0 comments · 4 min read · LW link

Categorizing Selection Effects

romeostevensit · 3 Dec 2025 20:32 UTC
44 points
6 comments · 5 min read · LW link

Blog post: how important is the model spec if alignment fails?

Mia Taylor · 3 Dec 2025 20:19 UTC
11 points
1 comment · 1 min read · LW link
(newsletter.forethought.org)

[Paper] Difficulties with Evaluating a Deception Detector for AIs

3 Dec 2025 20:07 UTC
30 points
2 comments · 6 min read · LW link
(arxiv.org)

Beating China to ASI

PeterMcCluskey · 3 Dec 2025 19:52 UTC
74 points
11 comments · 6 min read · LW link
(bayesianinvestor.com)

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes · 3 Dec 2025 18:37 UTC
357 points
89 comments · 17 min read · LW link

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 1: Exposition

3 Dec 2025 18:29 UTC
14 points
0 comments · 5 min read · LW link

Embedded Universal Predictive Intelligence

Cole Wyeth · 3 Dec 2025 17:23 UTC
79 points
13 comments · 1 min read · LW link
(www.arxiv.org)

Human-AI identity coupling is emergent

soycarts · 3 Dec 2025 17:14 UTC
4 points
1 comment · 3 min read · LW link

On Dwarkesh Patel’s Second Interview With Ilya Sutskever

Zvi · 3 Dec 2025 16:31 UTC
47 points
4 comments · 21 min read · LW link
(thezvi.wordpress.com)

A Critique of Yudkowsky’s Protein Folding Heuristic

milanrosko · 3 Dec 2025 14:59 UTC
11 points
12 comments · 4 min read · LW link

Recollection of a Dinner Party

Srdjan Miletic · 3 Dec 2025 14:49 UTC
14 points
0 comments · 6 min read · LW link
(www.dissent.blog)

Formalizing Newcombian Problems with Fuzzy Infra-Bayesianism

Brittany Gelb · 3 Dec 2025 14:35 UTC
16 points
0 comments · 22 min read · LW link

Proof Section to Formalizing Newcombian Problems with Fuzzy Infra-Bayesianism

Brittany Gelb · 3 Dec 2025 14:34 UTC
12 points
0 comments · 2 min read · LW link

Human art in a post-AI world should be strange

Abhishaike Mahajan · 3 Dec 2025 14:27 UTC
48 points
7 comments · 12 min read · LW link

It’s tricky to tell what % of the economy the state controls

Srdjan Miletic · 3 Dec 2025 14:02 UTC
7 points
0 comments · 1 min read · LW link
(www.dissent.blog)

I’m Skeptical of and Confused About The Multiplier in Macroeconomics

Srdjan Miletic · 3 Dec 2025 14:00 UTC
8 points
0 comments · 3 min read · LW link
(www.dissent.blog)

Relitigating the Race to Build Friendly AI

Wei Dai · 3 Dec 2025 11:34 UTC
83 points
43 comments · 3 min read · LW link

Intuition Pump: The AI Society

Jonas Hallgren · 3 Dec 2025 9:00 UTC
17 points
0 comments · 5 min read · LW link

GiveCalc: Open-source tool to calculate the true cost of charitable giving

Max Ghenis · 2 Dec 2025 23:56 UTC
5 points
1 comment · 2 min read · LW link

Effective Pizzaism

Screwtape · 2 Dec 2025 23:50 UTC
45 points
1 comment · 8 min read · LW link

TastyBench: Toward Measuring Research Taste in LLM

2 Dec 2025 23:26 UTC
27 points
2 comments · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights of November 2025

gasteigerjo · 2 Dec 2025 21:11 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

Open Thread Winter 2025/26

kave · 2 Dec 2025 19:27 UTC
21 points
59 comments · 1 min read · LW link

Practical AI risk II: Training transparency

Gustavo Ramires · 2 Dec 2025 19:26 UTC
1 point
0 comments · 1 min read · LW link

Five ways AI can tell you’re testing it

sjadler · 2 Dec 2025 17:25 UTC
16 points
0 comments · 15 min read · LW link
(stevenadler.substack.com)

Why Moloch is actually the God of Evolutionary Prisoner’s Dilemmas

Jonah Wilberg · 2 Dec 2025 16:54 UTC
32 points
2 comments · 11 min read · LW link