A browser game about AI safety

NickSharp · 17 Dec 2025 22:36 UTC
18 points
4 comments · 1 min read · LW link

What if we could grow human tissue by recapitulating embryogenesis?

Abhishaike Mahajan · 17 Dec 2025 21:53 UTC
23 points
0 comments · 1 min read · LW link
(www.owlposting.com)

Transmitting Misalignment with Subliminal Learning via Paraphrasing

17 Dec 2025 19:34 UTC
38 points
0 comments · 10 min read · LW link

Shallow review of technical AI safety, 2025

17 Dec 2025 18:18 UTC
175 points
9 comments · 83 min read · LW link

Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen · 17 Dec 2025 18:10 UTC
110 points
17 comments · 5 min read · LW link

Alignment Fine-Tuning: Lessons from Operant Conditioning

foodforthought · 17 Dec 2025 16:57 UTC
5 points
4 comments · 10 min read · LW link

Bryan Caplan on Ethical Intuitionism

vatsal_newsletter · 17 Dec 2025 16:48 UTC
−5 points
0 comments · 1 min read · LW link
(www.readvatsal.com)

The Bleeding Mind

Adele Lopez · 17 Dec 2025 16:27 UTC
65 points
10 comments · 6 min read · LW link

Could space debris block access to outer space?

fin · 17 Dec 2025 15:59 UTC
12 points
5 comments · 3 min read · LW link
(www.forethought.org)

An intuitive explanation of backdoor paths using DAGs

enterthewoods · 17 Dec 2025 15:42 UTC
8 points
0 comments · 6 min read · LW link

Still Too Soon

Gordon Seidoh Worley · 17 Dec 2025 15:40 UTC
75 points
3 comments · 2 min read · LW link
(www.uncertainupdates.com)

The $140K Question: Cost Changes Over Time

Zvi · 17 Dec 2025 14:10 UTC
28 points
2 comments · 18 min read · LW link
(thezvi.wordpress.com)

[Question] Can you recommend some reading about effective environmentalism?

SpectrumDT · 17 Dec 2025 11:15 UTC
3 points
0 comments · 1 min read · LW link

Memory Consolidation

Elliot Callender · 17 Dec 2025 11:03 UTC
2 points
0 comments · 2 min read · LW link
(substack.com)

[Question] Cognition Augmentation Org

Elliot Callender · 17 Dec 2025 10:49 UTC
3 points
2 comments · 1 min read · LW link

On publishing every day for 30 days

Alexandre Variengien · 17 Dec 2025 8:30 UTC
9 points
0 comments · 5 min read · LW link
(alexandrevariengien.com)

Dancing in a World of Horseradish

lsusr · 17 Dec 2025 5:50 UTC
134 points
31 comments · 4 min read · LW link

Video and transcript of talk on human-like-ness in AI safety

Joe Carlsmith · 17 Dec 2025 4:09 UTC
10 points
0 comments · 36 min read · LW link

Lessons from a failed ambitious alignment program

Kabir Kumar · 17 Dec 2025 1:50 UTC
57 points
5 comments · 3 min read · LW link

43 SAE Features Differentiate Concealment from Confession in Anthropic’s Deceptive Model Organism

James Hoffend · 17 Dec 2025 1:40 UTC
12 points
0 comments · 4 min read · LW link

Announcing TARA: Receive (and Give) Technical AI Safety Training Without Leaving Your Home City

Zac Broeren · 17 Dec 2025 1:33 UTC
5 points
0 comments · 4 min read · LW link

Announcing: MIRI Technical Governance Team Research Fellowship

17 Dec 2025 0:02 UTC
61 points
5 comments · 2 min read · LW link
(techgov.intelligence.org)

Non-Scheming Saints (Whether Human Or Digital) Might Be Shirking Their Governance Duties, And, If True, It Is Probably An Objective Tragedy

JenniferRM · 16 Dec 2025 23:56 UTC
42 points
3 comments · 9 min read · LW link

A Primer on Operant Conditioning

foodforthought · 16 Dec 2025 21:26 UTC
5 points
0 comments · 4 min read · LW link

Towards training-time mitigations for alignment faking in RL

16 Dec 2025 21:01 UTC
33 points
1 comment · 5 min read · LW link
(alignment.anthropic.com)

Measuring Drug Target Success

sarahconstantin · 16 Dec 2025 21:00 UTC
19 points
3 comments · 2 min read · LW link
(sarahconstantin.substack.com)

A Study in Attention

hamilton · 16 Dec 2025 20:39 UTC
14 points
0 comments · 2 min read · LW link

Emergent Sycophancy

ohdearohdear · 16 Dec 2025 20:21 UTC
8 points
0 comments · 5 min read · LW link

Systems of Control

phoenix · 16 Dec 2025 19:00 UTC
15 points
3 comments · 22 min read · LW link

Discursive Games, Discursive Warfare

Suspended Reason · 16 Dec 2025 18:24 UTC
36 points
0 comments · 30 min read · LW link

Scientific breakthroughs of the year

technicalities · 16 Dec 2025 18:00 UTC
178 points
13 comments · 3 min read · LW link
(x.com)

In defense of slop

jasoncrawford · 16 Dec 2025 17:36 UTC
20 points
3 comments · 4 min read · LW link
(newsletter.rootsofprogress.org)

TSMC most definitely has a golden record of all AI chips it made

Naci Cankaya · 16 Dec 2025 17:20 UTC
3 points
0 comments · 1 min read · LW link
(nacicankaya.substack.com)

The $140,000 Question

Zvi · 16 Dec 2025 16:50 UTC
19 points
0 comments · 15 min read · LW link
(thezvi.wordpress.com)

Radiology Automation Does Not Generalize to Other Jobs

Xodarap · 16 Dec 2025 14:32 UTC
47 points
5 comments · 1 min read · LW link

Fermi paradox solutions map

avturchin · 16 Dec 2025 14:21 UTC
26 points
9 comments · 1 min read · LW link

According to doctors, how feasible is preserving the dying for future revival?

Ariel Zeleznikow-Johnston · 16 Dec 2025 13:18 UTC
18 points
2 comments · 2 min read · LW link
(open.substack.com)

A friction in my dealings with friends who have not yet bought into the reality of AI risk

Olle Häggström · 16 Dec 2025 8:12 UTC
18 points
13 comments · 4 min read · LW link

A Rationalist Christmas

Ryan Meservey · 16 Dec 2025 7:23 UTC
5 points
1 comment · 4 min read · LW link

[Question] Why do LLMs so often say “It’s not an X, it’s a Y”?

ChristianKl · 16 Dec 2025 1:02 UTC
28 points
13 comments · 1 min read · LW link

Response to titotal’s critique of our AI 2027 timelines model

16 Dec 2025 0:51 UTC
38 points
6 comments · 43 min read · LW link
(aifuturesnotes.substack.com)

Introducing Lunette: auditing agents for evals and environments

15 Dec 2025 23:17 UTC
23 points
0 comments · 1 min read · LW link
(fulcrumresearch.ai)

Private AI clouds are the future of inference

perfectfwd · 15 Dec 2025 23:04 UTC
3 points
0 comments · 9 min read · LW link
(perfectforward.substack.com)

Naming

CTA · 15 Dec 2025 23:00 UTC
3 points
0 comments · 4 min read · LW link

Viewing animals as economic agents

foodforthought · 15 Dec 2025 18:13 UTC
10 points
2 comments · 5 min read · LW link

Digital Freedom Fund open for grant applications (Deadline: 17th February)

gergogaspar · 15 Dec 2025 16:25 UTC
8 points
0 comments · 1 min read · LW link

Луна Лавгуд и Комната Тайн, Часть 9 (Luna Lovegood and the Chamber of Secrets, Part 9)

15 Dec 2025 16:01 UTC
2 points
0 comments · 1 min read · LW link

Defending Against Model Weight Exfiltration Through Inference Verification

15 Dec 2025 15:26 UTC
119 points
15 comments · 8 min read · LW link

Rotations in Superposition

15 Dec 2025 14:58 UTC
54 points
6 comments · 11 min read · LW link

What is an evaluation, and why this definition matters

Igor Ivanov · 15 Dec 2025 14:53 UTC
33 points
1 comment · 7 min read · LW link