Unsupervised Agent Discovery

Gunnar_Zarncke

22 Dec 2025 22:01 UTC

27 points

0 comments, 6 min read, LW link

Announcing Gemma Scope 2

CallumMcDougall, Arthur Conmy, János Kramár, Tom Lieberum, Senthooran Rajamanoharan and Neel Nanda

22 Dec 2025 21:56 UTC

96 points

1 comment, 2 min read, LW link

[Advanced Intro to AI Alignment] 0. Overview and Foundations

Towards_Keeperhood

22 Dec 2025 21:20 UTC

15 points

3 comments, 5 min read, LW link

$500 Write like lsusr competition

lsusr

22 Dec 2025 20:09 UTC

29 points

43 comments, 3 min read, LW link

Appendices: Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking

Isaac Dunn, Kei Nishimura-Gasparian, Carson Denison, Ethan Perez and Robert Kirk

22 Dec 2025 19:33 UTC

17 points

0 comments, 1 min read, LW link

Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking

Isaac Dunn, Kei Nishimura-Gasparian, Carson Denison, Ethan Perez and Robert Kirk

22 Dec 2025 19:32 UTC

15 points

0 comments, 30 min read, LW link

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

ryan_greenblatt

22 Dec 2025 17:21 UTC

153 points

19 comments, 7 min read, LW link

Can we interpret latent reasoning using current mechanistic interpretability tools?

Bartosz Cywiński, Bart Bussmann, Arthur Conmy, Josh Engels, Neel Nanda and Senthooran Rajamanoharan

22 Dec 2025 16:56 UTC

44 points

1 comment, 9 min read, LW link

[Question] Why does Eliezer make abrasive public comments?

k64

22 Dec 2025 16:45 UTC

97 points

65 comments, 1 min read, LW link

The Revolution of Rising Expectations

Zvi

22 Dec 2025 13:40 UTC

71 points

6 comments, 19 min read, LW link

(thezvi.wordpress.com)

Irresponsible and Unreasonable Takes on Meetups Organizing

Screwtape

22 Dec 2025 7:42 UTC

67 points

3 comments, 6 min read, LW link

Most successful entrepreneurship is unproductive

lc

22 Dec 2025 6:33 UTC

38 points

27 comments, 3 min read, LW link

AIXI with general utility functions: “Value under ignorance in UAI”

Cole Wyeth

22 Dec 2025 5:46 UTC

25 points

0 comments, 1 min read, LW link

(arxiv.org)

Update: 5 months of Retatrutide

Brendan Long

22 Dec 2025 0:02 UTC

26 points

0 comments, 1 min read, LW link

Energy and Ingenuity

datawitch

21 Dec 2025 22:22 UTC

9 points

0 comments, 7 min read, LW link

Small Models Can Introspect, Too

vgel

21 Dec 2025 22:20 UTC

124 points

8 comments, 4 min read, LW link

(vgel.me)

Two Notions of a Goal: Target States vs. Success Metrics

paul_dfr

21 Dec 2025 21:28 UTC

10 points

0 comments, 7 min read, LW link

What’s the Current Stock Market Bubble?

PeterMcCluskey

21 Dec 2025 20:08 UTC

48 points

4 comments, 2 min read, LW link

(bayesianinvestor.com)

EA Yale Destiny Debate Discussion:

Nathan Young

21 Dec 2025 19:10 UTC

10 points

11 comments, 1 min read, LW link

(www.youtube.com)

Can Claude teach me to make coffee?

philh

21 Dec 2025 16:23 UTC

151 points

25 comments, 16 min read, LW link

Retrospective on Copenhagen Secular Solstice 2025

Søren Elverlin

21 Dec 2025 15:34 UTC

7 points

0 comments, 4 min read, LW link

Google seemingly solved efficient attention

ceselder

21 Dec 2025 13:54 UTC

26 points

4 comments, 4 min read, LW link

Witness or Wager: Enforcing ‘Show Your Work’ in Model Outputs

markacochran

21 Dec 2025 13:12 UTC

3 points

2 comments, 1 min read, LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan

21 Dec 2025 10:14 UTC

444 points

66 comments, 3 min read, LW link

Technoromanticism

lsusr

21 Dec 2025 9:00 UTC

110 points

20 comments, 5 min read, LW link

Analysis of Whisper-Tiny Using Sparse Autoencoders

Omar Khursheed

21 Dec 2025 8:44 UTC

8 points

0 comments, 4 min read, LW link

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

Cam, Puria, Kyle O’Brien, David Africa, Samuel Ratnam and andyk

21 Dec 2025 0:53 UTC

201 points

25 comments, 9 min read, LW link

The unreasonable deepness of number theory

OhadA

20 Dec 2025 22:16 UTC

65 points

6 comments, 9 min read, LW link

Digital intentionality: What’s the point?

mingyuan

20 Dec 2025 21:46 UTC

53 points

7 comments, 3 min read, LW link

(mingyuan.substack.com)

Contradict my take on OpenPhil’s past AI beliefs

Eliezer Yudkowsky

20 Dec 2025 21:15 UTC

197 points

94 comments, 3 min read, LW link

Why the alchemists couldn’t build rockets

Garrett Baker

20 Dec 2025 20:25 UTC

17 points

1 comment, 2 min read, LW link

Experiments to understand Singular Learning Theory’s Free Energy & Local Learning Coefficient (LLC)

anish-lakkapragada

20 Dec 2025 17:38 UTC

7 points

0 comments, 6 min read, LW link

Chain-of-Thought as Contextual Stabilization and Associative Retrieval

Aditya Raj

20 Dec 2025 17:32 UTC

5 points

1 comment, 6 min read, LW link

How to game the METR plot

shash42

20 Dec 2025 13:46 UTC

243 points

32 comments, 5 min read, LW link

No God Can Help You

Ape in the coat

20 Dec 2025 8:32 UTC

37 points

0 comments, 3 min read, LW link

(apeinthecoat102771.substack.com)

Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49 Mins

Michaël Trazzi

20 Dec 2025 7:13 UTC

92 points

14 comments, 1 min read, LW link

Show LW: Alignment Scry

Xyra Sinclair

20 Dec 2025 2:48 UTC

17 points

4 comments, 2 min read, LW link

Opinionated Takes on Meetups Organizing

jenn

20 Dec 2025 0:17 UTC

251 points

34 comments, 9 min read, LW link

A Full Epistemic Stack: Knowledge Commons for the 21st Century

Oliver Sourbut and Ben Goldhaber

19 Dec 2025 22:48 UTC

46 points

7 comments, 11 min read, LW link

(www.oliversourbut.net)

Opinion Fuzzing: A Proposal for Reducing & Exploring Variance in LLM Judgments Via Sampling

ozziegooen

19 Dec 2025 21:41 UTC

11 points

0 comments, 5 min read, LW link

Progress links and short notes, 2025-12-19

jasoncrawford

19 Dec 2025 19:44 UTC

8 points

0 comments, 6 min read, LW link

(newsletter.rootsofprogress.org)

Linch’s Top Inkhaven Posts and Reflections

Linch

19 Dec 2025 19:40 UTC

38 points

0 comments, 9 min read, LW link

(linch.substack.com)

When Were Things The Best?

Zvi

19 Dec 2025 18:00 UTC

62 points

16 comments, 15 min read, LW link

(thezvi.wordpress.com)

Response to Introspective Awareness research

maddi

19 Dec 2025 17:23 UTC

6 points

0 comments, 9 min read, LW link

SPAR Spring 2026: 130+ research projects now accepting applications

agucova

19 Dec 2025 14:23 UTC

22 points

0 comments, 2 min read, LW link

Space view

kapedalex

19 Dec 2025 14:20 UTC

5 points

0 comments, 6 min read, LW link

Digital Minds in 2025: A Year in Review

tbs and lucius

19 Dec 2025 14:18 UTC

16 points

0 comments, 21 min read, LW link

(digitalminds.substack.com)

Scratchpad

Karthik Tadepalli

19 Dec 2025 14:15 UTC

12 points

0 comments, 4 min read, LW link

AI Safety has a scaling problem

beyarkay (Boyd Kane)

19 Dec 2025 13:58 UTC

34 points

10 comments, 4 min read, LW link

When Are Concealment Features Learned? And Does the Model Know Who’s Watching?

James Hoffend

19 Dec 2025 8:19 UTC

13 points

1 comment, 6 min read, LW link