An introduction to modular induction and some attempts to solve it

Thomas Kehrenberg · 23 Dec 2025 22:35 UTC
12 points
1 comment · 18 min read · LW link

Rules clarification for the Write like lsusr competition

lsusr · 23 Dec 2025 21:12 UTC
8 points
2 comments · 2 min read · LW link

Human Values

Maitreya · 23 Dec 2025 21:08 UTC
32 points
1 comment · 3 min read · LW link

Alignment Fellowship

rich_anon · 23 Dec 2025 20:29 UTC
58 points
14 comments · 1 min read · LW link

Iterative Matrix Steering: Forcing LLMs to “Rationalize” Hallucinations via Subspace Alignment

Artem Herasymenko · 23 Dec 2025 20:13 UTC
9 points
2 comments · 4 min read · LW link

Unpacking Geometric Rationality

MorgneticField · 23 Dec 2025 20:10 UTC
2 points
0 comments · 33 min read · LW link

Dreaming Vectors: Gradient-descented steering vectors from Activation Oracles and using them to Red-Team AOs

ceselder · 23 Dec 2025 19:28 UTC
22 points
4 comments · 12 min read · LW link

The Center for Reducing Suffering wants input from the suffering reduction community

Zoé · 23 Dec 2025 18:27 UTC
1 point
0 comments · 1 min read · LW link
(centerforreducingsuffering.org)

It’s Good To Create Happy People: A Comprehensive Case

Bentham's Bulldog · 23 Dec 2025 16:43 UTC
1 point
5 comments · 33 min read · LW link

I Died on DMT

Rebecca Dai · 23 Dec 2025 16:15 UTC
12 points
2 comments · 7 min read · LW link
(rebeccadai.substack.com)

Open Source is a Normal Term

jefftk · 23 Dec 2025 15:40 UTC
24 points
4 comments · 1 min read · LW link
(www.jefftk.com)

Don’t Trust Your Brain

silentbob · 23 Dec 2025 15:06 UTC
37 points
5 comments · 4 min read · LW link

The ML drug discovery startup trying really, really hard to not cheat

Abhishaike Mahajan · 23 Dec 2025 14:48 UTC
86 points
2 comments · 19 min read · LW link
(www.owlposting.com)

Keeping Up Against the Joneses: Balsa’s 2025 Fundraiser

Zvi · 23 Dec 2025 14:40 UTC
49 points
1 comment · 6 min read · LW link
(thezvi.wordpress.com)

Does 10^25 modulo 57 equal 59?

Jan Betley · 23 Dec 2025 13:00 UTC
33 points
3 comments · 2 min read · LW link

What Can Wittgenstein Teach Us About LLM Safety Research?

Manqing Liu · 23 Dec 2025 4:14 UTC
8 points
0 comments · 4 min read · LW link

Job Listing (CLOSED): CBAI Research Managers

23 Dec 2025 4:03 UTC
1 point
0 comments · 1 min read · LW link

Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV

RogerDearnaley · 23 Dec 2025 3:40 UTC
40 points
25 comments · 20 min read · LW link

The Benefits of Meditation Come From Telling People That You Meditate

ThirdEyeJoe (cousin of CottonEyedJoe) · 23 Dec 2025 1:48 UTC
35 points
5 comments · 2 min read · LW link

The future of alignment if LLMs are a bubble

Stuart_Armstrong · 23 Dec 2025 0:08 UTC
47 points
13 comments · 5 min read · LW link

Unsupervised Agent Discovery

Gunnar_Zarncke · 22 Dec 2025 22:01 UTC
24 points
0 comments · 6 min read · LW link

Announcing Gemma Scope 2

22 Dec 2025 21:56 UTC
94 points
1 comment · 2 min read · LW link

[Advanced Intro to AI Alignment] 0. Overview and Foundations

Towards_Keeperhood · 22 Dec 2025 21:20 UTC
15 points
0 comments · 5 min read · LW link

$500 Write like lsusr competition

lsusr · 22 Dec 2025 20:09 UTC
29 points
43 comments · 3 min read · LW link

Appendices: Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking

22 Dec 2025 19:33 UTC
17 points
0 comments · 1 min read · LW link

Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking

22 Dec 2025 19:32 UTC
14 points
0 comments · 30 min read · LW link

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

ryan_greenblatt · 22 Dec 2025 17:21 UTC
152 points
18 comments · 7 min read · LW link

Can we interpret latent reasoning using current mechanistic interpretability tools?

22 Dec 2025 16:56 UTC
34 points
0 comments · 9 min read · LW link

[Question] Why does Eliezer make abrasive public comments?

k64 · 22 Dec 2025 16:45 UTC
96 points
65 comments · 1 min read · LW link

The Revolution of Rising Expectations

Zvi · 22 Dec 2025 13:40 UTC
71 points
6 comments · 19 min read · LW link
(thezvi.wordpress.com)

Irresponsible and Unreasonable Takes on Meetups Organizing

Screwtape · 22 Dec 2025 7:42 UTC
66 points
3 comments · 6 min read · LW link

Most successful entrepreneurship is unproductive

lc · 22 Dec 2025 6:33 UTC
41 points
27 comments · 3 min read · LW link

AIXI with general utility functions: “Value under ignorance in UAI”

Cole Wyeth · 22 Dec 2025 5:46 UTC
25 points
0 comments · 1 min read · LW link
(arxiv.org)

Update: 5 months of Retatrutide

Brendan Long · 22 Dec 2025 0:02 UTC
24 points
0 comments · 1 min read · LW link

Energy and Ingenuity

datawitch · 21 Dec 2025 22:22 UTC
9 points
0 comments · 7 min read · LW link

Small Models Can Introspect, Too

vgel · 21 Dec 2025 22:20 UTC
121 points
8 comments · 4 min read · LW link
(vgel.me)

Two Notions of a Goal: Target States vs. Success Metrics

paul_dfr · 21 Dec 2025 21:28 UTC
10 points
0 comments · 7 min read · LW link

What’s the Current Stock Market Bubble?

PeterMcCluskey · 21 Dec 2025 20:08 UTC
46 points
2 comments · 2 min read · LW link
(bayesianinvestor.com)

EA Yale Destiny Debate Discussion:

Nathan Young · 21 Dec 2025 19:10 UTC
10 points
11 comments · 1 min read · LW link
(www.youtube.com)

Can Claude teach me to make coffee?

philh · 21 Dec 2025 16:23 UTC
120 points
19 comments · 16 min read · LW link

Retrospective on Copenhagen Secular Solstice 2025

Søren Elverlin · 21 Dec 2025 15:34 UTC
7 points
0 comments · 4 min read · LW link

Google seemingly solved efficient attention

ceselder · 21 Dec 2025 13:54 UTC
26 points
4 comments · 4 min read · LW link

Witness or Wager: Enforcing ‘Show Your Work’ in Model Outputs

markacochran · 21 Dec 2025 13:12 UTC
3 points
2 comments · 1 min read · LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan · 21 Dec 2025 10:14 UTC
408 points
65 comments · 3 min read · LW link

Technoromanticism

lsusr · 21 Dec 2025 9:00 UTC
111 points
18 comments · 5 min read · LW link

Analysis of Whisper-Tiny Using Sparse Autoencoders

Omar Khursheed · 21 Dec 2025 8:44 UTC
9 points
0 comments · 4 min read · LW link

A Way to Test and Train Creativity

SebastianT · 21 Dec 2025 8:43 UTC
3 points
2 comments · 3 min read · LW link

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
184 points
23 comments · 9 min read · LW link

The unreasonable deepness of number theory

wingspan · 20 Dec 2025 22:16 UTC
65 points
6 comments · 9 min read · LW link

Digital intentionality: What’s the point?

mingyuan · 20 Dec 2025 21:46 UTC
45 points
7 comments · 3 min read · LW link
(mingyuan.substack.com)