An introduction to modular induction and some attempts to solve it

Thomas Kehrenberg · 23 Dec 2025 22:35 UTC
12 points
1 comment · 18 min read · LW link

Rules clarification for the Write like lsusr competition

lsusr · 23 Dec 2025 21:12 UTC
8 points
2 comments · 2 min read · LW link

Human Values

Maitreya · 23 Dec 2025 21:08 UTC
32 points
1 comment · 3 min read · LW link

Alignment Fellowship

rich_anon · 23 Dec 2025 20:29 UTC
58 points
14 comments · 1 min read · LW link

Iterative Matrix Steering: Forcing LLMs to “Rationalize” Hallucinations via Subspace Alignment

Artem Herasymenko · 23 Dec 2025 20:13 UTC
9 points
2 comments · 4 min read · LW link

Unpacking Geometric Rationality

MorgneticField · 23 Dec 2025 20:10 UTC
2 points
0 comments · 33 min read · LW link

Dreaming Vectors: Gradient-descented steering vectors from Activation Oracles and using them to Red-Team AOs

ceselder · 23 Dec 2025 19:28 UTC
22 points
4 comments · 12 min read · LW link

The Center for Reducing Suffering wants input from the suffering reduction community

Zoé · 23 Dec 2025 18:27 UTC
1 point
0 comments · 1 min read · LW link
(centerforreducingsuffering.org)

It’s Good To Create Happy People: A Comprehensive Case

Bentham's Bulldog · 23 Dec 2025 16:43 UTC
1 point
5 comments · 33 min read · LW link

I Died on DMT

Rebecca Dai · 23 Dec 2025 16:15 UTC
12 points
2 comments · 7 min read · LW link
(rebeccadai.substack.com)

Open Source is a Normal Term

jefftk · 23 Dec 2025 15:40 UTC
24 points
4 comments · 1 min read · LW link
(www.jefftk.com)

Don’t Trust Your Brain

silentbob · 23 Dec 2025 15:06 UTC
37 points
5 comments · 4 min read · LW link

The ML drug discovery startup trying really, really hard to not cheat

Abhishaike Mahajan · 23 Dec 2025 14:48 UTC
86 points
2 comments · 19 min read · LW link
(www.owlposting.com)

Keeping Up Against the Joneses: Balsa’s 2025 Fundraiser

Zvi · 23 Dec 2025 14:40 UTC
49 points
1 comment · 6 min read · LW link
(thezvi.wordpress.com)

Does 10^25 modulo 57 equal 59?

Jan Betley · 23 Dec 2025 13:00 UTC
33 points
3 comments · 2 min read · LW link

What Can Wittgenstein Teach Us About LLM Safety Research?

Manqing Liu · 23 Dec 2025 4:14 UTC
8 points
0 comments · 4 min read · LW link

Job Listing (CLOSED): CBAI Research Managers

23 Dec 2025 4:03 UTC
1 point
0 comments · 1 min read · LW link

Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV

RogerDearnaley · 23 Dec 2025 3:40 UTC
40 points
25 comments · 20 min read · LW link

The Benefits of Meditation Come From Telling People That You Meditate

ThirdEyeJoe (cousin of CottonEyedJoe) · 23 Dec 2025 1:48 UTC
35 points
5 comments · 2 min read · LW link

The future of alignment if LLMs are a bubble

Stuart_Armstrong · 23 Dec 2025 0:08 UTC
47 points
13 comments · 5 min read · LW link

Unsupervised Agent Discovery

Gunnar_Zarncke · 22 Dec 2025 22:01 UTC
24 points
0 comments · 6 min read · LW link

Announcing Gemma Scope 2

22 Dec 2025 21:56 UTC
94 points
1 comment · 2 min read · LW link

[Advanced Intro to AI Alignment] 0. Overview and Foundations

Towards_Keeperhood · 22 Dec 2025 21:20 UTC
15 points
0 comments · 5 min read · LW link

$500 Write like lsusr competition

lsusr · 22 Dec 2025 20:09 UTC
29 points
43 comments · 3 min read · LW link

Appendices: Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking

22 Dec 2025 19:33 UTC
17 points
0 comments · 1 min read · LW link

Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking

22 Dec 2025 19:32 UTC
14 points
0 comments · 30 min read · LW link

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

ryan_greenblatt · 22 Dec 2025 17:21 UTC
152 points
18 comments · 7 min read · LW link

Can we interpret latent reasoning using current mechanistic interpretability tools?

22 Dec 2025 16:56 UTC
34 points
0 comments · 9 min read · LW link

[Question] Why does Eliezer make abrasive public comments?

k64 · 22 Dec 2025 16:45 UTC
96 points
65 comments · 1 min read · LW link

The Revolution of Rising Expectations

Zvi · 22 Dec 2025 13:40 UTC
71 points
6 comments · 19 min read · LW link
(thezvi.wordpress.com)

Irresponsible and Unreasonable Takes on Meetups Organizing

Screwtape · 22 Dec 2025 7:42 UTC
66 points
3 comments · 6 min read · LW link

Most successful entrepreneurship is unproductive

lc · 22 Dec 2025 6:33 UTC
41 points
27 comments · 3 min read · LW link

AIXI with general utility functions: “Value under ignorance in UAI”

Cole Wyeth · 22 Dec 2025 5:46 UTC
25 points
0 comments · 1 min read · LW link
(arxiv.org)

Update: 5 months of Retatrutide

Brendan Long · 22 Dec 2025 0:02 UTC
24 points
0 comments · 1 min read · LW link

Energy and Ingenuity

datawitch · 21 Dec 2025 22:22 UTC
9 points
0 comments · 7 min read · LW link

Small Models Can Introspect, Too

vgel · 21 Dec 2025 22:20 UTC
121 points
8 comments · 4 min read · LW link
(vgel.me)

Two Notions of a Goal: Target States vs. Success Metrics

paul_dfr · 21 Dec 2025 21:28 UTC
10 points
0 comments · 7 min read · LW link

What’s the Current Stock Market Bubble?

PeterMcCluskey · 21 Dec 2025 20:08 UTC
46 points
2 comments · 2 min read · LW link
(bayesianinvestor.com)

EA Yale Destiny Debate Discussion:

Nathan Young · 21 Dec 2025 19:10 UTC
10 points
11 comments · 1 min read · LW link
(www.youtube.com)

Can Claude teach me to make coffee?

philh · 21 Dec 2025 16:23 UTC
120 points
19 comments · 16 min read · LW link

Retrospective on Copenhagen Secular Solstice 2025

Søren Elverlin · 21 Dec 2025 15:34 UTC
7 points
0 comments · 4 min read · LW link

Google seemingly solved efficient attention

ceselder · 21 Dec 2025 13:54 UTC
26 points
4 comments · 4 min read · LW link

Witness or Wager: Enforcing ‘Show Your Work’ in Model Outputs

markacochran · 21 Dec 2025 13:12 UTC
3 points
2 comments · 1 min read · LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan · 21 Dec 2025 10:14 UTC
408 points
65 comments · 3 min read · LW link

Technoromanticism

lsusr · 21 Dec 2025 9:00 UTC
111 points
18 comments · 5 min read · LW link

Analysis of Whisper-Tiny Using Sparse Autoencoders

Omar Khursheed · 21 Dec 2025 8:44 UTC
9 points
0 comments · 4 min read · LW link

A Way to Test and Train Creativity

SebastianT · 21 Dec 2025 8:43 UTC
3 points
2 comments · 3 min read · LW link

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
184 points
23 comments · 9 min read · LW link

The unreasonable deepness of number theory

wingspan · 20 Dec 2025 22:16 UTC
65 points
6 comments · 9 min read · LW link

Digital intentionality: What’s the point?

mingyuan · 20 Dec 2025 21:46 UTC
45 points
7 comments · 3 min read · LW link
(mingyuan.substack.com)