The Strange Science of Interpretability: Recent Papers and a Reading List for the Philosophy of Interpretability

Kola Ayonrinde
17 Aug 2025 23:38 UTC
16 points
0 comments
2 min read
LW link
(arxiv.org)

The parable of the underdog

Said Achmiz
17 Aug 2025 22:39 UTC
25 points
4 comments
2 min read
LW link
(www.datasecretslox.com)

Underdog bias rules everything around me

Richard_Ngo
17 Aug 2025 19:21 UTC
159 points
53 comments
7 min read
LW link
(www.mindthefuture.info)

Apply for the 2025 Dovetail fellowship

17 Aug 2025 19:09 UTC
42 points
2 comments
4 min read
LW link

Writing Out My Tunes

jefftk
17 Aug 2025 17:00 UTC
11 points
2 comments
3 min read
LW link
(www.jefftk.com)

Plan E for AI Doom

Ihor Kendiukhov
17 Aug 2025 15:26 UTC
67 points
15 comments
3 min read
LW link

[Question] Meaning in life—should I have it? How did you find yours?

Aprillion
17 Aug 2025 9:49 UTC
13 points
21 comments
3 min read
LW link

Legal Personhood—Types of Consequences

Stephen Martin
17 Aug 2025 6:52 UTC
6 points
0 comments
4 min read
LW link

Agent foundations: not really math, not really science

Alex_Altair
17 Aug 2025 5:48 UTC
114 points
25 comments
5 min read
LW link

Why Latter-day Saints Have Strong Communities

Jeffrey Heninger
17 Aug 2025 4:20 UTC
102 points
29 comments
9 min read
LW link

Immortalism—A Rational Case for Solving Death

vampiretooth
17 Aug 2025 3:56 UTC
11 points
4 comments
18 min read
LW link

My Interview With Cade Metz on His Reporting About Lighthaven

Zack_M_Davis
17 Aug 2025 2:30 UTC
151 points
15 comments
5 min read
LW link

On Pessimization

Richard_Ngo
17 Aug 2025 1:10 UTC
61 points
3 comments
10 min read
LW link
(www.mindthefuture.info)

Debugging for Mid Coders

Raemon
16 Aug 2025 22:32 UTC
82 points
41 comments
7 min read
LW link

Church Planting: When Venture Capital Finds Jesus

Elizabeth
16 Aug 2025 19:40 UTC
226 points
23 comments
16 min read
LW link
(acesounderglass.com)

35 Thoughts About AGI and 1 About GPT-5

snewman
16 Aug 2025 19:20 UTC
21 points
20 comments
16 min read
LW link
(secondthoughts.ai)

keyMetas — If training an AI requires vectorizing the hidden, why not try it with our goals?

P. João
16 Aug 2025 18:07 UTC
3 points
0 comments
5 min read
LW link

The Comprehensive Case Against Trump

Bentham's Bulldog
16 Aug 2025 17:30 UTC
−14 points
34 comments
26 min read
LW link

The Collider Bias Theory of (Not Quite) Everything

Jack_S
16 Aug 2025 16:53 UTC
82 points
3 comments
10 min read
LW link

How we hacked business school

16 Aug 2025 15:22 UTC
17 points
2 comments
6 min read
LW link
(agenticconjectures.substack.com)

[Question] Why did interest in “AI risk” and “AI safety” spike in June and July 2025? (Google Trends)

WilliamKiely
16 Aug 2025 15:22 UTC
32 points
4 comments
1 min read
LW link

Four types of approaches for your emotional problems

Kaj_Sotala
16 Aug 2025 13:59 UTC
43 points
5 comments
15 min read
LW link

‘Just Tax Land’ - what’s the point?

Hruss
16 Aug 2025 12:37 UTC
−3 points
1 comment
1 min read
LW link
(open.substack.com)

Mind Conditioning

Gabriel Alfour
16 Aug 2025 11:20 UTC
−1 points
0 comments
1 min read
LW link
(cognition.cafe)

Anthropic Lets Claude Opus 4 & 4.1 End Conversations

Stephen Martin
16 Aug 2025 5:01 UTC
53 points
3 comments
1 min read
LW link
(www.anthropic.com)

The Inheritors: a book review

Alex_Altair
16 Aug 2025 2:47 UTC
73 points
4 comments
3 min read
LW link

BIDA Masking and Attendance

jefftk
16 Aug 2025 1:50 UTC
11 points
0 comments
1 min read
LW link
(www.jefftk.com)

Rights & Liberties—are opposites

James Stephen Brown
16 Aug 2025 0:20 UTC
1 point
0 comments
4 min read
LW link

N Dimensional Interactive Scatter Plot (ndisp)

TristanTrim
15 Aug 2025 23:08 UTC
10 points
3 comments
12 min read
LW link

SE Gyges’ response to AI-2027

StanislavKrym
15 Aug 2025 21:54 UTC
29 points
13 comments
46 min read
LW link
(www.verysane.ai)

Towards data-centric interpretability with sparse autoencoders

15 Aug 2025 20:10 UTC
53 points
2 comments
18 min read
LW link

Music taste is (also) a next token prediction

eamag
15 Aug 2025 17:49 UTC
5 points
0 comments
2 min read
LW link
(eamag.me)

Theory of culture as waste.

Laureana Bonaparte
15 Aug 2025 17:34 UTC
−3 points
15 comments
2 min read
LW link

Spending Too Much Time At Airports

Zvi
15 Aug 2025 16:10 UTC
57 points
24 comments
7 min read
LW link
(thezvi.wordpress.com)

How to make the future better (other than by reducing extinction risk)

wdmacaskill
15 Aug 2025 15:40 UTC
19 points
1 comment
3 min read
LW link

Should you start a for-profit AI safety org?

KatWoods
15 Aug 2025 13:52 UTC
8 points
4 comments
1 min read
LW link

How to get ChatGPT to really thoroughly research something

KatWoods
15 Aug 2025 12:54 UTC
18 points
1 comment
1 min read
LW link

Thoughts on Gradual Disempowerment

Tom Davidson
15 Aug 2025 11:56 UTC
62 points
32 comments
19 min read
LW link

Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we’re studying them anyway

15 Aug 2025 11:48 UTC
59 points
3 comments
17 min read
LW link

A Phylogeny of Agents

15 Aug 2025 10:47 UTC
40 points
12 comments
6 min read
LW link
(substack.com)

My kids won’t be workers

Gauraventh
15 Aug 2025 7:06 UTC
3 points
0 comments
6 min read
LW link
(y1d2.com)

European Links (15.08.25)

Martin Sustrik
15 Aug 2025 4:20 UTC
21 points
8 comments
2 min read
LW link
(www.250bpm.com)

Legal Personhood—Three Prong Bundle Theory

Stephen Martin
15 Aug 2025 4:13 UTC
13 points
6 comments
4 min read
LW link

Mental Gymnastics.

Laureana Bonaparte
15 Aug 2025 4:08 UTC
3 points
0 comments
13 min read
LW link

Rare AI and the Fermi Paradox

dawnstrata
15 Aug 2025 4:05 UTC
11 points
6 comments
9 min read
LW link

Tristan’s Projects

TristanTrim
15 Aug 2025 3:46 UTC
6 points
4 comments
2 min read
LW link

Trialing Far UVC and Glycol Vapors at BIDA

jefftk
15 Aug 2025 2:20 UTC
19 points
1 comment
2 min read
LW link
(www.jefftk.com)

A philosophical kernel: biting analytic bullets

jessicata
15 Aug 2025 1:35 UTC
64 points
21 comments
13 min read
LW link
(unstableontology.com)

A letter to Kyle Fish on the Retirement of Claude 3 Sonnet

bridgebot
15 Aug 2025 1:08 UTC
−4 points
3 comments
5 min read
LW link

Conceptual Rhyme and Metaphor

Jordan Rubin
15 Aug 2025 0:05 UTC
2 points
0 comments
9 min read
LW link
(jordanmrubin.substack.com)