How To Become A Mechanistic Interpretability Researcher

Neel Nanda · 2 Sep 2025 23:38 UTC
109 points
12 comments · 55 min read · LW link

[Question] When Both People Are Interested, How Often Is Flirtatious Escalation Mutual?

johnswentworth · 2 Sep 2025 23:37 UTC
51 points
14 comments · 2 min read · LW link

Scaling AI Safety in Europe: From Local Groups to International Coordination

MariusWenk · 2 Sep 2025 23:36 UTC
21 points
1 comment · 11 min read · LW link

Simulating the *rest* of the political disagreement

Raemon · 2 Sep 2025 22:06 UTC
125 points
16 comments · 2 min read · LW link

AI Safety at the Frontier: Paper Highlights, August ’25

gasteigerjo · 2 Sep 2025 20:29 UTC
12 points
0 comments · 7 min read · LW link
(open.substack.com)

Structural engineering in software engineering

Adam Zerner · 2 Sep 2025 19:07 UTC
25 points
2 comments · 4 min read · LW link

But Have They Engaged With The Arguments? [Linkpost]

Noosphere89 · 2 Sep 2025 18:25 UTC
72 points
14 comments · 2 min read · LW link
(philiptrammell.com)

Models vs beliefs

Adam Zerner · 2 Sep 2025 17:27 UTC
29 points
14 comments · 2 min read · LW link

Non-Dualism and AI Morality

Marcio Díaz · 2 Sep 2025 17:21 UTC
3 points
4 comments · 5 min read · LW link

%CPU Utilization Is A Lie

Brendan Long · 2 Sep 2025 17:05 UTC
75 points
9 comments · 3 min read · LW link
(www.brendanlong.com)

Your LLM-assisted scientific breakthrough probably isn’t real

eggsyntax · 2 Sep 2025 15:05 UTC
143 points
39 comments · 7 min read · LW link

xAI’s new safety framework is dreadful

Zach Stein-Perlman · 2 Sep 2025 15:00 UTC
104 points
5 comments · 3 min read · LW link

Notes on Dark Sun (The Making of the Hydrogen Bomb)

Joel Burget · 2 Sep 2025 13:20 UTC
22 points
0 comments · 23 min read · LW link

Three main views on the future of AI

2 Sep 2025 13:06 UTC
47 points
1 comment · 1 min read · LW link

Traffic and Transit Roundup #1

Zvi · 2 Sep 2025 12:00 UTC
37 points
4 comments · 21 min read · LW link
(thezvi.wordpress.com)

Gradient routing is better than pretraining filtering

Cleo Nardo · 2 Sep 2025 9:05 UTC
44 points
3 comments · 5 min read · LW link

Time’s arrow ⇒ decision theory

Aram Ebtekar · 2 Sep 2025 6:20 UTC
33 points
0 comments · 2 min read · LW link
(doi.org)

The Cats are On To Something

Hastings · 2 Sep 2025 2:30 UTC
248 points
27 comments · 3 min read · LW link
(www.hgreer.com)

Will Non-Dual Crap Cause Emergent Misalignment?

Marcio Díaz · 2 Sep 2025 0:12 UTC
26 points
2 comments · 4 min read · LW link

Category-Theoretic Wanderings into Interpretability

unruly abstractions · 2 Sep 2025 0:03 UTC
18 points
2 comments · 1 min read · LW link
(www.unrulyabstractions.com)

Anthropic’s leading researchers acted as moderate accelerationists

Remmelt · 1 Sep 2025 23:23 UTC
118 points
69 comments · 42 min read · LW link

⿻ Plurality & 6pack.care

Audrey Tang · 1 Sep 2025 20:54 UTC
173 points
19 comments · 11 min read · LW link

A Cup of Black Coffee

Rudaiba · 1 Sep 2025 20:17 UTC
−11 points
2 comments · 4 min read · LW link

The Insight Gacha

The Dao of Bayes · 1 Sep 2025 17:15 UTC
13 points
0 comments · 3 min read · LW link

Dating Roundup #7: Back to Basics

Zvi · 1 Sep 2025 11:40 UTC
23 points
11 comments · 29 min read · LW link
(thezvi.wordpress.com)

Want to make AI go well for all sentient beings? Apply to a Sentient Futures fellowship or conference!

Damin Curtis · 1 Sep 2025 8:50 UTC
17 points
0 comments · 2 min read · LW link

Support the movement against extinction risk due to AI

samuelshadrach · 1 Sep 2025 5:35 UTC
−26 points
8 comments · 2 min read · LW link
(samuelshadrach.com)

Should we align AI with maternal instinct?

Priyanka Bharadwaj · 1 Sep 2025 3:56 UTC
33 points
15 comments · 3 min read · LW link

Generative AI is not causing YCombinator companies to grow more quickly than usual (yet)

Xodarap · 1 Sep 2025 3:38 UTC
95 points
8 comments · 9 min read · LW link

Help me understand: how do multiverse acausal trades work?

Aram Ebtekar · 1 Sep 2025 3:25 UTC
46 points
26 comments · 2 min read · LW link

Newcomber

Charlie Sanders · 1 Sep 2025 2:29 UTC
5 points
0 comments · 2 min read · LW link
(www.dailymicrofiction.com)

Evaluating Prediction in Acausal Mixed-Motive Settings

Tim Chan · 31 Aug 2025 22:58 UTC
14 points
0 comments · 6 min read · LW link

My AI Predictions for 2027

Taylor G. Lunt · 31 Aug 2025 22:00 UTC
37 points
73 comments · 16 min read · LW link

Hedonium is AI Alignment

31 Aug 2025 19:46 UTC
−16 points
0 comments · 6 min read · LW link

To Raemon: bet in My (personal) Goals

P. João · 31 Aug 2025 15:48 UTC
3 points
0 comments · 3 min read · LW link

Legal Personhood—The First Amendment (Part 2)

Stephen Martin · 31 Aug 2025 12:06 UTC
2 points
0 comments · 2 min read · LW link

A quantum equivalent to Bayes’ rule

dr_s · 31 Aug 2025 10:06 UTC
51 points
17 comments · 8 min read · LW link

ACX Meetup Wellington

NotEvil · 31 Aug 2025 5:13 UTC
1 point
1 comment · 1 min read · LW link

Sleeping Experts in the (reflective) Solomonoff Prior

31 Aug 2025 4:55 UTC
16 points
0 comments · 3 min read · LW link

Hacking The Spectrum For Profit (Maybe Fun)

Elek Szid · 31 Aug 2025 4:49 UTC
7 points
3 comments · 3 min read · LW link

AI agents and painted facades

30 Aug 2025 23:13 UTC
38 points
3 comments · 2 min read · LW link
(fulcrumresearch.ai)

ACX Everywhere fall 2025 - Newton, MA

duck_master · 30 Aug 2025 22:02 UTC
1 point
1 comment · 1 min read · LW link

[via bsky, found paper] “AI Consciousness: A Centrist Manifesto”

the gears to ascension · 30 Aug 2025 21:05 UTC
13 points
0 comments · 1 min read · LW link
(philpapers.org)

Female sexual attractiveness seems more egalitarian than people acknowledge

lc · 30 Aug 2025 18:09 UTC
53 points
27 comments · 3 min read · LW link

AI Sleeper Agents: How Anthropic Trains and Catches Them—Video

Writer · 30 Aug 2025 17:53 UTC
9 points
0 comments · 7 min read · LW link
(youtu.be)

Understanding LLMs: Insights from Mechanistic Interpretability

Stephen McAleese · 30 Aug 2025 16:50 UTC
40 points
2 comments · 30 min read · LW link

Legal Personhood—The First Amendment (Part 1)

Stephen Martin · 30 Aug 2025 13:20 UTC
4 points
0 comments · 3 min read · LW link

Method Iteration: An LLM Prompting Technique

Davey Morse · 30 Aug 2025 0:08 UTC
−12 points
1 comment · 2 min read · LW link

[Question] How to bet on myself? From expectations to robust goals

P. João · 29 Aug 2025 18:33 UTC
4 points
3 comments · 1 min read · LW link

AI Security London Hackathon

Prince Kumar · 29 Aug 2025 18:23 UTC
4 points
0 comments · 1 min read · LW link