The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

Robert Shuler, 22 Mar 2025 22:55 UTC
3 points
0 comments, 2 min read

Dayton, Ohio, ACX Meetup

Lunawarrior, 22 Mar 2025 19:45 UTC
1 point
0 comments, 1 min read

[Replication] Crosscoder-based Stage-Wise Model Diffing

22 Mar 2025 18:35 UTC
24 points
0 comments, 7 min read

The Principle of Satisfying Foreknowledge

Randall Reams, 22 Mar 2025 18:20 UTC
1 point
0 comments, 2 min read

[Question] Urgency in the ITN framework

Shaïman, 22 Mar 2025 18:16 UTC
0 points
2 comments, 1 min read

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman, 22 Mar 2025 18:16 UTC
11 points
2 comments, 6 min read

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri, 22 Mar 2025 18:07 UTC
9 points
0 comments, 12 min read

100+ concrete projects and open problems in evals

Marius Hobbhahn, 22 Mar 2025 15:21 UTC
75 points
1 comment, 1 min read

Do models say what they learn?

22 Mar 2025 15:19 UTC
126 points
12 comments, 13 min read

deleted

funnyfranco, 22 Mar 2025 12:06 UTC
2 points
8 comments, 1 min read

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H, 22 Mar 2025 10:54 UTC
4 points
0 comments, 2 min read

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda, 22 Mar 2025 10:13 UTC
294 points
28 comments, 4 min read
(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien, 22 Mar 2025 7:44 UTC
0 points
0 comments, 1 min read

Legibility

lsusr, 22 Mar 2025 6:54 UTC
20 points
22 comments, 2 min read

Why Were We Wrong About China and AI? A Case Study in Failed Rationality

[deleted-by-moderator], 22 Mar 2025 5:13 UTC
31 points
47 comments, 1 min read

A Short Diatribe on Hidden Assertions.

Eggs, 22 Mar 2025 3:14 UTC
−9 points
2 comments, 3 min read

Transformer Attention’s High School Math Mistake

Max Ma, 22 Mar 2025 0:16 UTC
−13 points
1 comment, 1 min read

Making Sense of President Trump’s Annexation Obsession

Annapurna, 21 Mar 2025 21:10 UTC
−13 points
3 comments, 5 min read
(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio, 21 Mar 2025 14:40 UTC
91 points
7 comments, 5 min read

Prospects for Alignment Automation: Interpretability Case Study

21 Mar 2025 14:05 UTC
32 points
5 comments, 8 min read

Epoch AI released a GATE Scenario Explorer

Lee.aao, 21 Mar 2025 13:57 UTC
10 points
0 comments, 1 min read
(epoch.ai)

They Took MY Job?

Zvi, 21 Mar 2025 13:30 UTC
37 points
4 comments, 9 min read
(thezvi.wordpress.com)

Silly Time

jefftk, 21 Mar 2025 12:30 UTC
45 points
2 comments, 2 min read
(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo, 21 Mar 2025 1:39 UTC
97 points
46 comments, 13 min read
(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos, 21 Mar 2025 0:34 UTC
3 points
7 comments, 1 min read

A Critique of “Utility”

Zero Contradictions, 20 Mar 2025 23:21 UTC
−2 points
10 comments, 2 min read
(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn, 20 Mar 2025 20:01 UTC
201 points
6 comments, 2 min read

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot, 20 Mar 2025 19:12 UTC
16 points
3 comments, 6 min read
(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog, 20 Mar 2025 17:12 UTC
18 points
0 comments, 2 min read

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner, 20 Mar 2025 17:12 UTC
28 points
4 comments, 13 min read

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron, 20 Mar 2025 16:18 UTC
4 points
0 comments, 1 min read

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King, 20 Mar 2025 15:58 UTC
20 points
21 comments, 1 min read

Human alignment

Lucien, 20 Mar 2025 15:52 UTC
−16 points
2 comments, 1 min read

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt, 20 Mar 2025 14:31 UTC
7 points
0 comments, 1 min read

AI #108: Straight Line on a Graph

Zvi, 20 Mar 2025 13:50 UTC
43 points
5 comments, 39 min read
(thezvi.wordpress.com)

What is an alignment tax?

20 Mar 2025 13:06 UTC
5 points
0 comments, 1 min read
(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché, 20 Mar 2025 12:20 UTC
3 points
2 comments, 21 min read

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar, 20 Mar 2025 10:44 UTC
1 point
0 comments, 1 min read

Defense Against The Super-Worms

viemccoy, 20 Mar 2025 7:24 UTC
24 points
1 comment, 2 min read

Socially Graceful Degradation

Screwtape, 20 Mar 2025 4:03 UTC
58 points
10 comments, 9 min read

Apply to MATS 8.0!

20 Mar 2025 2:17 UTC
64 points
5 comments, 4 min read

Improved visualizations of METR Time Horizons paper.

LDJ, 19 Mar 2025 23:36 UTC
30 points
4 comments, 2 min read

The case against “The case against AI alignment”

KvmanThinking, 19 Mar 2025 22:40 UTC
1 point
0 comments, 1 min read

[Question] Superintelligence Strategy: A Pragmatic Path to… Doom?

Mr Beastly, 19 Mar 2025 22:30 UTC
8 points
0 comments, 3 min read

SHIFT relies on token-level features to de-bias Bias in Bios probes

Tim Hua, 19 Mar 2025 21:29 UTC
39 points
2 comments, 6 min read

Janet must die

Shmi, 19 Mar 2025 20:35 UTC
12 points
3 comments, 2 min read

[Question] Why am I getting downvoted on Lesswrong?

Oxidize, 19 Mar 2025 18:32 UTC
7 points
14 comments, 1 min read

Forecasting AI Futures Resource Hub

Alvin Ånestrand, 19 Mar 2025 17:26 UTC
2 points
0 comments, 2 min read
(forecastingaifutures.substack.com)

TBC episode w Dave Kasten from Control AI on AI Policy

Eneasz, 19 Mar 2025 17:09 UTC
8 points
0 comments, 1 min read
(www.thebayesianconspiracy.com)

Prioritizing threats for AI control

ryan_greenblatt, 19 Mar 2025 17:09 UTC
59 points
2 comments, 10 min read