The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

mc1soft · Mar 22, 2025, 10:55 PM
3 points
0 comments · 2 min read · LW link

Dayton, Ohio, ACX Meetup

Lunawarrior · Mar 22, 2025, 7:45 PM
1 point
0 comments · 1 min read · LW link

[Replication] Crosscoder-based Stage-Wise Model Diffing

Mar 22, 2025, 6:35 PM
21 points
0 comments · 7 min read · LW link

The Principle of Satisfying Foreknowledge

Randall Reams · Mar 22, 2025, 6:20 PM
1 point
0 comments · 2 min read · LW link

[Question] Urgency in the ITN framework

Shaïman · Mar 22, 2025, 6:16 PM
0 points
2 comments · 1 min read · LW link

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman · Mar 22, 2025, 6:16 PM
10 points
2 comments · 6 min read · LW link

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri · Mar 22, 2025, 6:07 PM
9 points
0 comments · 12 min read · LW link

100+ concrete projects and open problems in evals

Marius Hobbhahn · Mar 22, 2025, 3:21 PM
74 points
1 comment · 1 min read · LW link

Do models say what they learn?

Mar 22, 2025, 3:19 PM
126 points
12 comments · 13 min read · LW link

AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence

funnyfranco · Mar 22, 2025, 12:06 PM
1 point
9 comments · 18 min read · LW link

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H · Mar 22, 2025, 10:54 AM
4 points
0 comments · 2 min read · LW link

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda · Mar 22, 2025, 10:13 AM
292 points
28 comments · 4 min read · LW link
(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien · Mar 22, 2025, 7:44 AM
0 points
0 comments · 1 min read · LW link

Legibility

lsusr · Mar 22, 2025, 6:54 AM
19 points
22 comments · 2 min read · LW link

Why Were We Wrong About China and AI? A Case Study in Failed Rationality

thedudeabides · Mar 22, 2025, 5:13 AM
31 points
45 comments · 1 min read · LW link

A Short Diatribe on Hidden Assertions.

Eggs · Mar 22, 2025, 3:14 AM
−9 points
2 comments · 3 min read · LW link

Transformer Attention’s High School Math Mistake

Max Ma · Mar 22, 2025, 12:16 AM
−13 points
1 comment · 1 min read · LW link

Making Sense of President Trump’s Annexation Obsession

Annapurna · Mar 21, 2025, 9:10 PM
−13 points
3 comments · 5 min read · LW link
(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio · Mar 21, 2025, 2:40 PM
91 points
7 comments · 5 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

Mar 21, 2025, 2:05 PM
32 points
5 comments · 8 min read · LW link

Epoch AI released a GATE Scenario Explorer

Lee.aao · Mar 21, 2025, 1:57 PM
10 points
0 comments · 1 min read · LW link
(epoch.ai)

They Took MY Job?

Zvi · Mar 21, 2025, 1:30 PM
37 points
4 comments · 9 min read · LW link
(thezvi.wordpress.com)

Silly Time

jefftk · Mar 21, 2025, 12:30 PM
45 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo · Mar 21, 2025, 1:39 AM
96 points
44 comments · 13 min read · LW link
(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos · Mar 21, 2025, 12:34 AM
3 points
7 comments · 1 min read · LW link

A Critique of “Utility”

Zero Contradictions · Mar 20, 2025, 11:21 PM
−2 points
10 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn · Mar 20, 2025, 8:01 PM
195 points
5 comments · 2 min read · LW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot · Mar 20, 2025, 7:12 PM
16 points
3 comments · 6 min read · LW link
(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog · Mar 20, 2025, 5:12 PM
18 points
0 comments · 2 min read · LW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner · Mar 20, 2025, 5:12 PM
27 points
4 comments · 13 min read · LW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron · Mar 20, 2025, 4:18 PM
2 points
0 comments · 1 min read · LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King · Mar 20, 2025, 3:58 PM
20 points
21 comments · 1 min read · LW link

Human alignment

Lucien · Mar 20, 2025, 3:52 PM
−16 points
2 comments · 1 min read · LW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt · Mar 20, 2025, 2:31 PM
7 points
0 comments · 1 min read · LW link

AI #108: Straight Line on a Graph

Zvi · Mar 20, 2025, 1:50 PM
43 points
5 comments · 39 min read · LW link
(thezvi.wordpress.com)

What is an alignment tax?

Mar 20, 2025, 1:06 PM
5 points
0 comments · 1 min read · LW link
(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché · Mar 20, 2025, 12:20 PM
3 points
2 comments · 21 min read · LW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar · Mar 20, 2025, 10:44 AM
1 point
0 comments · 1 min read · LW link

Defense Against The Super-Worms

viemccoy · Mar 20, 2025, 7:24 AM
23 points
1 comment · 2 min read · LW link

Socially Graceful Degradation

Screwtape · Mar 20, 2025, 4:03 AM
57 points
9 comments · 9 min read · LW link

Apply to MATS 8.0!

Mar 20, 2025, 2:17 AM
63 points
5 comments · 4 min read · LW link

Improved visualizations of METR Time Horizons paper.

LDJ · Mar 19, 2025, 11:36 PM
20 points
4 comments · 2 min read · LW link

Is CCP authoritarianism good for building safe AI?

Hruss · Mar 19, 2025, 11:13 PM
1 point
0 comments · 1 min read · LW link

The case against “The case against AI alignment”

KvmanThinking · 19 Mar 2025 22:40 UTC
2 points
0 comments · 1 min read · LW link

[Question] Superintelligence Strategy: A Pragmatic Path to… Doom?

Mr Beastly · 19 Mar 2025 22:30 UTC
6 points
0 comments · 3 min read · LW link

SHIFT relies on token-level features to de-bias Bias in Bios probes

Tim Hua · 19 Mar 2025 21:29 UTC
39 points
2 comments · 6 min read · LW link

Janet must die

Shmi · 19 Mar 2025 20:35 UTC
12 points
3 comments · 2 min read · LW link

[Question] Why am I getting downvoted on Lesswrong?

Oxidize · 19 Mar 2025 18:32 UTC
7 points
14 comments · 1 min read · LW link

Forecasting AI Futures Resource Hub

Alvin Ånestrand · 19 Mar 2025 17:26 UTC
2 points
0 comments · 2 min read · LW link
(forecastingaifutures.substack.com)

TBC episode w Dave Kasten from Control AI on AI Policy

Eneasz · 19 Mar 2025 17:09 UTC
8 points
0 comments · 1 min read · LW link
(www.thebayesianconspiracy.com)