Making Sense of President Trump’s Annexation Obsession

Annapurna · Mar 21, 2025, 9:10 PM
−13 points
3 comments · 5 min read · LW link
(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio · Mar 21, 2025, 2:40 PM
91 points
7 comments · 5 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

Mar 21, 2025, 2:05 PM
32 points
5 comments · 8 min read · LW link

Epoch AI released a GATE Scenario Explorer

Lee.aao · Mar 21, 2025, 1:57 PM
10 points
0 comments · 1 min read · LW link
(epoch.ai)

They Took MY Job?

Zvi · Mar 21, 2025, 1:30 PM
37 points
4 comments · 9 min read · LW link
(thezvi.wordpress.com)

Silly Time

jefftk · Mar 21, 2025, 12:30 PM
45 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo · Mar 21, 2025, 1:39 AM
96 points
44 comments · 13 min read · LW link
(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos · Mar 21, 2025, 12:34 AM
3 points
7 comments · 1 min read · LW link

A Critique of “Utility”

Zero Contradictions · Mar 20, 2025, 11:21 PM
−2 points
10 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn · Mar 20, 2025, 8:01 PM
197 points
5 comments · 2 min read · LW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot · Mar 20, 2025, 7:12 PM
16 points
3 comments · 6 min read · LW link
(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog · Mar 20, 2025, 5:12 PM
18 points
0 comments · 2 min read · LW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner · Mar 20, 2025, 5:12 PM
27 points
4 comments · 13 min read · LW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron · Mar 20, 2025, 4:18 PM
2 points
0 comments · 1 min read · LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King · Mar 20, 2025, 3:58 PM
20 points
21 comments · 1 min read · LW link

Human alignment

Lucien · Mar 20, 2025, 3:52 PM
−16 points
2 comments · 1 min read · LW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt · Mar 20, 2025, 2:31 PM
7 points
0 comments · 1 min read · LW link

AI #108: Straight Line on a Graph

Zvi · Mar 20, 2025, 1:50 PM
43 points
5 comments · 39 min read · LW link
(thezvi.wordpress.com)

What is an alignment tax?

Mar 20, 2025, 1:06 PM
5 points
0 comments · 1 min read · LW link
(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché · Mar 20, 2025, 12:20 PM
3 points
2 comments · 21 min read · LW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar · Mar 20, 2025, 10:44 AM
1 point
0 comments · 1 min read · LW link

Defense Against The Super-Worms

viemccoy · Mar 20, 2025, 7:24 AM
23 points
1 comment · 2 min read · LW link

Socially Graceful Degradation

Screwtape · Mar 20, 2025, 4:03 AM
58 points
10 comments · 9 min read · LW link

Apply to MATS 8.0!

Mar 20, 2025, 2:17 AM
63 points
5 comments · 4 min read · LW link

Improved visualizations of METR Time Horizons paper

LDJ · Mar 19, 2025, 11:36 PM
20 points
4 comments · 2 min read · LW link

Is CCP authoritarianism good for building safe AI?

Hruss · Mar 19, 2025, 11:13 PM
1 point
0 comments · 1 min read · LW link

The case against “The case against AI alignment”

KvmanThinking · Mar 19, 2025, 10:40 PM
2 points
0 comments · 1 min read · LW link

[Question] Superintelligence Strategy: A Pragmatic Path to… Doom?

Mr Beastly · Mar 19, 2025, 10:30 PM
6 points
0 comments · 3 min read · LW link

SHIFT relies on token-level features to de-bias Bias in Bios probes

Tim Hua · Mar 19, 2025, 9:29 PM
39 points
2 comments · 6 min read · LW link

Janet must die

Shmi · Mar 19, 2025, 8:35 PM
12 points
3 comments · 2 min read · LW link

[Question] Why am I getting downvoted on Lesswrong?

Oxidize · Mar 19, 2025, 6:32 PM
7 points
14 comments · 1 min read · LW link

Forecasting AI Futures Resource Hub

Alvin Ånestrand · Mar 19, 2025, 5:26 PM
2 points
0 comments · 2 min read · LW link
(forecastingaifutures.substack.com)

TBC episode w/ Dave Kasten from Control AI on AI Policy

Eneasz · Mar 19, 2025, 5:09 PM
8 points
0 comments · 1 min read · LW link
(www.thebayesianconspiracy.com)

Prioritizing threats for AI control

ryan_greenblatt · Mar 19, 2025, 5:09 PM
59 points
2 comments · 10 min read · LW link

The Illusion of Transparency as a Trust-Building Mechanism

Priyanka Bharadwaj · Mar 19, 2025, 5:09 PM
2 points
0 comments · 1 min read · LW link

How Do We Govern AI Well?

kaime · Mar 19, 2025, 5:08 PM
2 points
0 comments · 25 min read · LW link

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman · Mar 19, 2025, 4:00 PM
241 points
106 comments · 5 min read · LW link
(metr.org)

Why I think AI will go poorly for humanity

Alek Westover · Mar 19, 2025, 3:52 PM
14 points
0 comments · 30 min read · LW link

The principle of genomic liberty

TsviBT · Mar 19, 2025, 2:27 PM
76 points
51 comments · 17 min read · LW link

Going Nova

Zvi · Mar 19, 2025, 1:30 PM
64 points
14 comments · 15 min read · LW link
(thezvi.wordpress.com)

Equations Mean Things

abstractapplic · Mar 19, 2025, 8:16 AM
46 points
10 comments · 3 min read · LW link

Elite Coordination via the Consensus of Power

Richard_Ngo · Mar 19, 2025, 6:56 AM
92 points
15 comments · 12 min read · LW link
(www.mindthefuture.info)

What I am working on right now and why: representation engineering edition

Lukasz G Bartoszcze · Mar 18, 2025, 10:37 PM
3 points
0 comments · 3 min read · LW link

Boots theory and Sybil Ramkin

philh · Mar 18, 2025, 10:10 PM
37 points
17 comments · 11 min read · LW link
(reasonableapproximation.net)

Schmidt Sciences Technical AI Safety RFP on Inference-Time Compute – Deadline: April 30

Ryan Gajarawala · Mar 18, 2025, 6:05 PM
18 points
0 comments · 2 min read · LW link
(www.schmidtsciences.org)

PRISM: Perspective Reasoning for Integrated Synthesis and Mediation (Interactive Demo)

Anthony Diamond · Mar 18, 2025, 6:03 PM
10 points
2 comments · 1 min read · LW link

Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Le magicien quantique · Mar 18, 2025, 5:55 PM
6 points
1 comment · 10 min read · LW link

Progress links and short notes, 2025-03-18

jasoncrawford · Mar 18, 2025, 5:14 PM
8 points
0 comments · 3 min read · LW link
(newsletter.rootsofprogress.org)

The Convergent Path to the Stars

Maxime Riché · Mar 18, 2025, 5:09 PM
6 points
0 comments · 20 min read · LW link

Sapir-Whorf Ego Death

Jonathan Moregård · Mar 18, 2025, 4:57 PM
8 points
7 comments · 2 min read · LW link
(honestliving.substack.com)