We need (a lot) more rogue agent honeypots

Ozyrus
Mar 23, 2025, 10:24 PM
37 points
12 comments
4 min read
LW link

What’s the word for the amount of expertise that I, an experienced therapy patient and generally educated person, have on psychology topics?

d_el_ez
Mar 23, 2025, 5:38 PM
4 points
0 comments
3 min read
LW link

Probability Theory Fundamentals 102: Source of the Sample Space

Ape in the coat
Mar 23, 2025, 5:23 PM
12 points
17 comments
7 min read
LW link

How to mitigate sandbagging

Teun van der Weij
Mar 23, 2025, 5:19 PM
29 points
0 comments
8 min read
LW link

Tabula Bio: towards a future free of disease (& looking for collaborators)

mpoon
Mar 23, 2025, 4:30 PM
44 points
15 comments
2 min read
LW link

Solving willpower seems easier than solving aging

Yair Halberstadt
Mar 23, 2025, 3:25 PM
61 points
28 comments
1 min read
LW link

[Question] Should I fundraise for open source search engine?

samuelshadrach
Mar 23, 2025, 1:04 PM
−11 points
2 comments
1 min read
LW link

Privateers Reborn: Cyber Letters of Marque

arealsociety
Mar 23, 2025, 3:39 AM
5 points
2 comments
1 min read
LW link
(arealsociety.substack.com)

Beware nerfing AI with opinionated human-centric sensors

Haotian
Mar 23, 2025, 1:09 AM
1 point
0 comments
3 min read
LW link

Reframing AI Safety as a Neverending Institutional Challenge

scasper
Mar 23, 2025, 12:13 AM
52 points
12 comments
5 min read
LW link

The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

mc1soft
Mar 22, 2025, 10:55 PM
3 points
0 comments
2 min read
LW link

Dayton, Ohio, ACX Meetup

Lunawarrior
Mar 22, 2025, 7:45 PM
1 point
0 comments
1 min read
LW link

[Replication] Crosscoder-based Stage-Wise Model Diffing

Mar 22, 2025, 6:35 PM
19 points
0 comments
7 min read
LW link

The Principle of Satisfying Foreknowledge

Randall Reams
Mar 22, 2025, 6:20 PM
1 point
0 comments
2 min read
LW link

[Question] Urgency in the ITN framework

Shaïman
Mar 22, 2025, 6:16 PM
0 points
2 comments
1 min read
LW link

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman
Mar 22, 2025, 6:16 PM
11 points
2 comments
6 min read
LW link

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri
Mar 22, 2025, 6:07 PM
9 points
0 comments
12 min read
LW link

100+ concrete projects and open problems in evals

Marius Hobbhahn
Mar 22, 2025, 3:21 PM
74 points
1 comment
1 min read
LW link

Do models say what they learn?

Mar 22, 2025, 3:19 PM
126 points
12 comments
13 min read
LW link

AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence

funnyfranco
Mar 22, 2025, 12:06 PM
2 points
9 comments
18 min read
LW link

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H
Mar 22, 2025, 10:54 AM
4 points
0 comments
2 min read
LW link

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda
Mar 22, 2025, 10:13 AM
292 points
28 comments
4 min read
LW link
(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien
Mar 22, 2025, 7:44 AM
0 points
0 comments
1 min read
LW link

Legibility

lsusr
Mar 22, 2025, 6:54 AM
19 points
22 comments
2 min read
LW link

Why Were We Wrong About China and AI? A Case Study in Failed Rationality

[deleted-by-moderator]
Mar 22, 2025, 5:13 AM
31 points
47 comments
1 min read
LW link

A Short Diatribe on Hidden Assertions.

Eggs
Mar 22, 2025, 3:14 AM
−9 points
2 comments
3 min read
LW link

Transformer Attention’s High School Math Mistake

Max Ma
Mar 22, 2025, 12:16 AM
−13 points
1 comment
1 min read
LW link

Making Sense of President Trump’s Annexation Obsession

Annapurna
Mar 21, 2025, 9:10 PM
−13 points
3 comments
5 min read
LW link
(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio
Mar 21, 2025, 2:40 PM
91 points
7 comments
5 min read
LW link

Prospects for Alignment Automation: Interpretability Case Study

Mar 21, 2025, 2:05 PM
32 points
5 comments
8 min read
LW link

Epoch AI released a GATE Scenario Explorer

Lee.aao
Mar 21, 2025, 1:57 PM
10 points
0 comments
1 min read
LW link
(epoch.ai)

They Took MY Job?

Zvi
Mar 21, 2025, 1:30 PM
37 points
4 comments
9 min read
LW link
(thezvi.wordpress.com)

Silly Time

jefftk
Mar 21, 2025, 12:30 PM
45 points
2 comments
2 min read
LW link
(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo
Mar 21, 2025, 1:39 AM
96 points
44 comments
13 min read
LW link
(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos
Mar 21, 2025, 12:34 AM
3 points
7 comments
1 min read
LW link

A Critique of “Utility”

Zero Contradictions
Mar 20, 2025, 11:21 PM
−2 points
10 comments
2 min read
LW link
(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn
Mar 20, 2025, 8:01 PM
195 points
5 comments
2 min read
LW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot
Mar 20, 2025, 7:12 PM
16 points
3 comments
6 min read
LW link
(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog
Mar 20, 2025, 5:12 PM
18 points
0 comments
2 min read
LW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner
Mar 20, 2025, 5:12 PM
27 points
4 comments
13 min read
LW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron
Mar 20, 2025, 4:18 PM
2 points
0 comments
1 min read
LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King
Mar 20, 2025, 3:58 PM
20 points
21 comments
1 min read
LW link

Human alignment

Lucien
Mar 20, 2025, 3:52 PM
−16 points
2 comments
1 min read
LW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt
Mar 20, 2025, 2:31 PM
7 points
0 comments
1 min read
LW link

AI #108: Straight Line on a Graph

Zvi
Mar 20, 2025, 1:50 PM
43 points
5 comments
39 min read
LW link
(thezvi.wordpress.com)

What is an alignment tax?

Mar 20, 2025, 1:06 PM
5 points
0 comments
1 min read
LW link
(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché
Mar 20, 2025, 12:20 PM
3 points
2 comments
21 min read
LW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar
Mar 20, 2025, 10:44 AM
1 point
0 comments
1 min read
LW link

Defense Against The Super-Worms

viemccoy
Mar 20, 2025, 7:24 AM
23 points
1 comment
2 min read
LW link

Socially Graceful Degradation

Screwtape
Mar 20, 2025, 4:03 AM
58 points
10 comments
9 min read
LW link