An overview of control measures

ryan_greenblatt 24 Mar 2025 23:16 UTC
40 points
2 comments 26 min read LW link

Populectomy.ai

YonatanK 24 Mar 2025 22:06 UTC
7 points
2 comments 2 min read LW link

Policy for LLM Writing on LessWrong

jimrandomh 24 Mar 2025 21:41 UTC
339 points
71 comments 2 min read LW link

Analyzing long agent transcripts (Docent)

jsteinhardt 24 Mar 2025 20:49 UTC
41 points
2 comments 1 min read LW link
(bounded-regret.ghost.io)

Convergence 2024 Impact Review

David_Kristoffersson 24 Mar 2025 20:28 UTC
13 points
0 comments 14 min read LW link

The Best Lecture Series on Every Subject

Rauno Arike 24 Mar 2025 20:03 UTC
13 points
1 comment 2 min read LW link

Recent AI model progress feels mostly like bullshit

lc 24 Mar 2025 19:28 UTC
357 points
85 comments 8 min read LW link
(zeropath.com)

Learning about AI regulation should be easier

mfg 24 Mar 2025 19:22 UTC
12 points
0 comments 2 min read LW link

Speaker For AIs Soul

Max Abecassis 24 Mar 2025 19:20 UTC
−3 points
0 comments 20 min read LW link

Advanced AI Systems Will Not Follow Historical Technological Patterns and Will Not Suffer the Misattribution of Productivity Gains

Max Abecassis 24 Mar 2025 19:20 UTC
8 points
0 comments 10 min read LW link

AI “Deep Research” Tools Reviewed

sarahconstantin 24 Mar 2025 18:40 UTC
53 points
5 comments 5 min read LW link
(sarahconstantin.substack.com)

Notes on countermeasures for exploration hacking (aka sandbagging)

ryan_greenblatt 24 Mar 2025 18:39 UTC
54 points
6 comments 8 min read LW link

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

24 Mar 2025 17:55 UTC
35 points
0 comments 8 min read LW link

Straightforward Steps to Marginally Improve Odds of Whole Brain Emulation

Dom Polsinelli 24 Mar 2025 17:14 UTC
8 points
20 comments 6 min read LW link

From Loops to Klein Bottles: Uncovering Hidden Topology in High Dimensional Data

Gunnar Carlsson 24 Mar 2025 17:09 UTC
15 points
0 comments 9 min read LW link

Will Jesus Christ return in an election year?

Eric Neyman 24 Mar 2025 16:50 UTC
411 points
59 comments 4 min read LW link
(ericneyman.wordpress.com)

Sentinel’s Global Risks Weekly Roundup #12/2025: Famine in Gaza, H7N9 outbreak, US geopolitical leadership weakening.

NunoSempere 24 Mar 2025 16:46 UTC
13 points
0 comments 7 min read LW link
(blog.sentinel-team.org)

deleted

funnyfranco 24 Mar 2025 15:03 UTC
−2 points
0 comments 1 min read LW link

Delicious Boy Slop - Boring Diet, Effortless Weightloss

sapphire 24 Mar 2025 15:01 UTC
17 points
8 comments 4 min read LW link
(sapphstar.substack.com)

Hong Kong ACX Spring Meetup 2025

fbreton 24 Mar 2025 14:27 UTC
1 point
0 comments 1 min read LW link

More on Various AI Action Plans

Zvi 24 Mar 2025 13:10 UTC
32 points
0 comments 11 min read LW link
(thezvi.wordpress.com)

Emergent scaling effects on the functional hierarchies within LLMs

Paul Bogdan 24 Mar 2025 13:03 UTC
8 points
0 comments 9 min read LW link

Recommender Alignment for Lock-In Risk

alamerton 24 Mar 2025 12:56 UTC
8 points
0 comments 7 min read LW link

Edge Cases in AI Alignment

Florian_Dietz 24 Mar 2025 9:27 UTC
19 points
3 comments 4 min read LW link

Towards an understanding of the Chinese AI scene

Mitchell_Porter 24 Mar 2025 9:10 UTC
21 points
0 comments 2 min read LW link

Selective modularity: a research agenda

24 Mar 2025 4:12 UTC
66 points
2 comments 24 min read LW link

Pictures for 2024

jefftk 24 Mar 2025 2:40 UTC
9 points
0 comments 1 min read LW link
(www.jefftk.com)

Notes on handling non-concentrated failures with AI control: high level methods and different regimes

ryan_greenblatt 24 Mar 2025 1:00 UTC
23 points
3 comments 16 min read LW link

We need (a lot) more rogue agent honeypots

Ozyrus 23 Mar 2025 22:24 UTC
37 points
12 comments 4 min read LW link

Probability Theory Fundamentals 102: Source of the Sample Space

Ape in the coat 23 Mar 2025 17:23 UTC
12 points
17 comments 7 min read LW link

How to mitigate sandbagging

Teun van der Weij 23 Mar 2025 17:19 UTC
30 points
0 comments 8 min read LW link

Tabula Bio: towards a future free of disease (& looking for collaborators)

mpoon 23 Mar 2025 16:30 UTC
44 points
15 comments 2 min read LW link

Solving willpower seems easier than solving aging

Yair Halberstadt 23 Mar 2025 15:25 UTC
61 points
28 comments 1 min read LW link

[Question] Should I fundraise for open source search engine?

samuelshadrach 23 Mar 2025 13:04 UTC
−11 points
2 comments 1 min read LW link

Privateers Reborn: Cyber Letters of Marque

arealsociety 23 Mar 2025 3:39 UTC
5 points
2 comments 1 min read LW link
(arealsociety.substack.com)

Beware nerfing AI with opinionated human-centric sensors

Haotian 23 Mar 2025 1:09 UTC
1 point
0 comments 3 min read LW link

Reframing AI Safety as a Neverending Institutional Challenge

scasper 23 Mar 2025 0:13 UTC
53 points
12 comments 5 min read LW link

The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

Robert Shuler 22 Mar 2025 22:55 UTC
3 points
0 comments 2 min read LW link

Dayton, Ohio, ACX Meetup

Lunawarrior 22 Mar 2025 19:45 UTC
1 point
0 comments 1 min read LW link

[Replication] Crosscoder-based Stage-Wise Model Diffing

22 Mar 2025 18:35 UTC
24 points
0 comments 7 min read LW link

The Principle of Satisfying Foreknowledge

Randall Reams 22 Mar 2025 18:20 UTC
1 point
0 comments 2 min read LW link

[Question] Urgency in the ITN framework

Shaïman 22 Mar 2025 18:16 UTC
0 points
2 comments 1 min read LW link

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman 22 Mar 2025 18:16 UTC
11 points
2 comments 6 min read LW link

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri 22 Mar 2025 18:07 UTC
9 points
0 comments 12 min read LW link

100+ concrete projects and open problems in evals

Marius Hobbhahn 22 Mar 2025 15:21 UTC
75 points
1 comment 1 min read LW link

Do models say what they learn?

22 Mar 2025 15:19 UTC
126 points
12 comments 13 min read LW link

deleted

funnyfranco 22 Mar 2025 12:06 UTC
2 points
8 comments 1 min read LW link

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H 22 Mar 2025 10:54 UTC
4 points
0 comments 2 min read LW link

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda 22 Mar 2025 10:13 UTC
294 points
28 comments 4 min read LW link
(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien 22 Mar 2025 7:44 UTC
0 points
0 comments 1 min read LW link