Memory Decoding Journal Club: Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons

Devin Ward · 1 May 2025 23:52 UTC
1 point
0 comments · 1 min read · LW link

My Research Process: Understanding and Cultivating Research Taste

Neel Nanda · 1 May 2025 23:08 UTC
30 points
2 comments · 9 min read · LW link

AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

1 May 2025 22:46 UTC
109 points
7 comments · 8 min read · LW link
(techgov.intelligence.org)

How to specify an alignment target

Richard Juggins · 1 May 2025 21:11 UTC
14 points
2 comments · 12 min read · LW link

Obstacles in ARC’s agenda: Mechanistic Anomaly Detection

David Matolcsi · 1 May 2025 20:51 UTC
43 points
1 comment · 11 min read · LW link

AI-Generated GitHub repo backdated with junk then filled with my systems work. Has anyone seen this before?

rgunther · 1 May 2025 20:14 UTC
7 points
1 comment · 1 min read · LW link

What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism

Brittany Gelb · 1 May 2025 19:06 UTC
54 points
1 comment · 7 min read · LW link

Can LLMs Simulate Internal Evaluation? A Case Study in Self-Generated Recommendations

The Neutral Mind · 1 May 2025 19:04 UTC
4 points
0 comments · 2 min read · LW link

Superhuman Coders in AI 2027 - Not So Fast

1 May 2025 18:56 UTC
67 points
0 comments · 5 min read · LW link

AI #114: Liars, Sycophants and Cheaters

Zvi · 1 May 2025 14:00 UTC
40 points
6 comments · 63 min read · LW link
(thezvi.wordpress.com)

Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall

Vladimir_Nesov · 1 May 2025 13:54 UTC
200 points
35 comments · 5 min read · LW link

Anthropomorphizing AI might be good, actually

Seth Herd · 1 May 2025 13:50 UTC
35 points
6 comments · 3 min read · LW link

Dont focus on updating P doom

Algon · 1 May 2025 11:10 UTC
7 points
3 comments · 2 min read · LW link

Prioritizing Work

jefftk · 1 May 2025 2:00 UTC
109 points
11 comments · 1 min read · LW link
(www.jefftk.com)

Don’t rely on a “race to the top”

sjadler · 1 May 2025 0:33 UTC
10 points
0 comments · 1 min read · LW link

Meta-Technicalities: Safeguarding Values in Formal Systems

LTM · 30 Apr 2025 23:43 UTC
2 points
0 comments · 3 min read · LW link
(routecause.substack.com)

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi · 30 Apr 2025 23:03 UTC
128 points
10 comments · 17 min read · LW link

GPT-4o Responds to Negative Feedback

Zvi · 30 Apr 2025 20:20 UTC
45 points
2 comments · 18 min read · LW link
(thezvi.wordpress.com)

State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]

Noosphere89 · 30 Apr 2025 19:58 UTC
7 points
0 comments · 5 min read · LW link
(www.interconnects.ai)

Don’t accuse your interlocutor of being insufficiently truth-seeking

TFD · 30 Apr 2025 19:38 UTC
30 points
15 comments · 2 min read · LW link
(www.thefloatingdroid.com)

How can we solve diffuse threats like research sabotage with AI control?

Vivek Hebbar · 30 Apr 2025 19:23 UTC
52 points
1 comment · 8 min read · LW link

[Question] Can Narrowing One’s Reference Class Undermine the Doomsday Argument?

Iannoose n. · 30 Apr 2025 18:24 UTC
2 points
1 comment · 1 min read · LW link

[Question] Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions?

Zack Friedman · 30 Apr 2025 18:22 UTC
5 points
4 comments · 1 min read · LW link

Distilling the Internal Model Principle part II

JoseFaustino · 30 Apr 2025 17:56 UTC
15 points
0 comments · 19 min read · LW link

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog · 30 Apr 2025 17:43 UTC
18 points
3 comments · 15 min read · LW link
(www.longview.org)

Video and transcript of talk on automating alignment research

Joe Carlsmith · 30 Apr 2025 17:43 UTC
27 points
0 comments · 24 min read · LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith · 30 Apr 2025 17:37 UTC
63 points
30 comments · 48 min read · LW link
(joecarlsmith.com)

Investigating task-specific prompts and sparse autoencoders for activation monitoring

Henk Tillman · 30 Apr 2025 17:09 UTC
23 points
0 comments · 1 min read · LW link
(arxiv.org)

European Links (30.04.25)

Martin Sustrik · 30 Apr 2025 15:40 UTC
15 points
1 comment · 8 min read · LW link
(250bpm.substack.com)

Scaling Laws for Scalable Oversight

30 Apr 2025 12:13 UTC
37 points
1 comment · 9 min read · LW link

Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis

30 Apr 2025 11:06 UTC
217 points
11 comments · 11 min read · LW link

[Paper] Automated Feature Labeling with Token-Space Gradient Descent

Wuschel Schulz · 30 Apr 2025 10:22 UTC
4 points
0 comments · 4 min read · LW link

A single principle related to many Alignment subproblems?

Q Home · 30 Apr 2025 9:49 UTC
43 points
34 comments · 17 min read · LW link

What if Brain Computer Interfaces went exponential?

Stephen Martin · 30 Apr 2025 5:07 UTC
−1 points
0 comments · 12 min read · LW link

Interpreting the METR Time Horizons Post

snewman · 30 Apr 2025 3:03 UTC
70 points
13 comments · 10 min read · LW link
(amistrongeryet.substack.com)

Should we expect the future to be good?

Neil Crawford · 30 Apr 2025 0:36 UTC
15 points
0 comments · 14 min read · LW link

Judging types of consequentialism by influence and normativity

Cole Wyeth · 29 Apr 2025 23:25 UTC
19 points
0 comments · 2 min read · LW link

Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures

Elizabeth · 29 Apr 2025 20:40 UTC
81 points
15 comments · 1 min read · LW link
(acesounderglass.com)

The Grand Encyclopedia of Eponymous Laws

rogersbacon · 29 Apr 2025 19:30 UTC
29 points
8 comments · 16 min read · LW link
(www.secretorum.life)

Misrepresentation as a Barrier for Interp (Part I)

29 Apr 2025 17:07 UTC
113 points
12 comments · 7 min read · LW link

AISN #53: An Open Letter Attempts to Block OpenAI Restructuring

29 Apr 2025 16:13 UTC
7 points
0 comments · 4 min read · LW link

What could Alphafold 4 look like?

Abhishaike Mahajan · 29 Apr 2025 15:45 UTC
8 points
0 comments · 1 min read · LW link

Sealed Computation: Towards Low-Friction Proof of Locality

Paul Bricman · 29 Apr 2025 15:26 UTC
4 points
0 comments · 10 min read · LW link
(noemaresearch.com)

Dating Roundup #4: An App for That

Zvi · 29 Apr 2025 13:10 UTC
18 points
5 comments · 16 min read · LW link
(thezvi.wordpress.com)

Talk on letters to AI (London)

ukc10014 · 29 Apr 2025 9:50 UTC
3 points
0 comments · 1 min read · LW link

Memory Decoding Journal Club: “Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons”

Devin Ward · 29 Apr 2025 2:26 UTC
1 point
0 comments · 1 min read · LW link

D&D.Sci Tax Day: Adventurers and Assessments Evaluation & Ruleset

aphyer · 29 Apr 2025 2:00 UTC
28 points
10 comments · 5 min read · LW link

How to Build a Third Place on Focusmate

Parker Conley · 28 Apr 2025 23:46 UTC
97 points
10 comments · 5 min read · LW link
(parconley.com)

Methods of defense against AGI manipulation

MarkelKori · 28 Apr 2025 21:03 UTC
3 points
0 comments · 2 min read · LW link

China’s Petition System: It Looks Like Democracy — But It Isn’t

Hu Yichao · 28 Apr 2025 20:56 UTC
0 points
4 comments · 2 min read · LW link