“Superhuman” Isn’t Well Specified

JustisMills · May 3, 2025, 11:42 PM
32 points
9 comments · 3 min read · LW link
(justismills.substack.com)

Navigating burnout

gw · May 3, 2025, 10:07 PM
73 points
1 comment · 9 min read · LW link
(www.georgeyw.com)

What is your favorite podcast?

ChristianKl · May 3, 2025, 9:25 PM
32 points
9 comments · 1 min read · LW link

[Question] Does translating a post with an LLM affect its rating?

ReverendBayes · May 3, 2025, 2:45 PM
9 points
9 comments · 2 min read · LW link

SimpleStories: A Better Synthetic Dataset and Tiny Models for Interpretability

Lennart Finke · May 3, 2025, 2:04 PM
13 points
0 comments · 1 min read · LW link

What’s up with AI’s vision

Joachim Bartosik · May 3, 2025, 1:23 PM
12 points
19 comments · 1 min read · LW link

Sparsity is the enemy of feature extraction (ft. absorption)

May 3, 2025, 10:13 AM
31 points
0 comments · 6 min read · LW link

Exploring out-of-context reasoning (OOCR) fine-tuning in LLMs to increase test-phase awareness

Sanyu Rajakumar · May 3, 2025, 3:33 AM
8 points
0 comments · 6 min read · LW link

Prison Journal: Building Better Thinking Skills—Altruistic Person Saved > 100 Gorillas saved

P. João · May 3, 2025, 1:34 AM
−30 points
2 comments · 1 min read · LW link

Updates from Comments on “AI 2027 is a Bet Against Amdahl’s Law”

snewman · May 2, 2025, 11:52 PM
40 points
2 comments · 13 min read · LW link

Attend SPAR’s virtual demo day! (career fair + talks)

agucova · May 2, 2025, 11:45 PM
9 points
0 comments · LW link
(demoday.sparai.org)

Why does METR score o3 as effective for such a long time duration despite overall poor scores?

Cole Wyeth · May 2, 2025, 10:58 PM
19 points
3 comments · 1 min read · LW link

Short story: Who is nancygonzalez8451097

Anders Lindström · May 2, 2025, 9:01 PM
13 points
2 comments · 5 min read · LW link

Interim Research Report: Mechanisms of Awareness

May 2, 2025, 8:29 PM
43 points
6 comments · 8 min read · LW link

Agents, Tools, and Simulators

May 2, 2025, 8:19 PM
12 points
0 comments · 10 min read · LW link

Obstacles in ARC’s agenda: Low Probability Estimation

David Matolcsi · May 2, 2025, 7:38 PM
43 points
0 comments · 6 min read · LW link

What’s going on with AI progress and trends? (As of 5/2025)

ryan_greenblatt · May 2, 2025, 7:00 PM
71 points
7 comments · 8 min read · LW link

When AI Optimizes for the Wrong Thing

Anthony Fox · May 2, 2025, 6:00 PM
5 points
0 comments · 1 min read · LW link

Alignment Structure Direction—Recursive Adversarial Oversight (RAO)

Jayden Shepard · May 2, 2025, 5:51 PM
2 points
0 comments · 2 min read · LW link

AI Welfare Risks

Adrià Moret · May 2, 2025, 5:49 PM
5 points
0 comments · 1 min read · LW link
(philpapers.org)

Philosoplasticity: On the Inevitable Drift of Meaning in Recursive Self-Interpreting Systems

Maikol Coin · May 2, 2025, 5:46 PM
−1 points
0 comments · 4 min read · LW link

Supermen of the (Not so Far) Future

TerriLeaf · May 2, 2025, 3:55 PM
9 points
0 comments · 4 min read · LW link

Steering Language Models in Multiple Directions Simultaneously

May 2, 2025, 3:27 PM
18 points
0 comments · 7 min read · LW link

AI Incident Monitoring: A Brief Analysis

Spencer Ames · May 2, 2025, 3:06 PM
3 points
0 comments · 5 min read · LW link

RA x ControlAI video: What if AI just keeps getting smarter?

Writer · May 2, 2025, 2:19 PM
100 points
17 comments · 9 min read · LW link

OpenAI Preparedness Framework 2.0

Zvi · May 2, 2025, 1:10 PM
60 points
1 comment · 23 min read · LW link
(thezvi.wordpress.com)

Ex-OpenAI employee amici leave to file denied in Musk v OpenAI case?

TFD · May 2, 2025, 12:27 PM
4 points
6 comments · 2 min read · LW link
(www.thefloatingdroid.com)

Roads are at maximum efficiency always

Hruss · May 2, 2025, 10:29 AM
1 point
3 comments · 1 min read · LW link

The Continuum Fallacy and its Relatives

Zero Contradictions · May 2, 2025, 2:58 AM
4 points
2 comments · 4 min read · LW link
(thewaywardaxolotl.blogspot.com)

Memory Decoding Journal Club: Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons

Devin Ward · May 1, 2025, 11:52 PM
1 point
0 comments · 1 min read · LW link

My Research Process: Understanding and Cultivating Research Taste

Neel Nanda · May 1, 2025, 11:08 PM
26 points
1 comment · 9 min read · LW link

AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

May 1, 2025, 10:46 PM
105 points
7 comments · 8 min read · LW link
(techgov.intelligence.org)

How to specify an alignment target

Richard Juggins · May 1, 2025, 9:11 PM
14 points
2 comments · 12 min read · LW link

Obstacles in ARC’s agenda: Mechanistic Anomaly Detection

David Matolcsi · May 1, 2025, 8:51 PM
42 points
1 comment · 11 min read · LW link

AI-Generated GitHub repo backdated with junk then filled with my systems work. Has anyone seen this before?

rgunther · May 1, 2025, 8:14 PM
7 points
1 comment · 1 min read · LW link

What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism

Brittany Gelb · May 1, 2025, 7:06 PM
17 points
0 comments · 7 min read · LW link

Can LLMs Simulate Internal Evaluation? A Case Study in Self-Generated Recommendations

The Neutral Mind · May 1, 2025, 7:04 PM
4 points
0 comments · 2 min read · LW link

Superhuman Coders in AI 2027 - Not So Fast

May 1, 2025, 6:56 PM
62 points
0 comments · 5 min read · LW link

AI #114: Liars, Sycophants and Cheaters

Zvi · May 1, 2025, 2:00 PM
40 points
5 comments · 63 min read · LW link
(thezvi.wordpress.com)

Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall

Vladimir_Nesov · May 1, 2025, 1:54 PM
175 points
22 comments · 5 min read · LW link

Anthropomorphizing AI might be good, actually

Seth Herd · May 1, 2025, 1:50 PM
35 points
6 comments · 3 min read · LW link

Don’t focus on updating P(doom)

Algon · May 1, 2025, 11:10 AM
7 points
3 comments · 2 min read · LW link

Prioritizing Work

jefftk · May 1, 2025, 2:00 AM
106 points
11 comments · 1 min read · LW link
(www.jefftk.com)

Don’t rely on a “race to the top”

sjadler · May 1, 2025, 12:33 AM
4 points
0 comments · 1 min read · LW link

Meta-Technicalities: Safeguarding Values in Formal Systems

LTM · Apr 30, 2025, 11:43 PM
2 points
0 comments · 3 min read · LW link
(routecause.substack.com)

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi · Apr 30, 2025, 11:03 PM
122 points
10 comments · 17 min read · LW link

GPT-4o Responds to Negative Feedback

Zvi · 30 Apr 2025 20:20 UTC
45 points
2 comments · 18 min read · LW link
(thezvi.wordpress.com)

State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]

Noosphere89 · 30 Apr 2025 19:58 UTC
7 points
0 comments · 5 min read · LW link
(www.interconnects.ai)

Don’t accuse your interlocutor of being insufficiently truth-seeking

TFD · 30 Apr 2025 19:38 UTC
30 points
15 comments · 2 min read · LW link
(www.thefloatingdroid.com)

How can we solve diffuse threats like research sabotage with AI control?

Vivek Hebbar · 30 Apr 2025 19:23 UTC
52 points
1 comment · 8 min read · LW link