“Self-Blackmail” and Alternatives

jessicata · Feb 9, 2025, 11:20 PM
19 points
12 comments · 7 min read · LW link
(unstableontology.com)

Altman blog on post-AGI world

Julian Bradshaw · Feb 9, 2025, 9:52 PM
29 points
10 comments · 1 min read · LW link
(blog.samaltman.com)

Forecasting newsletter #2/2025: Forecasting meetup network

NunoSempere · Feb 9, 2025, 6:07 PM
13 points
0 comments · 4 min read · LW link
(forecasting.substack.com)

How identical twin sisters feel about nieces vs their own daughters

Dave Lindbergh · Feb 9, 2025, 5:36 PM
4 points
19 comments · 1 min read · LW link

Two hemispheres—I do not think it means what you think it means

Viliam · Feb 9, 2025, 3:33 PM
109 points
21 comments · 14 min read · LW link

The Structure of Professional Revolutions

SebastianG · Feb 9, 2025, 1:23 PM
8 points
0 comments · 4 min read · LW link

Gary Marcus now saying AI can’t do things it can already do

Benjamin_Todd · Feb 9, 2025, 12:24 PM
62 points
12 comments · 1 min read · LW link
(benjamintodd.substack.com)

How do you make a 250x better vaccine at 1/10 the cost? Develop it in India.

Abhishaike Mahajan · Feb 9, 2025, 3:53 AM
4 points
5 comments · 1 min read · LW link
(www.owlposting.com)

Less Laptop Velcro

jefftk · Feb 9, 2025, 3:30 AM
19 points
0 comments · 1 min read · LW link
(www.jefftk.com)

AXRP Episode 38.7 - Anthony Aguirre on the Future of Life Institute

DanielFilan · Feb 9, 2025, 1:10 AM
10 points
0 comments · 12 min read · LW link

[Job ad] LISA CEO

Feb 9, 2025, 12:18 AM
18 points
4 comments · 2 min read · LW link

“Think it Faster” worksheet

Raemon · Feb 8, 2025, 10:02 PM
59 points
8 comments · 4 min read · LW link

Seven sources of goals in LLM agents

Seth Herd · Feb 8, 2025, 9:54 PM
22 points
3 comments · 2 min read · LW link

[Question] p(s-risks to contemporary humans)?

mhampton · Feb 8, 2025, 9:19 PM
6 points
5 comments · 6 min read · LW link

Cross-Layer Feature Alignment and Steering in Large Language Model

dlaptev · Feb 8, 2025, 8:18 PM
5 points
0 comments · 6 min read · LW link

Towards building blocks of ontologies

Feb 8, 2025, 4:03 PM
29 points
0 comments · 26 min read · LW link

Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)

Feb 8, 2025, 3:51 PM
20 points
0 comments · 5 min read · LW link
(www.youtube.com)

Distilling the Internal Model Principle

JoseFaustino · Feb 8, 2025, 2:59 PM
21 points
0 comments · 16 min read · LW link

Knocking Down My AI Optimist Strawman

tailcalled · Feb 8, 2025, 10:52 AM
31 points
3 comments · 6 min read · LW link

Preserving Epistemic Novelty in AI: Experiments, Insights, and the Case for Decentralized Collective Intelligence

Andy E Williams · Feb 8, 2025, 10:25 AM
−4 points
8 comments · 7 min read · LW link

Chaos Investments v0.31

Screwtape · Feb 8, 2025, 6:53 AM
15 points
1 comment · 9 min read · LW link

AI Safety Oversights

Davey Morse · Feb 8, 2025, 6:15 AM
3 points
0 comments · 1 min read · LW link

Wiki on Suspects in Lind, Zajko, and Maland Killings

Rebecca_Records · Feb 8, 2025, 4:16 AM
20 points
4 comments · 1 min read · LW link

Research directions Open Phil wants to fund in technical AI safety

Feb 8, 2025, 1:40 AM
117 points
21 comments · 58 min read · LW link
(www.openphilanthropy.org)

So You Want To Make Marginal Progress...

johnswentworth · Feb 7, 2025, 11:22 PM
286 points
42 comments · 4 min read · LW link

Reasons-based choice and cluelessness

JesseClifton · Feb 7, 2025, 10:21 PM
34 points
0 comments · 10 min read · LW link

[Translation] In the Age of AI don’t Look for Unicorns

mushroomsoup · Feb 7, 2025, 9:06 PM
3 points
0 comments · 10 min read · LW link

Racing Towards Fusion and AI

Jeffrey Heninger · Feb 7, 2025, 8:40 PM
48 points
11 comments · 7 min read · LW link

‘High-Level Machine Intelligence’ and ‘Full Automation of Labor’ in the AI Impacts Surveys

Jeffrey Heninger · Feb 7, 2025, 8:40 PM
11 points
1 comment · 7 min read · LW link

Request for Information for a new US AI Action Plan (OSTP RFI)

agucova · Feb 7, 2025, 8:40 PM
5 points
0 comments · LW link
(www.federalregister.gov)

A Problem to Solve Before Building a Deception Detector

Feb 7, 2025, 7:35 PM
71 points
12 comments · 14 min read · LW link

Request for proposals: improving capability evaluations

cb · Feb 7, 2025, 6:51 PM
1 point
0 comments · 1 min read · LW link
(www.openphilanthropy.org)

How AI Takeover Might Happen in 2 Years

joshc · Feb 7, 2025, 5:10 PM
422 points
137 comments · 29 min read · LW link
(x.com)

the devil’s ontology

lostinwilliamsburg · Feb 7, 2025, 2:18 PM
−1 points
14 comments · 6 min read · LW link

On the Meta and DeepMind Safety Frameworks

Zvi · Feb 7, 2025, 1:10 PM
45 points
1 comment · 17 min read · LW link
(thezvi.wordpress.com)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

Feb 7, 2025, 3:57 AM
29 points
0 comments · 10 min read · LW link

When you downvote, explain why

KvmanThinking · Feb 7, 2025, 1:03 AM
4 points
31 comments · 1 min read · LW link

Medical Windfall Prizes

PeterMcCluskey · Feb 6, 2025, 11:33 PM
5 points
1 comment · 5 min read · LW link
(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

Feb 6, 2025, 7:18 PM
11 points
0 comments · 9 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

Feb 6, 2025, 6:58 PM
111 points
0 comments · 1 min read · LW link
(www.openphilanthropy.org)

AISN #47: Reasoning Models

Feb 6, 2025, 6:52 PM
3 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

omnizoid · Feb 6, 2025, 4:15 PM
26 points
18 comments · 7 min read · LW link

Detecting Strategic Deception Using Linear Probes

Feb 6, 2025, 3:46 PM
102 points
9 comments · 2 min read · LW link
(arxiv.org)

AI #102: Made in America

Zvi · Feb 6, 2025, 2:20 PM
26 points
18 comments · 67 min read · LW link
(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions · Feb 6, 2025, 11:26 AM
−3 points
5 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda · Feb 6, 2025, 11:03 AM
73 points
7 comments · 8 min read · LW link

Don’t go bankrupt, don’t go rogue

Nathan Young · Feb 6, 2025, 10:31 AM
20 points
1 comment · 7 min read · LW link

Voting Results for the 2023 Review

Raemon · Feb 6, 2025, 8:00 AM
86 points
3 comments · 69 min read · LW link

Chicanery: No

Screwtape · Feb 6, 2025, 5:42 AM
31 points
10 comments · 5 min read · LW link

[Question] hypnosis question

KvmanThinking · Feb 6, 2025, 2:41 AM
3 points
5 comments · 1 min read · LW link