Report & retrospective on the Dovetail fellowship

Alex_Altair · Mar 14, 2025, 11:20 PM
26 points
3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira · Mar 14, 2025, 11:07 PM
11 points
4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse · Mar 14, 2025, 10:51 PM
14 points
20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape · Mar 14, 2025, 10:29 PM
109 points
28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma · Mar 14, 2025, 9:18 PM
2 points
2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus · Mar 14, 2025, 8:54 PM
7 points
1 comment · 1 min read · LW link
(asving.com)

AI for Epistemics Hackathon

Austin Chen · Mar 14, 2025, 8:46 PM
77 points
12 comments · 10 min read · LW link
(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson · Mar 14, 2025, 7:11 PM
16 points
0 comments · 8 min read · LW link

AI Tools for Existential Security

Mar 14, 2025, 6:38 PM
22 points
4 comments · 11 min read · LW link
(www.forethought.org)

Capitalism as the Catalyst for AGI-Induced Human Extinction

funnyfranco · Mar 14, 2025, 6:14 PM
−3 points
2 comments · 21 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron · Mar 14, 2025, 3:45 PM
3 points
0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith · Mar 14, 2025, 3:00 PM
78 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Evaluating the ROI of Information

Declan Molony · Mar 14, 2025, 2:22 PM
12 points
3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi · Mar 14, 2025, 12:30 PM
53 points
2 comments · 13 min read · LW link
(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd · Mar 14, 2025, 9:48 AM
24 points
2 comments · 9 min read · LW link

Something to fight for

RomanS · Mar 14, 2025, 8:27 AM
4 points
0 comments · 1 min read · LW link

Interpreting Complexity

Maxwell Adam · Mar 14, 2025, 4:52 AM
53 points
8 comments · 26 min read · LW link

Bike Lights are Cheap Enough to Give Away

jefftk · Mar 14, 2025, 2:10 AM
24 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Superintelligence’s goals are likely to be random

Mikhail Samin · Mar 13, 2025, 10:41 PM
6 points
6 comments · 5 min read · LW link

Should AI safety be a mass movement?

mhampton · Mar 13, 2025, 8:36 PM
5 points
1 comment · 4 min read · LW link

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
141 points
15 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
155 points
41 comments · 6 min read · LW link

Vacuum Decay: Expert Survey Results

JessRiedel · Mar 13, 2025, 6:31 PM
96 points
26 comments · LW link

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

Mar 13, 2025, 6:29 PM
10 points
0 comments · 1 min read · LW link
(arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents

theraven · Mar 13, 2025, 6:17 PM
6 points
1 comment · 6 min read · LW link

Habermas Machine

NicholasKees · Mar 13, 2025, 6:16 PM
49 points
7 comments · 6 min read · LW link
(mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us

Peterpiper · Mar 13, 2025, 6:03 PM
−3 points
0 comments · 3 min read · LW link

AI #107: The Misplaced Hype Machine

Zvi · Mar 13, 2025, 2:40 PM
47 points
10 comments · 40 min read · LW link
(thezvi.wordpress.com)

Intelsat as a Model for International AGI Governance

Mar 13, 2025, 12:58 PM
45 points
0 comments · 1 min read · LW link
(www.forethought.org)

Stacity: a Lock-In Risk Benchmark for Large Language Models

alamerton · Mar 13, 2025, 12:08 PM
4 points
0 comments · 1 min read · LW link
(huggingface.co)

The prospect of accelerated AI safety progress, including philosophical progress

Mitchell_Porter · Mar 13, 2025, 10:52 AM
11 points
0 comments · 4 min read · LW link

The “Reversal Curse”: you still aren’t anthropomorphising enough.

lumpenspace · Mar 13, 2025, 10:24 AM
3 points
0 comments · 1 min read · LW link
(lumpenspace.substack.com)

Formalizing Space-Faring Civilizations Saturation concepts and metrics

Maxime Riché · Mar 13, 2025, 9:40 AM
4 points
0 comments · 8 min read · LW link

The Economics of p(doom)

Jakub Growiec · Mar 13, 2025, 7:33 AM
2 points
0 comments · 1 min read · LW link

Social Media: How to fix them before they become the biggest news platform

Sam G · Mar 13, 2025, 7:28 AM
5 points
2 comments · 3 min read · LW link

Penny Whistle in E?

jefftk · Mar 13, 2025, 2:40 AM
9 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Anthropic, and taking “technical philosophy” more seriously

Raemon · Mar 13, 2025, 1:48 AM
125 points
29 comments · 11 min read · LW link

LW/ACX Social Meetup

Stefan · Mar 12, 2025, 11:13 PM
2 points
0 comments · 1 min read · LW link

I grade every NBA basketball game I watch based on enjoyability

proshowersinger · Mar 12, 2025, 9:46 PM
24 points
2 comments · 4 min read · LW link

Kairos is hiring a Head of Operations/Founding Generalist

agucova · Mar 12, 2025, 8:58 PM
6 points
0 comments · LW link

USAID Outlook: A Metaculus Forecasting Series

ChristianWilliams · Mar 12, 2025, 8:34 PM
9 points
0 comments · LW link
(www.metaculus.com)

What is instrumental convergence?

Mar 12, 2025, 8:28 PM
2 points
0 comments · 2 min read · LW link
(aisafety.info)

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

Sanyu Rajakumar · Mar 12, 2025, 5:56 PM
16 points
0 comments · 13 min read · LW link

Why Obedient AI May Be the Real Catastrophe

G~ · Mar 12, 2025, 5:50 PM
5 points
2 comments · 3 min read · LW link

Your Communication Preferences Aren’t Law

Jonathan Moregård · Mar 12, 2025, 5:20 PM
25 points
4 comments · 1 min read · LW link
(honestliving.substack.com)

Reflections on Neuralese

Alice Blair · Mar 12, 2025, 4:29 PM
28 points
0 comments · 5 min read · LW link

Field tests of semi-rationality in Brazilian military training

P. João · Mar 12, 2025, 4:14 PM
31 points
0 comments · 2 min read · LW link

Many life-saving drugs fail for lack of funding. But there’s a solution: desperate rich people

Mvolz · Mar 12, 2025, 3:24 PM
17 points
0 comments · 1 min read · LW link
(www.theguardian.com)

The Most Forbidden Technique

Zvi · Mar 12, 2025, 1:20 PM
143 points
9 comments · 17 min read · LW link
(thezvi.wordpress.com)

You don’t actually need a physical multiverse to explain anthropic fine-tuning.

Fraser · Mar 12, 2025, 7:33 AM
7 points
8 comments · 3 min read · LW link
(frvser.com)