The Lizardman and the Black Hat Bobcat

Screwtape Apr 6, 2025, 7:02 PM
107 points
15 comments 9 min read LW link

How training-gamers might function (and win)

Vivek Hebbar Apr 11, 2025, 9:26 PM
107 points
5 comments 13 min read LW link

Attribution-based parameter decomposition

Jan 25, 2025, 1:12 PM
107 points
21 comments 4 min read LW link
(publications.apolloresearch.ai)

We’re Not Advertising Enough (Post 3 of 6 on AI Governance)

Mass_Driver May 22, 2025, 5:05 PM
107 points
10 comments 28 min read LW link

My supervillain origin story

Dmitry Vaintrob Jan 27, 2025, 12:20 PM
106 points
2 comments 5 min read LW link

How do you deal w/ Super Stimuli?

Logan Riggs Jan 14, 2025, 3:14 PM
106 points
25 comments 3 min read LW link

AI 2027: Responses

Zvi Apr 8, 2025, 12:50 PM
106 points
3 comments 30 min read LW link
(thezvi.wordpress.com)

Prioritizing Work

jefftk May 1, 2025, 2:00 AM
106 points
11 comments 1 min read LW link
(www.jefftk.com)

AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

May 1, 2025, 10:46 PM
105 points
7 comments 8 min read LW link
(techgov.intelligence.org)

Steering Gemini with BiDPO

TurnTrout Jan 31, 2025, 2:37 AM
104 points
5 comments 1 min read LW link
(turntrout.com)

My model of what is going on with LLMs

Cole Wyeth Feb 13, 2025, 3:43 AM
104 points
49 comments 7 min read LW link

Show, not tell: GPT-4o is more opinionated in images than in text

Apr 2, 2025, 8:51 AM
103 points
41 comments 3 min read LW link

A short course on AGI safety from the GDM Alignment team

Feb 14, 2025, 3:43 PM
103 points
2 comments 1 min read LW link
(deepmindsafetyresearch.medium.com)

Comment on “Death and the Gorgon”

Zack_M_Davis Jan 1, 2025, 5:47 AM
103 points
33 comments 8 min read LW link

Judgements: Merging Prediction & Evidence

abramdemski Feb 23, 2025, 7:35 PM
103 points
5 comments 6 min read LW link

AGI Safety & Alignment @ Google DeepMind is hiring

Rohin Shah Feb 17, 2025, 9:11 PM
102 points
19 comments 10 min read LW link

How I talk to those above me

Maxwell Peterson Mar 30, 2025, 6:54 AM
102 points
16 comments 8 min read LW link

Detecting Strategic Deception Using Linear Probes

Feb 6, 2025, 3:46 PM
102 points
9 comments 2 min read LW link
(arxiv.org)

RA x ControlAI video: What if AI just keeps getting smarter?

Writer May 2, 2025, 2:19 PM
100 points
17 comments 9 min read LW link

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai Jan 5, 2025, 2:49 PM
100 points
12 comments 12 min read LW link

C’mon guys, Deliberate Practice is Real

Raemon Feb 5, 2025, 10:33 PM
99 points
25 comments 9 min read LW link

Generating the Funniest Joke with RL (according to GPT-4.1)

agg May 16, 2025, 5:09 AM
99 points
22 comments 4 min read LW link

Association taxes are collusion subsidies

KatjaGrace May 27, 2025, 6:50 AM
99 points
7 comments 1 min read LW link
(worldspiritsockpuppet.com)

Timaeus in 2024

Feb 20, 2025, 11:54 PM
99 points
1 comment 8 min read LW link

Third-wave AI safety needs sociopolitical thinking

Richard_Ngo Mar 27, 2025, 12:55 AM
99 points
23 comments 26 min read LW link

The Ukraine War and the Kill Market

Martin Sustrik May 4, 2025, 7:50 AM
98 points
13 comments 5 min read LW link
(250bpm.substack.com)

The purposeful drunkard

Dmitry Vaintrob Jan 12, 2025, 12:27 PM
98 points
13 comments 6 min read LW link

AI Control May Increase Existential Risk

Jan_Kulveit Mar 11, 2025, 2:30 PM
98 points
13 comments 1 min read LW link

What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit

garrison Mar 6, 2025, 7:49 PM
98 points
0 comments LW link
(garrisonlovely.substack.com)

Vacuum Decay: Expert Survey Results

JessRiedel Mar 13, 2025, 6:31 PM
96 points
26 comments LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape Feb 5, 2025, 4:30 AM
96 points
18 comments 6 min read LW link

Towards a scale-free theory of intelligent agency

Richard_Ngo Mar 21, 2025, 1:39 AM
96 points
44 comments 13 min read LW link
(www.mindthefuture.info)

How to Build a Third Place on Focusmate

Parker Conley Apr 28, 2025, 11:46 PM
96 points
10 comments 5 min read LW link
(parconley.com)

The Sweet Lesson: AI Safety Should Scale With Compute

Jesse Hoogland May 5, 2025, 7:03 PM
95 points
3 comments 3 min read LW link

The subset parity learning problem: much more than you wanted to know

Dmitry Vaintrob Jan 3, 2025, 9:13 AM
94 points
18 comments 11 min read LW link

Tips and Code for Empirical Research Workflows

Jan 20, 2025, 10:31 PM
94 points
14 comments 20 min read LW link

On Eating the Sun

jessicata Jan 8, 2025, 4:57 AM
94 points
96 comments 3 min read LW link
(unstablerontology.substack.com)

We probably won’t just play status games with each other after AGI

Matthew Barnett Jan 15, 2025, 4:56 AM
93 points
21 comments 4 min read LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd Jan 14, 2025, 2:14 AM
93 points
70 comments 5 min read LW link

Five Recent AI Tutoring Studies

Arjun Panickssery Jan 19, 2025, 3:53 AM
93 points
0 comments 2 min read LW link
(arjunpanickssery.substack.com)

Elite Coordination via the Consensus of Power

Richard_Ngo Mar 19, 2025, 6:56 AM
92 points
15 comments 12 min read LW link
(www.mindthefuture.info)

The Rising Sea

Jesse Hoogland Jan 25, 2025, 8:48 PM
92 points
2 comments 2 min read LW link

a confusion about preference orderings

nostalgebraist May 11, 2025, 7:30 PM
92 points
38 comments 11 min read LW link

Introducing Squiggle AI

ozziegooen Jan 3, 2025, 5:53 PM
92 points
15 comments LW link

ASI existential risk: Reconsidering Alignment as a Goal

habryka Apr 15, 2025, 7:57 PM
91 points
14 comments 19 min read LW link
(michaelnotebook.com)

How I force LLMs to generate correct code

claudio Mar 21, 2025, 2:40 PM
91 points
7 comments 5 min read LW link

Thoughts on the conservative assumptions in AI control

Buck Jan 17, 2025, 7:23 PM
91 points
5 comments 13 min read LW link

Slow corporations as an intuition pump for AI R&D automation

May 9, 2025, 2:49 PM
91 points
23 comments 9 min read LW link

Six Thoughts on AI Safety

boazbarak Jan 24, 2025, 10:20 PM
91 points
55 comments 15 min read LW link

Tips On Empirical Research Slides

Jan 8, 2025, 5:06 AM
90 points
4 comments 6 min read LW link