The Case Against AI Control Research

johnswentworth · 21 Jan 2025 16:03 UTC
372 points
84 comments · 6 min read · LW link

What’s the short timeline plan?

Marius Hobbhahn · 2 Jan 2025 14:59 UTC
361 points
51 comments · 23 min read · LW link

The Gentle Romance

Richard_Ngo · 19 Jan 2025 18:29 UTC
244 points
46 comments · 15 min read · LW link
(www.asimov.press)

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes · 28 Jan 2025 18:47 UTC
220 points
31 comments · 31 min read · LW link

Mechanisms too simple for humans to design

Malmesbury · 22 Jan 2025 16:54 UTC
212 points
45 comments · 15 min read · LW link

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

24 Jan 2025 20:20 UTC
186 points
61 comments · 5 min read · LW link

What Is The Alignment Problem?

johnswentworth · 16 Jan 2025 1:20 UTC
181 points
49 comments · 25 min read · LW link

How will we update about scheming?

ryan_greenblatt · 6 Jan 2025 20:21 UTC
176 points
21 comments · 37 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

30 Jan 2025 17:03 UTC
167 points
65 comments · 2 min read · LW link
(gradual-disempowerment.ai)

Don’t ignore bad vibes you get from people

Kaj_Sotala · 18 Jan 2025 9:20 UTC
164 points
52 comments · 2 min read · LW link
(kajsotala.fi)

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem · 7 Jan 2025 19:11 UTC
163 points
9 comments · 1 min read · LW link

Capital Ownership Will Not Prevent Human Disempowerment

beren · 5 Jan 2025 6:00 UTC
162 points
20 comments · 14 min read · LW link

Maximizing Communication, not Traffic

jefftk · 5 Jan 2025 13:00 UTC
161 points
10 comments · 1 min read · LW link
(www.jefftk.com)

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · 13 Jan 2025 1:23 UTC
153 points
32 comments · 3 min read · LW link

Activation space interpretability may be doomed

8 Jan 2025 12:49 UTC
152 points
34 comments · 8 min read · LW link

OpenAI #10: Reflections

Zvi · 7 Jan 2025 17:00 UTC
149 points
7 comments · 11 min read · LW link
(thezvi.wordpress.com)

Quotes from the Stargate press conference

Nikola Jurkovic · 22 Jan 2025 0:50 UTC
149 points
7 comments · 1 min read · LW link
(www.c-span.org)

Human takeover might be worse than AI takeover

Tom Davidson · 10 Jan 2025 16:53 UTC
147 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · 23 Jan 2025 18:41 UTC
145 points
5 comments · 13 min read · LW link

Anomalous Tokens in DeepSeek-V3 and r1

henry · 25 Jan 2025 22:55 UTC
144 points
3 comments · 7 min read · LW link

Planning for Extreme AI Risks

joshc · 29 Jan 2025 18:33 UTC
143 points
5 comments · 16 min read · LW link

The Intelligence Curse

lukedrago · 3 Jan 2025 19:07 UTC
142 points
27 comments · 18 min read · LW link
(lukedrago.substack.com)

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman · 6 Jan 2025 19:57 UTC
142 points
57 comments · 13 min read · LW link

Ten people on the inside

Buck · 28 Jan 2025 16:41 UTC
139 points
28 comments · 4 min read · LW link

Tell me about yourself: LLMs are aware of their learned behaviors

22 Jan 2025 0:47 UTC
132 points
5 comments · 6 min read · LW link

Building AI Research Fleets

12 Jan 2025 18:23 UTC
132 points
11 comments · 5 min read · LW link

Training on Documents About Reward Hacking Induces Reward Hacking

21 Jan 2025 21:32 UTC
131 points
15 comments · 2 min read · LW link
(alignment.anthropic.com)

Parkinson’s Law and the Ideology of Statistics

Benquo · 4 Jan 2025 15:49 UTC
130 points
7 comments · 8 min read · LW link
(benjaminrosshoffman.com)

2024 in AI predictions

jessicata · 1 Jan 2025 20:29 UTC
125 points
3 comments · 8 min read · LW link

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA · 28 Jan 2025 23:36 UTC
118 points
30 comments · 13 min read · LW link

My supervillain origin story

Dmitry Vaintrob · 27 Jan 2025 12:20 UTC
112 points
2 comments · 5 min read · LW link

How do you deal w/ Super Stimuli?

Logan Riggs · 14 Jan 2025 15:14 UTC
112 points
25 comments · 3 min read · LW link

Fake thinking and real thinking

Joe Carlsmith · 28 Jan 2025 20:05 UTC
111 points
17 comments · 38 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery · 8 Jan 2025 19:38 UTC
108 points
7 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

Attribution-based parameter decomposition

25 Jan 2025 13:12 UTC
108 points
21 comments · 4 min read · LW link
(publications.apolloresearch.ai)

Comment on “Death and the Gorgon”

Zack_M_Davis · 1 Jan 2025 5:47 UTC
106 points
35 comments · 8 min read · LW link

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai · 5 Jan 2025 14:49 UTC
100 points
12 comments · 12 min read · LW link

The purposeful drunkard

Dmitry Vaintrob · 12 Jan 2025 12:27 UTC
98 points
13 comments · 6 min read · LW link

The Rising Sea

Jesse Hoogland · 25 Jan 2025 20:48 UTC
97 points
6 comments · 2 min read · LW link

Tips On Empirical Research Slides

8 Jan 2025 5:06 UTC
97 points
4 comments · 6 min read · LW link

We probably won’t just play status games with each other after AGI

Matthew Barnett · 15 Jan 2025 4:56 UTC
97 points
21 comments · 4 min read · LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd · 14 Jan 2025 2:14 UTC
96 points
70 comments · 5 min read · LW link

Tips and Code for Empirical Research Workflows

20 Jan 2025 22:31 UTC
96 points
15 comments · 20 min read · LW link

On Eating the Sun

jessicata · 8 Jan 2025 4:57 UTC
96 points
99 comments · 3 min read · LW link
(unstablerontology.substack.com)

The subset parity learning problem: much more than you wanted to know

Dmitry Vaintrob · 3 Jan 2025 9:13 UTC
95 points
18 comments · 11 min read · LW link

Heritability: Five Battles

Steven Byrnes · 14 Jan 2025 18:21 UTC
94 points
23 comments · 60 min read · LW link

Five Recent AI Tutoring Studies

Arjun Panickssery · 19 Jan 2025 3:53 UTC
94 points
0 comments · 2 min read · LW link
(arjunpanickssery.substack.com)

Introducing Squiggle AI

ozziegooen · 3 Jan 2025 17:53 UTC
92 points
15 comments · 8 min read · LW link

Six Thoughts on AI Safety

boazbarak · 24 Jan 2025 22:20 UTC
92 points
55 comments · 15 min read · LW link

The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating

21 Jan 2025 16:57 UTC
91 points
11 comments · 2 min read · LW link
(www.convergenceanalysis.org)