Will Jesus Christ return in an election year?

Eric Neyman · 24 Mar 2025 16:50 UTC
405 points
59 comments · 4 min read · LW link
(ericneyman.wordpress.com)

A Bear Case: My Predictions Regarding AI Progress

Thane Ruthenis · 5 Mar 2025 16:41 UTC
377 points
163 comments · 9 min read · LW link

Recent AI model progress feels mostly like bullshit

lc · 24 Mar 2025 19:28 UTC
356 points
85 comments · 8 min read · LW link
(zeropath.com)

Policy for LLM Writing on LessWrong

jimrandomh · 24 Mar 2025 21:41 UTC
334 points
71 comments · 2 min read · LW link

Tracing the Thoughts of a Large Language Model

Adam Jermyn · 27 Mar 2025 17:20 UTC
305 points
24 comments · 10 min read · LW link
(www.anthropic.com)

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda · 22 Mar 2025 10:13 UTC
292 points
28 comments · 4 min read · LW link
(www.neelnanda.io)

Trojan Sky

Richard_Ngo · 11 Mar 2025 3:14 UTC
252 points
39 comments · 12 min read · LW link
(www.narrativeark.xyz)

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman · 19 Mar 2025 16:00 UTC
241 points
106 comments · 5 min read · LW link
(metr.org)

Explaining British Naval Dominance During the Age of Sail

Arjun Panickssery · 28 Mar 2025 5:47 UTC
206 points
16 comments · 4 min read · LW link
(arjunpanickssery.substack.com)

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas · 16 Mar 2025 18:54 UTC
206 points
36 comments · 3 min read · LW link

Intention to Treat

Alicorn · 20 Mar 2025 20:01 UTC
200 points
5 comments · 2 min read · LW link

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

17 Mar 2025 19:11 UTC
184 points
9 comments · 6 min read · LW link

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel Kokotajlo · 11 Mar 2025 2:17 UTC
183 points
26 comments · 4 min read · LW link
(openai.com)

So how well is Claude playing Pokémon?

Julian Bradshaw · 7 Mar 2025 5:54 UTC
171 points
76 comments · 5 min read · LW link

On the Rationality of Deterring ASI

Dan H · 5 Mar 2025 16:11 UTC
168 points
34 comments · 4 min read · LW link
(nationalsecurity.ai)

Reducing LLM deception at scale with self-other overlap fine-tuning

13 Mar 2025 19:09 UTC
162 points
46 comments · 6 min read · LW link

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · 16 Mar 2025 16:52 UTC
161 points
26 comments · 1 min read · LW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout · 2 Mar 2025 19:51 UTC
154 points
29 comments · 1 min read · LW link
(turntrout.com)

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard · 2 Mar 2025 20:26 UTC
154 points
26 comments · 9 min read · LW link

Conceptual Rounding Errors

Jan_Kulveit · 26 Mar 2025 19:00 UTC
151 points
15 comments · 3 min read · LW link
(boundedlyrational.substack.com)

The Most Forbidden Technique

Zvi · 12 Mar 2025 13:20 UTC
150 points
9 comments · 17 min read · LW link
(thezvi.wordpress.com)

Methods for strong human germline engineering

TsviBT · 3 Mar 2025 8:13 UTC
149 points
29 comments · 108 min read · LW link

The Hidden Cost of Our Lies to AI

Nicholas Andresen · 6 Mar 2025 5:03 UTC
145 points
18 comments · 7 min read · LW link
(substack.com)

The Milton Friedman Model of Policy Change

JohnofCharleston · 4 Mar 2025 0:38 UTC
143 points
17 comments · 4 min read · LW link

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
141 points
15 comments · 13 min read · LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis · 4 Mar 2025 16:23 UTC
141 points
52 comments · 3 min read · LW link

Anthropic, and taking “technical philosophy” more seriously

Raemon · 13 Mar 2025 1:48 UTC
139 points
29 comments · 11 min read · LW link

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit · 28 Mar 2025 21:03 UTC
133 points
14 comments · 13 min read · LW link

Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger · 11 Mar 2025 11:52 UTC
127 points
23 comments · 11 min read · LW link
(alignment.anthropic.com)

How I’ve run major projects

benkuhn · 16 Mar 2025 18:40 UTC
127 points
10 comments · 8 min read · LW link
(www.benkuhn.net)

Do models say what they learn?

22 Mar 2025 15:19 UTC
126 points
12 comments · 13 min read · LW link

[Question] when will LLMs become human-level bloggers?

nostalgebraist · 9 Mar 2025 21:10 UTC
125 points
34 comments · 6 min read · LW link

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

26 Mar 2025 19:07 UTC
113 points
15 comments · 29 min read · LW link
(deepmindsafetyresearch.medium.com)

2024 Unofficial LessWrong Survey Results

Screwtape · 14 Mar 2025 22:29 UTC
110 points
28 comments · 48 min read · LW link

How I talk to those above me

Maxwell Peterson · 30 Mar 2025 6:54 UTC
104 points
16 comments · 8 min read · LW link

AI Control May Increase Existential Risk

Jan_Kulveit · 11 Mar 2025 14:30 UTC
101 points
13 comments · 1 min read · LW link

Third-wave AI safety needs sociopolitical thinking

Richard_Ngo · 27 Mar 2025 0:55 UTC
99 points
23 comments · 26 min read · LW link

What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit

garrison · 6 Mar 2025 19:49 UTC
98 points
0 comments · 6 min read · LW link
(garrisonlovely.substack.com)

Vacuum Decay: Expert Survey Results

JessRiedel · 13 Mar 2025 18:31 UTC
96 points
26 comments · 13 min read · LW link

Towards a scale-free theory of intelligent agency

Richard_Ngo · 21 Mar 2025 1:39 UTC
96 points
45 comments · 13 min read · LW link
(www.mindthefuture.info)

Elite Coordination via the Consensus of Power

Richard_Ngo · 19 Mar 2025 6:56 UTC
92 points
15 comments · 12 min read · LW link
(www.mindthefuture.info)

How I force LLMs to generate correct code

claudio · 21 Mar 2025 14:40 UTC
91 points
7 comments · 5 min read · LW link

We should start looking for scheming “in the wild”

Marius Hobbhahn · 6 Mar 2025 13:49 UTC
91 points
4 comments · 5 min read · LW link

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo · 3 Mar 2025 20:08 UTC
88 points
20 comments · 18 min read · LW link

Elon Musk May Be Transitioning to Bipolar Type I

Cyborg25 · 11 Mar 2025 17:45 UTC
86 points
22 comments · 4 min read · LW link

Open problems in emergent misalignment

1 Mar 2025 9:47 UTC
83 points
17 comments · 7 min read · LW link

OpenAI #11: America Action Plan

Zvi · 18 Mar 2025 12:50 UTC
83 points
3 comments · 6 min read · LW link
(thezvi.wordpress.com)

Mistral Large 2 (123B) seems to exhibit alignment faking

27 Mar 2025 15:39 UTC
81 points
4 comments · 13 min read · LW link

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

18 Mar 2025 14:48 UTC
80 points
12 comments · 5 min read · LW link

Eukaryote Skips Town—Why I’m leaving DC

eukaryote · 26 Mar 2025 17:16 UTC
80 points
1 comment · 6 min read · LW link
(eukaryotewritesblog.com)