Heritability: Five Battles

Steven Byrnes · Jan 14, 2025, 6:21 PM
90 points
23 comments · 60 min read · LW link

Agent Foundations 2025 at CMU

Jan 19, 2025, 11:48 PM
90 points
10 comments · 1 min read · LW link

Scaling Sparse Feature Circuit Finding to Gemma 9B

Jan 10, 2025, 11:08 AM
86 points
11 comments · 17 min read · LW link

Stargate AI-1

Zvi · Jan 24, 2025, 3:20 PM
85 points
1 comment · 18 min read · LW link
(thezvi.wordpress.com)

I’m offering free math consultations!

Gurkenglas · Jan 14, 2025, 4:30 PM
83 points
7 comments · 1 min read · LW link

MONA: Managed Myopia with Approval Feedback

Jan 23, 2025, 12:24 PM
81 points
30 comments · 9 min read · LW link

On the OpenAI Economic Blueprint

Zvi · Jan 15, 2025, 2:30 PM
81 points
2 comments · 9 min read · LW link
(thezvi.wordpress.com)

No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR

Mikhail Samin · Jan 12, 2025, 11:43 AM
80 points
21 comments · 1 min read · LW link

Human study on AI spear phishing campaigns

Jan 3, 2025, 3:11 PM
79 points
8 comments · 5 min read · LW link

Stream Entry

lsusr · Jan 7, 2025, 11:56 PM
76 points
11 comments · 4 min read · LW link

Moderately More Than You Wanted To Know: Depressive Realism

JustisMills · Jan 13, 2025, 2:57 AM
73 points
4 comments · 6 min read · LW link
(justismills.substack.com)

Beards and Masks?

jefftk · Jan 18, 2025, 4:00 PM
72 points
5 comments · 4 min read · LW link
(www.jefftk.com)

New, improved multiple-choice TruthfulQA

Jan 15, 2025, 11:32 PM
72 points
0 comments · 3 min read · LW link

Numberwang: LLMs Doing Autonomous Research, and a Call for Input

Jan 16, 2025, 5:20 PM
71 points
30 comments · 31 min read · LW link

Yudkowsky on The Trajectory podcast

Seth Herd · Jan 24, 2025, 7:52 PM
71 points
39 comments · 2 min read · LW link
(www.youtube.com)

Policymakers don’t have access to paywalled articles

Adam Jones · Jan 5, 2025, 10:56 AM
71 points
11 comments · 2 min read · LW link
(adamjones.me)

Detect Goodhart and shut down

Jeremy Gillen · Jan 22, 2025, 6:45 PM
70 points
21 comments · 7 min read · LW link

Tail SP 500 Call Options

sapphire · Jan 23, 2025, 5:21 AM
70 points
28 comments · 2 min read · LW link

Kessler’s Second Syndrome

Jesse Hoogland · Jan 26, 2025, 7:04 AM
70 points
2 comments · 3 min read · LW link

Some lessons from the OpenAI-FrontierMath debacle

7vik · Jan 19, 2025, 9:09 PM
70 points
9 comments · 4 min read · LW link

Inference-Time-Compute: More Faithful? A Research Note

Jan 15, 2025, 4:43 AM
69 points
10 comments · 11 min read · LW link

Retrospective: 12 [sic] Months Since MIRI

james.lucassen · Jan 21, 2025, 2:52 AM
68 points
0 comments · 9 min read · LW link

Paper: Open Problems in Mechanistic Interpretability

Jan 29, 2025, 10:25 AM
68 points
0 comments · 1 min read · LW link
(arxiv.org)

Chance is in the Map, not the Territory

Jan 13, 2025, 7:17 PM
67 points
18 comments · 7 min read · LW link

Should you go with your best guess?: Against precise Bayesianism and related views

Anthony DiGiovanni · Jan 27, 2025, 8:25 PM
65 points
15 comments · 22 min read · LW link

Timaeus is hiring researchers & engineers

Jan 17, 2025, 7:13 PM
65 points
4 comments · 4 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · Jan 10, 2025, 7:34 PM
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

TurnTrout · Jan 16, 2025, 2:14 AM
64 points
3 comments · 1 min read · LW link
(turntrout.com)

Read The Sequences As If They Were Written Today

Peter Berggren · Jan 2, 2025, 2:51 AM
63 points
7 comments · 4 min read · LW link

Announcement: Learning Theory Online Course

Jan 20, 2025, 7:55 PM
63 points
33 comments · 4 min read · LW link

“We know how to build AGI”—Sam Altman

Nikola Jurkovic · Jan 6, 2025, 2:05 AM
62 points
5 comments · 1 min read · LW link
(blog.samaltman.com)

Testing for Scheming with Model Deletion

Guive · Jan 7, 2025, 1:54 AM
59 points
21 comments · 21 min read · LW link
(guive.substack.com)

new chinese stealth aircraft

bhauth · Jan 1, 2025, 12:19 AM
58 points
3 comments · 6 min read · LW link
(bhauth.com)

Logits, log-odds, and loss for parallel circuits

Dmitry Vaintrob · Jan 20, 2025, 9:56 AM
57 points
4 comments · 11 min read · LW link

A sketch of an AI control safety case

Jan 30, 2025, 5:28 PM
57 points
0 comments · 5 min read · LW link

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning

rife · Jan 15, 2025, 10:59 PM
57 points
32 comments · 2 min read · LW link

AI Safety as a YC Startup

Lukas Petersson · Jan 8, 2025, 10:46 AM
56 points
9 comments · 5 min read · LW link

Introducing the WeirdML Benchmark

Håvard Tveit Ihle · Jan 16, 2025, 11:38 AM
56 points
13 comments · 11 min read · LW link

On polytopes

Dmitry Vaintrob · Jan 25, 2025, 1:56 PM
56 points
5 comments · 12 min read · LW link

What’s Behind the SynBio Bust?

sarahconstantin · Jan 30, 2025, 10:30 PM
55 points
8 comments · 6 min read · LW link
(sarahconstantin.substack.com)

Tax Price Gouging?

jefftk · Jan 17, 2025, 2:10 PM
55 points
22 comments · 3 min read · LW link
(www.jefftk.com)

Predict 2025 AI capabilities (by Sunday)

Jan 15, 2025, 12:16 AM
55 points
3 comments · 1 min read · LW link

On DeepSeek’s r1

Zvi · Jan 22, 2025, 7:50 PM
55 points
2 comments · 35 min read · LW link
(thezvi.wordpress.com)

Finding Features Causally Upstream of Refusal

Jan 14, 2025, 2:30 AM
54 points
5 comments · 12 min read · LW link

AI #99: Farewell to Biden

Zvi · Jan 16, 2025, 2:20 PM
54 points
5 comments · 58 min read · LW link
(thezvi.wordpress.com)

Preference Inversion

Benquo · Jan 2, 2025, 6:15 PM
53 points
48 comments · 4 min read · LW link
(benjaminrosshoffman.com)

The OODA Loop—Observe, Orient, Decide, Act

Davis_Kingsley · Jan 1, 2025, 8:00 AM
53 points
2 comments · 11 min read · LW link

Dario Amodei: On DeepSeek and Export Controls

Zach Stein-Perlman · Jan 29, 2025, 5:15 PM
53 points
3 comments · 1 min read · LW link
(darioamodei.com)

You should read Hobbes, Locke, Hume, and Mill via EarlyModernTexts.com

Arjun Panickssery · Jan 30, 2025, 12:35 PM
52 points
3 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

Discursive Warfare and Faction Formation

Benquo · Jan 9, 2025, 4:47 PM
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)