How to Make Superbabies

19 Feb 2025 20:39 UTC
637 points
360 comments · 31 min read · LW link

How AI Takeover Might Happen in 2 Years

joshc · 7 Feb 2025 17:10 UTC
426 points
142 comments · 29 min read · LW link
(x.com)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

25 Feb 2025 17:39 UTC
334 points
92 comments · 4 min read · LW link

Murder plots are infohazards

Chris Monteiro · 13 Feb 2025 19:15 UTC
304 points
46 comments · 2 min read · LW link

So You Want To Make Marginal Progress...

johnswentworth · 7 Feb 2025 23:22 UTC
304 points
42 comments · 4 min read · LW link

Arbital has been imported to LessWrong

20 Feb 2025 0:47 UTC
284 points
30 comments · 5 min read · LW link

A History of the Future, 2025-2040

L Rudolf L · 17 Feb 2025 12:03 UTC
249 points
42 comments · 75 min read · LW link
(nosetgauge.substack.com)

Power Lies Trembling: a three-book review

Richard_Ngo · 22 Feb 2025 22:57 UTC
214 points
29 comments · 15 min read · LW link
(www.mindthefuture.info)

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?

garrison · 11 Feb 2025 0:20 UTC
208 points
8 comments · 6 min read · LW link
(garrisonlovely.substack.com)

Eliezer’s Lost Alignment Articles / The Arbital Sequence

20 Feb 2025 0:48 UTC
208 points
10 comments · 5 min read · LW link

[Question] Have LLMs Generated Novel Insights?

23 Feb 2025 18:22 UTC
169 points
41 comments · 2 min read · LW link

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Thane Ruthenis · 21 Feb 2025 20:15 UTC
157 points
53 comments · 6 min read · LW link

Levels of Friction

Zvi · 10 Feb 2025 13:10 UTC
155 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape · 16 Feb 2025 1:43 UTC
154 points
3 comments · 1 min read · LW link

A computational no-coincidence principle

Eric Neyman · 14 Feb 2025 21:39 UTC
149 points
40 comments · 6 min read · LW link
(www.alignment.org)

Gradual Disempowerment, Shell Games and Flinches

Jan_Kulveit · 2 Feb 2025 14:47 UTC
146 points
36 comments · 6 min read · LW link

You can just wear a suit

lsusr · 26 Feb 2025 14:57 UTC
139 points
59 comments · 2 min read · LW link

The Paris AI Anti-Safety Summit

Zvi · 12 Feb 2025 14:00 UTC
129 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

Research directions Open Phil wants to fund in technical AI safety

8 Feb 2025 1:40 UTC
117 points
21 comments · 58 min read · LW link
(www.openphilanthropy.org)

The News is Never Neglected

lsusr · 11 Feb 2025 14:59 UTC
113 points
18 comments · 1 min read · LW link

Two hemispheres—I do not think it means what you think it means

Viliam · 9 Feb 2025 15:33 UTC
112 points
21 comments · 14 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

6 Feb 2025 18:58 UTC
111 points
0 comments · 1 min read · LW link
(www.openphilanthropy.org)

My model of what is going on with LLMs

Cole Wyeth · 13 Feb 2025 3:43 UTC
110 points
49 comments · 7 min read · LW link

Judgements: Merging Prediction & Evidence

abramdemski · 23 Feb 2025 19:35 UTC
107 points
7 comments · 6 min read · LW link

A short course on AGI safety from the GDM Alignment team

14 Feb 2025 15:43 UTC
105 points
2 comments · 1 min read · LW link
(deepmindsafetyresearch.medium.com)

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
104 points
9 comments · 2 min read · LW link
(arxiv.org)

AGI Safety & Alignment @ Google DeepMind is hiring

Rohin Shah · 17 Feb 2025 21:11 UTC
103 points
19 comments · 10 min read · LW link

C’mon guys, Deliberate Practice is Real

Raemon · 5 Feb 2025 22:33 UTC
100 points
25 comments · 9 min read · LW link

Timaeus in 2024

20 Feb 2025 23:54 UTC
99 points
1 comment · 8 min read · LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape · 5 Feb 2025 4:30 UTC
97 points
18 comments · 6 min read · LW link

Microplastics: Much Less Than You Wanted To Know

15 Feb 2025 19:08 UTC
94 points
10 comments · 13 min read · LW link

Dear AGI,

Nathan Young · 18 Feb 2025 10:48 UTC
89 points
11 comments · 3 min read · LW link

Anthropic releases Claude 3.7 Sonnet with extended thinking mode

LawrenceC · 24 Feb 2025 19:32 UTC
88 points
8 comments · 4 min read · LW link
(www.anthropic.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon · 5 Feb 2025 21:32 UTC
88 points
45 comments · 2 min read · LW link
(web.archive.org)

Voting Results for the 2023 Review

Raemon · 6 Feb 2025 8:00 UTC
87 points
3 comments · 69 min read · LW link

The Risk of Gradual Disempowerment from AI

Zvi · 5 Feb 2025 22:10 UTC
87 points
20 comments · 20 min read · LW link
(thezvi.wordpress.com)

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik · 26 Feb 2025 12:50 UTC
85 points
8 comments · 7 min read · LW link

How might we safely pass the buck to AI?

joshc · 19 Feb 2025 17:48 UTC
84 points
58 comments · 31 min read · LW link

Ambiguous out-of-distribution generalization on an algorithmic task

13 Feb 2025 18:24 UTC
84 points
6 comments · 11 min read · LW link

Pick two: concise, comprehensive, or clear rules

Screwtape · 3 Feb 2025 6:39 UTC
84 points
27 comments · 8 min read · LW link

The Mask Comes Off: A Trio of Tales

Zvi · 14 Feb 2025 15:30 UTC
81 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni · 5 Feb 2025 13:50 UTC
80 points
1 comment · 10 min read · LW link

A Problem to Solve Before Building a Deception Detector

7 Feb 2025 19:35 UTC
78 points
12 comments · 14 min read · LW link

Evaluating “What 2026 Looks Like” So Far

Jonny Spicer · 24 Feb 2025 18:55 UTC
78 points
7 comments · 7 min read · LW link

OpenAI releases deep research agent

Seth Herd · 3 Feb 2025 12:48 UTC
78 points
21 comments · 3 min read · LW link
(openai.com)

Anti-Slop Interventions?

abramdemski · 4 Feb 2025 19:50 UTC
78 points
33 comments · 6 min read · LW link

Osaka

lsusr · 26 Feb 2025 13:50 UTC
78 points
13 comments · 1 min read · LW link

Thermodynamic entropy = Kolmogorov complexity

Aram Ebtekar · 17 Feb 2025 5:56 UTC
77 points
14 comments · 1 min read · LW link
(doi.org)

The Simplest Good

Jesse Hoogland · 2 Feb 2025 19:51 UTC
76 points
6 comments · 5 min read · LW link

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda · 6 Feb 2025 11:03 UTC
73 points
7 comments · 8 min read · LW link