Deep Honesty

Aletheophile · May 7, 2024, 8:31 PM
159 points
25 comments · 9 min read · LW link

Making every researcher seek grants is a broken model

jasoncrawford · Jan 26, 2024, 4:06 PM
159 points
41 comments · 4 min read · LW link
(rootsofprogress.org)

Current safety training techniques do not fully transfer to the agent setting

Nov 3, 2024, 7:24 PM
158 points
9 comments · 5 min read · LW link

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks · Jan 3, 2024, 7:44 PM
158 points
63 comments · 16 min read · LW link

Language Models Model Us

eggsyntax · May 17, 2024, 9:00 PM
158 points
55 comments · 7 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · May 21, 2024, 8:15 PM
157 points
16 comments · 3 min read · LW link

Ironing Out the Squiggles

Zack_M_Davis · Apr 29, 2024, 4:13 PM
157 points
36 comments · 11 min read · LW link

[Question] things that confuse me about the current AI market.

DMMF · Aug 28, 2024, 1:46 PM
156 points
27 comments · 2 min read · LW link

Apologizing is a Core Rationalist Skill

johnswentworth · Jan 2, 2024, 5:47 PM
156 points
42 comments · 5 min read · LW link

If you weren’t such an idiot...

Mar 2, 2024, 12:01 AM
156 points
74 comments · 2 min read · LW link
(markxu.com)

The Incredible Fentanyl-Detecting Machine

sarahconstantin · Jun 28, 2024, 10:10 PM
156 points
26 comments · 7 min read · LW link
(sarahconstantin.substack.com)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton · Jun 25, 2024, 3:40 PM
156 points
11 comments · 9 min read · LW link
(www.alignment.org)

A Rocket–Interpretability Analogy

plex · Oct 21, 2024, 1:55 PM
155 points
31 comments · 1 min read · LW link

“It’s a 10% chance which I did 10 times, so it should be 100%”

egor.timatkov · Nov 18, 2024, 1:14 AM
154 points
59 comments · 2 min read · LW link

o3

Zach Stein-Perlman · Dec 20, 2024, 6:30 PM
154 points
164 comments · 1 min read · LW link

Dyslucksia

Shoshannah Tekofsky · May 9, 2024, 7:21 PM
154 points
45 comments · 6 min read · LW link

Subskills of “Listening to Wisdom”

Raemon · Dec 9, 2024, 3:01 AM
154 points
29 comments · 42 min read · LW link

Liability regimes for AI

Ege Erdil · Aug 19, 2024, 1:25 AM
153 points
34 comments · 5 min read · LW link

Deep atheism and AI risk

Joe Carlsmith · Jan 4, 2024, 6:58 PM
153 points
22 comments · 27 min read · LW link

Decomposing Agency — capabilities without desires

Jul 11, 2024, 9:38 AM
153 points
32 comments · 12 min read · LW link
(strangecities.substack.com)

OpenAI: Exodus

Zvi · May 20, 2024, 1:10 PM
153 points
26 comments · 44 min read · LW link
(thezvi.wordpress.com)

Arithmetic is an underrated world-modeling technology

dynomight · Oct 17, 2024, 2:00 PM
152 points
33 comments · 6 min read · LW link
(dynomight.net)

“Alignment Faking” frame is somewhat fake

Jan_Kulveit · Dec 20, 2024, 9:51 AM
152 points
13 comments · 6 min read · LW link

Using axis lines for good or evil

dynomight · Mar 6, 2024, 2:47 PM
151 points
39 comments · 4 min read · LW link
(dynomight.net)

Priors and Prejudice

MathiasKB · Apr 22, 2024, 3:00 PM
151 points
31 comments · 7 min read · LW link

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman · Sep 3, 2024, 6:18 PM
151 points
49 comments · 22 min read · LW link
(sleepinyourhat.github.io)

My takes on SB-1047

leogao · Sep 9, 2024, 6:38 PM
151 points
8 comments · 4 min read · LW link

Daniel Dennett has died (1942-2024)

kave · Apr 19, 2024, 4:17 PM
150 points
5 comments · 1 min read · LW link
(dailynous.com)

2023 Survey Results

Screwtape · Feb 16, 2024, 10:24 PM
150 points
26 comments · 44 min read · LW link

Vernor Vinge, who coined the term “Technological Singularity”, dies at 79

Kaj_Sotala · Mar 21, 2024, 10:14 PM
149 points
25 comments · 1 min read · LW link
(arstechnica.com)

On Devin

Zvi · Mar 18, 2024, 1:20 PM
148 points
34 comments · 11 min read · LW link
(thezvi.wordpress.com)

What good is G-factor if you’re dumped in the woods? A field report from a camp counselor.

Hastings · Jan 12, 2024, 1:17 PM
148 points
22 comments · 1 min read · LW link

Some (problematic) aesthetics of what constitutes good work in academia

Steven Byrnes · Mar 11, 2024, 5:47 PM
148 points
12 comments · 12 min read · LW link

Leading The Parade

johnswentworth · Jan 31, 2024, 10:39 PM
148 points
31 comments · 9 min read · LW link

What o3 Becomes by 2028

Vladimir_Nesov · Dec 22, 2024, 12:37 PM
147 points
15 comments · 5 min read · LW link

0. CAST: Corrigibility as Singular Target

Max Harms · Jun 7, 2024, 10:29 PM
147 points
17 comments · 8 min read · LW link

Stanislav Petrov Quarterly Performance Review

Ricki Heicklen · Sep 26, 2024, 9:20 PM
147 points
3 comments · 5 min read · LW link
(bayesshammai.substack.com)

OpenAI o1

Zach Stein-Perlman · Sep 12, 2024, 5:30 PM
147 points
41 comments · 1 min read · LW link

Repeal the Jones Act of 1920

Zvi · Nov 27, 2024, 3:00 PM
146 points
24 comments · 39 min read · LW link
(thezvi.wordpress.com)

LLMs for Alignment Research: a safety priority?

abramdemski · Apr 4, 2024, 8:03 PM
145 points
24 comments · 11 min read · LW link

The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it

Martín Soto · Aug 27, 2024, 11:10 PM
145 points
15 comments · 3 min read · LW link

When is a mind me?

Rob Bensinger · Apr 17, 2024, 5:56 AM
144 points
130 comments · 15 min read · LW link

Fields that I reference when thinking about AI takeover prevention

Buck · Aug 13, 2024, 11:08 PM
144 points
16 comments · 10 min read · LW link
(redwoodresearch.substack.com)

Why Don’t We Just… Shoggoth+Face+Paraphraser?

Nov 19, 2024, 8:53 PM
144 points
58 comments · 14 min read · LW link

That Alien Message—The Animation

Writer · Sep 7, 2024, 2:53 PM
144 points
10 comments · 8 min read · LW link
(youtu.be)

Nursing doubts

dynomight · Aug 30, 2024, 2:25 AM
144 points
23 comments · 9 min read · LW link
(dynomight.net)

China Hawks are Manufacturing an AI Arms Race

garrison · Nov 20, 2024, 6:17 PM
144 points
44 comments · LW link
(garrisonlovely.substack.com)

The “Think It Faster” Exercise

Raemon · Dec 11, 2024, 7:14 PM
144 points
35 comments · 13 min read · LW link

Value Claims (In Particular) Are Usually Bullshit

johnswentworth · May 30, 2024, 6:26 AM UTC
144 points
18 comments · 2 min read · LW link

Momentum of Light in Glass

Ben · Oct 9, 2024, 8:19 PM UTC
143 points
44 comments · 11 min read · LW link