The Talk: a brief explanation of sexual dimorphism

Malmesbury · 18 Sep 2023 16:23 UTC
481 points
72 comments · 16 min read · LW link

Inside Views, Impostor Syndrome, and the Great LARP

johnswentworth · 25 Sep 2023 16:08 UTC
325 points
53 comments · 5 min read · LW link

Sharing Information About Nonlinear

Ben Pace · 7 Sep 2023 6:51 UTC
322 points
323 comments · 34 min read · LW link

EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

Elizabeth · 28 Sep 2023 23:30 UTC
317 points
246 comments · 22 min read · LW link
(acesounderglass.com)

Sum-threshold attacks

TsviBT · 8 Sep 2023 17:13 UTC
222 points
52 comments · 10 min read · LW link
(tsvibt.blogspot.com)

AI presidents discuss AI alignment agendas

9 Sep 2023 18:55 UTC
216 points
22 comments · 1 min read · LW link
(www.youtube.com)

What I would do if I wasn’t at ARC Evals

LawrenceC · 5 Sep 2023 19:19 UTC
212 points
8 comments · 13 min read · LW link

UDT shows that decision theory is more puzzling than ever

Wei Dai · 13 Sep 2023 12:26 UTC
197 points
51 comments · 1 min read · LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

28 Sep 2023 18:53 UTC
183 points
37 comments · 3 min read · LW link

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

jacobjacob · 1 Sep 2023 4:03 UTC
181 points
23 comments · 24 min read · LW link

There should be more AI safety orgs

Marius Hobbhahn · 21 Sep 2023 14:53 UTC
175 points
25 comments · 17 min read · LW link

Defunding My Mistake

ymeskhout · 4 Sep 2023 14:43 UTC
167 points
41 comments · 6 min read · LW link

The King and the Golem

Richard_Ngo · 25 Sep 2023 19:51 UTC
159 points
15 comments · 5 min read · LW link
(narrativeark.substack.com)

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
156 points
7 comments · 5 min read · LW link

Meta Questions about Metaphilosophy

Wei Dai · 1 Sep 2023 1:17 UTC
148 points
78 comments · 3 min read · LW link

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal · 29 Sep 2023 14:01 UTC
145 points
81 comments · 1 min read · LW link
(titotal.substack.com)

One Minute Every Moment

abramdemski · 1 Sep 2023 20:23 UTC
125 points
23 comments · 3 min read · LW link

Paper: LLMs trained on “A is B” fail to learn “B is A”

23 Sep 2023 19:55 UTC
120 points
73 comments · 4 min read · LW link
(arxiv.org)

The smallest possible button (or: moth traps!)

Neil · 2 Sep 2023 15:24 UTC
113 points
17 comments · 3 min read · LW link
(neilwarren.substack.com)

Interpreting OpenAI’s Whisper

EllenaR · 24 Sep 2023 17:53 UTC
112 points
10 comments · 7 min read · LW link

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
106 points
16 comments · 5 min read · LW link
(arxiv.org)

ActAdd: Steering Language Models without Optimization

6 Sep 2023 17:21 UTC
105 points
3 comments · 2 min read · LW link
(arxiv.org)

PSA: The community is in Berkeley/Oakland, not “the Bay Area”

maia · 11 Sep 2023 15:59 UTC
103 points
7 comments · 1 min read · LW link

Reproducing ARC Evals’ recent report on language model agents

Thomas Broadley · 1 Sep 2023 16:52 UTC
102 points
17 comments · 3 min read · LW link
(thomasbroadley.com)

Cohabitive Games so Far

mako yass · 28 Sep 2023 15:41 UTC
102 points
116 comments · 19 min read · LW link
(makopool.com)

Explaining grokking through circuit efficiency

8 Sep 2023 14:39 UTC
98 points
10 comments · 3 min read · LW link
(arxiv.org)

Closing Notes on Nonlinear Investigation

Ben Pace · 15 Sep 2023 22:44 UTC
97 points
47 comments · 11 min read · LW link

“X distracts from Y” as a thinly-disguised fight over group status / politics

Steven Byrnes · 25 Sep 2023 15:18 UTC
96 points
14 comments · 8 min read · LW link

Announcing FAR Labs, an AI safety coworking space

bgold · 29 Sep 2023 16:52 UTC
95 points
0 comments · 1 min read · LW link

Atoms to Agents Proto-Lectures

johnswentworth · 22 Sep 2023 6:22 UTC
93 points
13 comments · 2 min read · LW link
(www.youtube.com)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
90 points
23 comments · 3 min read · LW link
(www.anthropic.com)

AI #31: It Can Do What Now?

Zvi · 28 Sep 2023 16:00 UTC
90 points
6 comments · 40 min read · LW link
(thezvi.wordpress.com)

Making AIs less likely to be spiteful

26 Sep 2023 14:12 UTC
89 points
2 comments · 10 min read · LW link

Logical Share Splitting

DaemonicSigil · 11 Sep 2023 4:08 UTC
88 points
16 comments · 9 min read · LW link
(pbement.com)

I compiled an ebook of `Project Lawful` for eBook readers

OrwellGoesShopping · 15 Sep 2023 18:09 UTC
87 points
4 comments · 1 min read · LW link
(www.mikescher.com)

Highlights: Wentworth, Shah, and Murphy on “Retargeting the Search”

RobertM · 14 Sep 2023 2:18 UTC
85 points
4 comments · 8 min read · LW link

Benchmarks for Detecting Measurement Tampering [Redwood Research]

5 Sep 2023 16:44 UTC
84 points
15 comments · 20 min read · LW link
(arxiv.org)

Navigating an ecosystem that might or might not be bad for the world

15 Sep 2023 23:58 UTC
77 points
20 comments · 1 min read · LW link

Memory bandwidth constraints imply economies of scale in AI inference

Ege Erdil · 17 Sep 2023 14:01 UTC
76 points
33 comments · 4 min read · LW link

[Question] How have you become more hard-working?

Chi Nguyen · 25 Sep 2023 12:37 UTC
76 points
40 comments · 1 min read · LW link

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo

Zvi · 21 Sep 2023 12:00 UTC
75 points
8 comments · 47 min read · LW link
(thezvi.wordpress.com)

Text Posts from the Kids Group: 2023 I

jefftk · 5 Sep 2023 2:00 UTC
75 points
3 comments · 7 min read · LW link
(www.jefftk.com)

Find Hot French Food Near Me: A Follow-up

aphyer · 6 Sep 2023 12:32 UTC
75 points
19 comments · 2 min read · LW link

Luck based medicine: angry eldritch sugar gods edition

Elizabeth · 19 Sep 2023 4:40 UTC
74 points
13 comments · 9 min read · LW link
(acesounderglass.com)

[Question] How to talk about reasons why AGI might not be near?

Kaj_Sotala · 17 Sep 2023 8:18 UTC
73 points
19 comments · 2 min read · LW link

A quick update from Nonlinear

KatWoods · 7 Sep 2023 21:28 UTC
72 points
23 comments · 2 min read · LW link

Would You Work Harder In The Least Convenient Possible World?

Firinn · 22 Sep 2023 5:17 UTC
69 points
93 comments · 9 min read · LW link

High-level interpretability: detecting an AI’s objectives

28 Sep 2023 19:30 UTC
69 points
4 comments · 21 min read · LW link

Contra Yudkowsky on Epistemic Conduct for Author Criticism

Zack_M_Davis · 13 Sep 2023 15:33 UTC
69 points
38 comments · 7 min read · LW link

Influence functions—why, what and how

Nina Rimsky · 15 Sep 2023 20:42 UTC
69 points
6 comments · 8 min read · LW link