The Talk: a brief explanation of sexual dimorphism

Malmesbury, 18 Sep 2023 16:23 UTC
540 points
77 comments, 16 min read, 3 reviews

Inside Views, Impostor Syndrome, and the Great LARP

johnswentworth, 25 Sep 2023 16:08 UTC
338 points
53 comments, 5 min read

EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

Elizabeth, 28 Sep 2023 23:30 UTC
326 points
250 comments, 22 min read, 2 reviews
(acesounderglass.com)

Sharing Information About Nonlinear

Ben Pace, 7 Sep 2023 6:51 UTC
323 points
323 comments, 34 min read

Sum-threshold attacks

TsviBT, 8 Sep 2023 17:13 UTC
250 points
55 comments, 10 min read
(tsvibt.blogspot.com)

UDT shows that decision theory is more puzzling than ever

Wei Dai, 13 Sep 2023 12:26 UTC
230 points
56 comments, 1 min read

AI presidents discuss AI alignment agendas

9 Sep 2023 18:55 UTC
222 points
23 comments, 1 min read
(www.youtube.com)

What I would do if I wasn’t at ARC Evals

LawrenceC, 5 Sep 2023 19:19 UTC
220 points
10 comments, 13 min read, 1 review

The King and the Golem

Richard_Ngo, 25 Sep 2023 19:51 UTC
205 points
19 comments, 5 min read, 1 review
(narrativeark.substack.com)

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

Bird Concept, 1 Sep 2023 4:03 UTC
188 points
26 comments, 24 min read, 1 review

Defunding My Mistake

ymeskhout, 4 Sep 2023 14:43 UTC
188 points
41 comments, 6 min read

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

28 Sep 2023 18:53 UTC
187 points
39 comments, 3 min read, 1 review

There should be more AI safety orgs

Marius Hobbhahn, 21 Sep 2023 14:53 UTC
182 points
25 comments, 17 min read

Meta Questions about Metaphilosophy

Wei Dai, 1 Sep 2023 1:17 UTC
164 points
80 comments, 3 min read

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal, 29 Sep 2023 14:01 UTC
161 points
79 comments, 20 min read
(titotal.substack.com)

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
159 points
8 comments, 5 min read

Cohabitive Games so Far

mako yass, 28 Sep 2023 15:41 UTC
140 points
146 comments, 19 min read, 2 reviews
(makopool.com)

The smallest possible button (or: moth traps!)

Neil, 2 Sep 2023 15:24 UTC
126 points
18 comments, 3 min read
(neilwarren.substack.com)

One Minute Every Moment

abramdemski, 1 Sep 2023 20:23 UTC
126 points
24 comments, 3 min read

Paper: LLMs trained on “A is B” fail to learn “B is A”

23 Sep 2023 19:55 UTC
125 points
74 comments, 4 min read
(arxiv.org)

Making AIs less likely to be spiteful

26 Sep 2023 14:12 UTC
118 points
7 comments, 10 min read

Interpreting OpenAI’s Whisper

EllenaR, 24 Sep 2023 17:53 UTC
116 points
13 comments, 7 min read

“X distracts from Y” as a thinly-disguised fight over group status / politics

Steven Byrnes, 25 Sep 2023 15:18 UTC
112 points
14 comments, 8 min read

Would You Work Harder In The Least Convenient Possible World?

Firinn, 22 Sep 2023 5:17 UTC
109 points
100 comments, 9 min read, 2 reviews

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
109 points
17 comments, 5 min read
(arxiv.org)

PSA: The community is in Berkeley/Oakland, not “the Bay Area”

maia, 11 Sep 2023 15:59 UTC
108 points
7 comments, 1 min read

ActAdd: Steering Language Models without Optimization

6 Sep 2023 17:21 UTC
105 points
3 comments, 2 min read
(arxiv.org)

Reproducing ARC Evals’ recent report on language model agents

Thomas Broadley, 1 Sep 2023 16:52 UTC
104 points
17 comments, 3 min read
(thomasbroadley.com)

Explaining grokking through circuit efficiency

8 Sep 2023 14:39 UTC
102 points
11 comments, 3 min read
(arxiv.org)

Atoms to Agents Proto-Lectures

johnswentworth, 22 Sep 2023 6:22 UTC
97 points
14 comments, 2 min read
(www.youtube.com)

Closing Notes on Nonlinear Investigation

Ben Pace, 15 Sep 2023 22:44 UTC
97 points
47 comments, 11 min read

Announcing FAR Labs, an AI safety coworking space

Ben Goldhaber, 29 Sep 2023 16:52 UTC
95 points
0 comments, 1 min read

Benchmarks for Detecting Measurement Tampering [Redwood Research]

5 Sep 2023 16:44 UTC
94 points
22 comments, 20 min read, 1 review
(arxiv.org)

Logical Share Splitting

DaemonicSigil, 11 Sep 2023 4:08 UTC
93 points
16 comments, 9 min read
(pbement.com)

I compiled a ebook of `Project Lawful` for eBook readers

OrwellGoesShopping, 15 Sep 2023 18:09 UTC
93 points
4 comments, 1 min read
(www.mikescher.com)

AI #31: It Can Do What Now?

Zvi, 28 Sep 2023 16:00 UTC
90 points
6 comments, 40 min read
(thezvi.wordpress.com)

Highlights: Wentworth, Shah, and Murphy on “Retargeting the Search”

RobertM, 14 Sep 2023 2:18 UTC
87 points
4 comments, 8 min read

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds, 19 Sep 2023 15:09 UTC
85 points
26 comments, 3 min read, 1 review
(www.anthropic.com)

[Question] How have you become more hard-working?

Chi Nguyen, 25 Sep 2023 12:37 UTC
84 points
42 comments, 1 min read

Navigating an ecosystem that might or might not be bad for the world

15 Sep 2023 23:58 UTC
79 points
20 comments, 1 min read

Memory bandwidth constraints imply economies of scale in AI inference

Ege Erdil, 17 Sep 2023 14:01 UTC
79 points
34 comments, 4 min read

Influence functions—why, what and how

Nina Panickssery, 15 Sep 2023 20:42 UTC
77 points
6 comments, 8 min read

Find Hot French Food Near Me: A Follow-up

aphyer, 6 Sep 2023 12:32 UTC
77 points
19 comments, 2 min read

Have Attention Spans Been Declining?

niplav, 8 Sep 2023 14:11 UTC
75 points
23 comments, 17 min read, 1 review

Text Posts from the Kids Group: 2023 I

jefftk, 5 Sep 2023 2:00 UTC
75 points
3 comments, 7 min read
(www.jefftk.com)

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo

Zvi, 21 Sep 2023 12:00 UTC
75 points
8 comments, 47 min read
(thezvi.wordpress.com)

Luck based medicine: angry eldritch sugar gods edition

Elizabeth, 19 Sep 2023 4:40 UTC
75 points
14 comments, 9 min read
(acesounderglass.com)

[Question] How to talk about reasons why AGI might not be near?

Kaj_Sotala, 17 Sep 2023 8:18 UTC
73 points
19 comments, 2 min read

List of how people have become more hard-working

Chi Nguyen, 29 Sep 2023 11:30 UTC
72 points
7 comments, 3 min read

High-level interpretability: detecting an AI’s objectives

28 Sep 2023 19:30 UTC
72 points
4 comments, 21 min read