List of how people have become more hard-working

Chi Nguyen · Sep 29, 2023, 11:30 AM
69 points
7 comments · LW link

Contra Yudkowsky on Epistemic Conduct for Author Criticism

Zack_M_Davis · Sep 13, 2023, 3:33 PM
69 points
38 comments · 7 min read · LW link

Can I take ducks home from the park?

dynomight · Sep 14, 2023, 9:03 PM
67 points
8 comments · 3 min read · LW link
(dynomight.net)

[Link post] Michael Nielsen’s “Notes on Existential Risk from Artificial Superintelligence”

Joel Becker · Sep 19, 2023, 1:31 PM
67 points
12 comments · LW link
(michaelnotebook.com)

If influence functions are not approximating leave-one-out, how are they supposed to help?

Fabien Roger · Sep 22, 2023, 2:23 PM
66 points
5 comments · 3 min read · LW link

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)

Ruby · Sep 28, 2023, 2:48 AM
66 points
73 comments · 6 min read · LW link

GPT-4 for personal productivity: online distraction blocker

Sergii · Sep 26, 2023, 5:41 PM
65 points
13 comments · 2 min read · LW link
(grgv.xyz)

AI #29: Take a Deep Breath

Zvi · Sep 14, 2023, 12:00 PM
65 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

a rant on politician-engineer coalitional conflict

bhauth · Sep 4, 2023, 5:15 PM
64 points
12 comments · 4 min read · LW link

Understanding strategic deception and deceptive alignment

Sep 25, 2023, 4:27 PM
64 points
16 comments · 7 min read · LW link
(www.apolloresearch.ai)

Interpretability Externalities Case Study—Hungry Hungry Hippos

Magdalena Wache · Sep 20, 2023, 2:42 PM
64 points
22 comments · 2 min read · LW link

Eugenics Performed By A Blind, Idiot God

omnizoid · Sep 17, 2023, 8:37 PM
63 points
11 comments · 2 min read · LW link

Instrumental Convergence Bounty

Logan Zoellner · Sep 14, 2023, 2:02 PM
62 points
24 comments · 1 min read · LW link

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · Sep 13, 2023, 9:23 PM
59 points
1 comment · 2 min read · LW link
(aligned.substack.com)

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Sep 20, 2023, 3:23 PM
58 points
9 comments · 1 min read · LW link
(arxiv.org)

Bids To Defer On Value Judgements

johnswentworth · Sep 29, 2023, 5:07 PM
58 points
6 comments · 3 min read · LW link

Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)

Holly_Elmore · Sep 19, 2023, 11:40 PM
54 points
33 comments · LW link

Some reasons why I frequently prefer communicating via text

Adam Zerner · Sep 18, 2023, 9:50 PM
53 points
18 comments · 2 min read · LW link

AI #28: Watching and Waiting

Zvi · Sep 7, 2023, 5:20 PM
52 points
14 comments · 45 min read · LW link
(thezvi.wordpress.com)

Who Has the Best Food?

Zvi · Sep 5, 2023, 1:40 PM
52 points
61 comments · 10 min read · LW link
(thezvi.wordpress.com)

The point of a game is not to win, and you shouldn’t even pretend that it is

mako yass · Sep 28, 2023, 3:54 PM
51 points
27 comments · 4 min read · LW link
(makopool.com)

Is AI Safety dropping the ball on privacy?

markov · Sep 13, 2023, 1:07 PM
50 points
17 comments · 7 min read · LW link

Basic Mathematics of Predictive Coding

Adam Shai · Sep 29, 2023, 2:38 PM
49 points
6 comments · 9 min read · LW link

Competitive, Cooperative, and Cohabitive

Screwtape · Sep 28, 2023, 11:25 PM
49 points
13 comments · 5 min read · LW link · 1 review

Fund Transit With Development

jefftk · 22 Sep 2023 11:10 UTC
47 points
22 comments · 3 min read · LW link
(www.jefftk.com)

Three ways interpretability could be impactful

Arthur Conmy · 18 Sep 2023 1:02 UTC
47 points
8 comments · 4 min read · LW link

Immortality or death by AGI

ImmortalityOrDeathByAGI · 21 Sep 2023 23:59 UTC
47 points
30 comments · 4 min read · LW link
(forum.effectivealtruism.org)

Telopheme, telophore, and telotect

TsviBT · 17 Sep 2023 16:24 UTC
46 points
7 comments · 8 min read · LW link

The goal of physics

Jim Pivarski · 2 Sep 2023 23:08 UTC
46 points
4 comments · 5 min read · LW link

Feedback-loops, Deliberate Practice, and Transfer Learning

7 Sep 2023 1:57 UTC
46 points
5 comments · 1 min read · LW link

[Question] Where might I direct promising-to-me researchers to apply for alignment jobs/grants?

abramdemski · 18 Sep 2023 16:20 UTC
45 points
10 comments · 1 min read · LW link

Jacob on the Precipice

Richard_Ngo · 26 Sep 2023 21:16 UTC
45 points
8 comments · 11 min read · LW link
(narrativeark.substack.com)

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · LW link
(twitter.com)

Commonsense Good, Creative Good

jefftk · 27 Sep 2023 19:50 UTC
44 points
11 comments · 3 min read · LW link
(www.jefftk.com)

Recreating the caring drive

Catnee · 7 Sep 2023 10:41 UTC
43 points
15 comments · 10 min read · LW link · 1 review

Sparse Coding, for Mechanistic Interpretability and Activation Engineering

David Udell · 23 Sep 2023 19:16 UTC
42 points
7 comments · 34 min read · LW link

Focus on the Hardest Part First

Johannes C. Mayer · 11 Sep 2023 7:53 UTC
42 points
13 comments · 1 min read · LW link

Deconfusing Regret

Alex Hollow · 15 Sep 2023 11:52 UTC
41 points
32 comments · 2 min read · LW link

Technical AI Safety Research Landscape [Slides]

Magdalena Wache · 18 Sep 2023 13:56 UTC
41 points
0 comments · 4 min read · LW link

What is the optimal frontier for due diligence?

8 Sep 2023 18:20 UTC
41 points
1 comment · 1 min read · LW link

[Question] Strongest real-world examples supporting AI risk claims?

rosehadshar · 5 Sep 2023 15:12 UTC
41 points
7 comments · 1 min read · LW link

ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman · 28 Sep 2023 4:30 UTC
40 points
10 comments · 2 min read · LW link · 1 review
(evals.alignment.org)

Reflexive decision theory is an unsolved problem

Richard_Kennaway · 17 Sep 2023 14:15 UTC
40 points
27 comments · 4 min read · LW link

Luck based medicine: inositol for anxiety and brain fog

Elizabeth · 22 Sep 2023 20:10 UTC
40 points
5 comments · 3 min read · LW link
(acesounderglass.com)

Debate series: should we push for a pause on the development of AI?

Xodarap · 8 Sep 2023 16:29 UTC
39 points
1 comment · LW link

Startup Roundup #1: Happy Demo Day

Zvi · 12 Sep 2023 13:20 UTC
38 points
5 comments · 15 min read · LW link
(thezvi.wordpress.com)

I designed an AI safety course (for a philosophy department)

Eleni Angelou · 23 Sep 2023 22:03 UTC
37 points
15 comments · 2 min read · LW link

A Theory of Laughter—Follow-Up

Steven Byrnes · 14 Sep 2023 15:35 UTC
37 points
3 comments · 8 min read · LW link

Actually, “personal attacks after object-level arguments” is a pretty good rule of epistemic conduct

Max H · 17 Sep 2023 20:25 UTC
37 points
15 comments · 7 min read · LW link

Alignment Workshop talks

Richard_Ngo · 28 Sep 2023 18:26 UTC
37 points
1 comment · 1 min read · LW link
(www.alignment-workshop.com)