How much do you believe your results?

Eric Neyman · 6 May 2023 20:31 UTC
515 points
18 comments · 15 min read · LW link · 4 reviews
(ericneyman.wordpress.com)

Steering GPT-2-XL by adding an activation vector

13 May 2023 18:42 UTC
439 points
98 comments · 50 min read · LW link · 1 review

Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures

Dan H · 30 May 2023 9:05 UTC
382 points
78 comments · 1 min read · LW link · 1 review
(www.safe.ai)

How to have Polygenically Screened Children

GeneSmith · 7 May 2023 16:01 UTC
368 points
128 comments · 27 min read · LW link · 1 review

Book Review: How Minds Change

bc4026bd4aaa5b7fe · 25 May 2023 17:55 UTC
327 points
53 comments · 15 min read · LW link

Predictable updating about AI risk

Joe Carlsmith · 8 May 2023 21:53 UTC
295 points
25 comments · 36 min read · LW link · 1 review

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

Andrew_Critch · 24 May 2023 0:02 UTC
268 points
39 comments · 8 min read · LW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

10 May 2023 19:04 UTC
266 points
54 comments · 21 min read · LW link

Announcing Apollo Research

30 May 2023 16:17 UTC
217 points
11 comments · 8 min read · LW link

Twiblings, four-parent babies and other reproductive technology

GeneSmith · 20 May 2023 17:11 UTC
192 points
33 comments · 6 min read · LW link

When is Goodhart catastrophic?

9 May 2023 3:59 UTC
180 points
30 comments · 8 min read · LW link · 1 review

Decision Theory with the Magic Parts Highlighted

moridinamael · 16 May 2023 17:39 UTC
175 points
24 comments · 5 min read · LW link

Prizes for matrix completion problems

paulfchristiano · 3 May 2023 23:30 UTC
164 points
52 comments · 1 min read · LW link
(www.alignment.org)

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
155 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Request: stop advancing AI capabilities

So8res · 26 May 2023 17:42 UTC
154 points
24 comments · 1 min read · LW link

Advice for newly busy people

Severin T. Seehrich · 11 May 2023 16:46 UTC
149 points
3 comments · 5 min read · LW link

New User’s Guide to LessWrong

Ruby · 17 May 2023 0:55 UTC
149 points
59 comments · 11 min read · LW link · 1 review

Dark Forest Theories

Raemon · 12 May 2023 20:21 UTC
148 points
54 comments · 2 min read · LW link · 2 reviews

A brief collection of Hinton’s recent comments on AGI risk

Kaj_Sotala · 4 May 2023 23:31 UTC
148 points
9 comments · 11 min read · LW link

Sentience matters

So8res · 29 May 2023 21:25 UTC
144 points
96 comments · 2 min read · LW link

Clarifying and predicting AGI

Richard_Ngo · 4 May 2023 15:55 UTC
142 points
45 comments · 4 min read · LW link

LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem

Steven Byrnes · 8 May 2023 19:35 UTC
140 points
37 comments · 15 min read · LW link

Trust develops gradually via making bids and setting boundaries

Richard_Ngo · 19 May 2023 22:16 UTC
135 points
12 comments · 4 min read · LW link

From fear to excitement

Richard_Ngo · 15 May 2023 6:23 UTC
134 points
9 comments · 3 min read · LW link

AGI safety career advice

Richard_Ngo · 2 May 2023 7:36 UTC
132 points
24 comments · 13 min read · LW link

Some background for reasoning about dual-use alignment research

Charlie Steiner · 18 May 2023 14:50 UTC
126 points
22 comments · 9 min read · LW link · 1 review

Who regulates the regulators? We need to go beyond the review-and-approval paradigm

jasoncrawford · 4 May 2023 22:11 UTC
122 points
29 comments · 13 min read · LW link
(rootsofprogress.org)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

9 May 2023 19:41 UTC
119 points
1 comment · 10 min read · LW link

Investigating Fabrication

LoganStrohl · 18 May 2023 17:46 UTC
113 points
14 comments · 16 min read · LW link

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren Elverlin · 12 May 2023 18:07 UTC
105 points
9 comments · 3 min read · LW link

AI Safety in China: Part 2

Lao Mein · 22 May 2023 14:50 UTC
103 points
28 comments · 2 min read · LW link

Bayesian Networks Aren’t Necessarily Causal

Zack_M_Davis · 14 May 2023 1:42 UTC
103 points
38 comments · 8 min read · LW link · 1 review

Open Thread With Experimental Feature: Reactions

jimrandomh · 24 May 2023 16:46 UTC
101 points
189 comments · 3 min read · LW link

A Case for the Least Forgiving Take On Alignment

Thane Ruthenis · 2 May 2023 21:34 UTC
100 points
85 comments · 22 min read · LW link

Geoff Hinton Quits Google

Adam Shai · 1 May 2023 21:03 UTC
98 points
14 comments · 1 min read · LW link

Judgments often smuggle in implicit standards

Richard_Ngo · 15 May 2023 18:50 UTC
97 points
4 comments · 3 min read · LW link

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

1 May 2023 16:47 UTC
96 points
10 comments · 30 min read · LW link

Most people should probably feel safe most of the time

Kaj_Sotala · 9 May 2023 9:35 UTC
95 points
28 comments · 10 min read · LW link

What if they gave an Industrial Revolution and nobody came?

jasoncrawford · 17 May 2023 19:41 UTC
94 points
10 comments · 19 min read · LW link
(rootsofprogress.org)

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman · 25 May 2023 3:00 UTC
94 points
12 comments · 1 min read · LW link · 1 review
(arxiv.org)

Why are we so complacent about AI hell?

Dawn Drescher · 11 May 2023 9:19 UTC
94 points
101 comments · 1 min read · LW link

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre Variengien · 12 May 2023 9:41 UTC
92 points
0 comments · 33 min read · LW link

An Analogy for Understanding Transformers

CallumMcDougall · 13 May 2023 12:20 UTC
92 points
6 comments · 9 min read · LW link

Yoshua Bengio: How Rogue AIs may Arise

harfe · 23 May 2023 18:28 UTC
92 points
12 comments · 18 min read · LW link
(yoshuabengio.org)

An artificially structured argument for expecting AGI ruin

Rob Bensinger · 7 May 2023 21:52 UTC
91 points
26 comments · 19 min read · LW link

Coercion is an adaptation to scarcity; trust is an adaptation to abundance

Richard_Ngo · 23 May 2023 18:14 UTC
90 points
11 comments · 4 min read · LW link

The bullseye framework: My case against AI doom

titotal · 30 May 2023 11:52 UTC
89 points
35 comments · 17 min read · LW link

How I apply (so-called) Non-Violent Communication

Kaj_Sotala · 15 May 2023 9:56 UTC
89 points
28 comments · 3 min read · LW link

LessWrong Community Weekend 2023 [Applications now closed]

Henry Prowbell · 1 May 2023 9:08 UTC
89 points
0 comments · 6 min read · LW link

Reacts now enabled on 100% of posts, though still just experimenting

Ruby · 28 May 2023 5:36 UTC
88 points
73 comments · 2 min read · LW link