SolidGoldMagikarp (plus, prompt generation)

5 Feb 2023 22:02 UTC
663 points
204 comments · 12 min read · LW link

Focus on the places where you feel shocked everyone’s dropping the ball

So8res · 2 Feb 2023 0:27 UTC
413 points
61 comments · 4 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
396 points
168 comments · 2 min read · LW link

Noting an error in Inadequate Equilibria

Matthew Barnett · 8 Feb 2023 1:33 UTC
359 points
56 comments · 2 min read · LW link

Please don’t throw your mind away

TsviBT · 15 Feb 2023 21:41 UTC
336 points
44 comments · 18 min read · LW link

Cyborgism

10 Feb 2023 14:47 UTC
333 points
45 comments · 35 min read · LW link

Childhoods of exceptional people

Henrik Karlsson · 6 Feb 2023 17:27 UTC
324 points
62 comments · 15 min read · LW link
(escapingflatland.substack.com)

Fucking Goddamn Basics of Rationalist Discourse

LoganStrohl · 4 Feb 2023 1:47 UTC
301 points
97 comments · 1 min read · LW link

I hired 5 people to sit behind me and make me productive for a month

Simon Berens · 5 Feb 2023 1:19 UTC
243 points
81 comments · 10 min read · LW link
(www.simonberens.com)

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

You Don’t Exist, Duncan

[DEACTIVATED] Duncan Sabien · 2 Feb 2023 8:37 UTC
227 points
107 comments · 9 min read · LW link

Elements of Rationalist Discourse

Rob Bensinger · 12 Feb 2023 7:58 UTC
215 points
47 comments · 3 min read · LW link

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
197 points
45 comments · 4 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · 21 Feb 2023 0:48 UTC
188 points
40 comments · 3 min read · LW link

EigenKarma: trust at scale

Henrik Karlsson · 8 Feb 2023 18:52 UTC
182 points
50 comments · 5 min read · LW link

Why Are Bacteria So Simple?

aysja · 6 Feb 2023 3:00 UTC
171 points
33 comments · 10 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
170 points
44 comments · 61 min read · LW link
(thezvi.wordpress.com)

Parametrically retargetable decision-makers tend to seek power

TurnTrout · 18 Feb 2023 18:41 UTC
166 points
9 comments · 2 min read · LW link
(arxiv.org)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
165 points
31 comments · 4 min read · LW link

[Link] A community alert about Ziz

DanielFilan · 24 Feb 2023 0:06 UTC
163 points
124 comments · 2 min read · LW link
(medium.com)

Big Mac Subsidy?

jefftk · 23 Feb 2023 4:00 UTC
154 points
24 comments · 2 min read · LW link
(www.jefftk.com)

We Found An Neuron in GPT-2

11 Feb 2023 18:27 UTC
141 points
22 comments · 7 min read · LW link
(clementneo.com)

Stop posting prompt injections on Twitter and calling it “misalignment”

lc · 19 Feb 2023 2:21 UTC
138 points
9 comments · 1 min read · LW link

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments · 75 min read · LW link

Anomalous tokens reveal the original identities of Instruct models

9 Feb 2023 1:30 UTC
136 points
16 comments · 9 min read · LW link
(generative.ink)

Modal Fixpoint Cooperation without Löb’s Theorem

Andrew_Critch · 5 Feb 2023 0:58 UTC
133 points
32 comments · 3 min read · LW link

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
133 points
18 comments · 11 min read · LW link

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis · 26 Feb 2023 5:58 UTC
131 points
152 comments · 9 min read · LW link

Evaluations (of new AI Safety researchers) can be noisy

LawrenceC · 5 Feb 2023 4:15 UTC
130 points
10 comments · 16 min read · LW link

Hashing out long-standing disagreements seems low-value to me

So8res · 16 Feb 2023 6:20 UTC
130 points
34 comments · 4 min read · LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver · 17 Feb 2023 20:11 UTC
124 points
11 comments · 2 min read · LW link

There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs

Taran · 19 Feb 2023 12:25 UTC
123 points
33 comments · 4 min read · LW link

In Defense of Chatbot Romance

Kaj_Sotala · 11 Feb 2023 14:30 UTC
123 points
52 comments · 11 min read · LW link
(kajsotala.fi)

A proposed method for forecasting transformative AI

Matthew Barnett · 10 Feb 2023 19:34 UTC
121 points
20 comments · 10 min read · LW link

There are no coherence theorems

20 Feb 2023 21:25 UTC
121 points
114 comments · 19 min read · LW link

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · 17 Feb 2023 17:26 UTC
119 points
10 comments · 7 min read · LW link

GPT-175bee

8 Feb 2023 18:58 UTC
119 points
13 comments · 1 min read · LW link

On Investigating Conspiracy Theories

Zvi · 20 Feb 2023 12:50 UTC
115 points
38 comments · 5 min read · LW link
(thezvi.wordpress.com)

The public supports regulating AI for safety

Zach Stein-Perlman · 17 Feb 2023 4:10 UTC
114 points
9 comments · 1 min read · LW link
(aiimpacts.org)

The Open Agency Model

Eric Drexler · 22 Feb 2023 10:35 UTC
113 points
18 comments · 4 min read · LW link

Bing chat is the AI fire alarm

Ratios · 17 Feb 2023 6:51 UTC
112 points
62 comments · 3 min read · LW link

GPT-4 Predictions

Stephen McAleese · 17 Feb 2023 23:20 UTC
109 points
27 comments · 11 min read · LW link

SolidGoldMagikarp II: technical details and more recent findings

6 Feb 2023 19:09 UTC
109 points
45 comments · 13 min read · LW link

A Way To Be Okay

[DEACTIVATED] Duncan Sabien · 19 Feb 2023 20:27 UTC
107 points
36 comments · 10 min read · LW link

Conflict Theory of Bounded Distrust

Zack_M_Davis · 12 Feb 2023 5:30 UTC
106 points
29 comments · 3 min read · LW link

I don’t think MIRI “gave up”

Raemon · 3 Feb 2023 0:26 UTC
105 points
64 comments · 4 min read · LW link

Sam Altman: “Planning for AGI and beyond”

LawrenceC · 24 Feb 2023 20:28 UTC
104 points
54 comments · 6 min read · LW link
(openai.com)

Cyborg Periods: There will be multiple AI transitions

22 Feb 2023 16:09 UTC
103 points
9 comments · 6 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
26 comments · 4 min read · LW link

H5N1

Zvi · 13 Feb 2023 12:50 UTC
100 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)