SolidGoldMagikarp (plus, prompt generation)

5 Feb 2023 22:02 UTC
646 points
194 comments · 12 min read · LW link

The Waluigi Effect (mega-post)

Cleo Nardo · 3 Mar 2023 3:22 UTC
568 points
164 comments · 16 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
390 points
163 comments · 2 min read · LW link

Focus on the places where you feel shocked everyone’s dropping the ball

So8res · 2 Feb 2023 0:27 UTC
364 points
55 comments · 4 min read · LW link

How it feels to have your mind hacked by an AI

blaked · 12 Jan 2023 0:33 UTC
337 points
214 comments · 17 min read · LW link

Noting an error in Inadequate Equilibria

Matthew Barnett · 8 Feb 2023 1:33 UTC
317 points
51 comments · 2 min read · LW link

Shutting Down the Lightcone Offices

14 Mar 2023 22:47 UTC
315 points
85 comments · 17 min read · LW link

Please don’t throw your mind away

TsviBT · 15 Feb 2023 21:41 UTC
307 points
41 comments · 18 min read · LW link

Childhoods of exceptional people

Henrik Karlsson · 6 Feb 2023 17:27 UTC
296 points
56 comments · 15 min read · LW link
(escapingflatland.substack.com)

Cyborgism

10 Feb 2023 14:47 UTC
294 points
41 comments · 35 min read · LW link

My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”

Quintin Pope · 21 Mar 2023 0:06 UTC
287 points
167 comments · 33 min read · LW link

Understanding and controlling a maze-solving policy network

11 Mar 2023 18:59 UTC
284 points
13 comments · 22 min read · LW link

Fucking Goddamn Basics of Rationalist Discourse

LoganStrohl · 4 Feb 2023 1:47 UTC
263 points
95 comments · 1 min read · LW link

On not getting contaminated by the wrong obesity ideas

Natália Coelho Mendonça · 28 Jan 2023 20:18 UTC
259 points
50 comments · 30 min read · LW link

We don’t trade with ants

KatjaGrace · 10 Jan 2023 23:50 UTC
253 points
108 comments · 7 min read · LW link
(worldspiritsockpuppet.com)

The Parable of the King and the Random Process

moridinamael · 1 Mar 2023 22:18 UTC
243 points
20 comments · 6 min read · LW link

I hired 5 people to sit behind me and make me productive for a month

Simon Berens · 5 Feb 2023 1:19 UTC
239 points
81 comments · 10 min read · LW link
(www.simonberens.com)

Basics of Rationalist Discourse

Duncan_Sabien · 27 Jan 2023 2:40 UTC
228 points
178 comments · 36 min read · LW link

Thoughts on the impact of RLHF research

paulfchristiano · 25 Jan 2023 17:23 UTC
227 points
101 comments · 9 min read · LW link

My Model Of EA Burnout

LoganStrohl · 25 Jan 2023 17:52 UTC
223 points
48 comments · 5 min read · LW link

You Don’t Exist, Duncan

Duncan_Sabien · 2 Feb 2023 8:37 UTC
211 points
95 comments · 9 min read · LW link

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
209 points
131 comments · 6 min read · LW link
(andreamiotti.substack.com)

“Carefully Bootstrapped Alignment” is organizationally hard

Raemon · 17 Mar 2023 18:00 UTC
208 points
9 comments · 11 min read · LW link

Recursive Middle Manager Hell

Raemon · 1 Jan 2023 4:33 UTC
207 points
39 comments · 11 min read · LW link

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky · 13 Mar 2023 21:20 UTC
203 points
25 comments · 22 min read · LW link

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes · 19 Mar 2023 0:25 UTC
202 points
29 comments · 8 min read · LW link
(evals.alignment.org)

Enemies vs Malefactors

So8res · 28 Feb 2023 23:38 UTC
194 points
58 comments · 1 min read · LW link

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw · 12 Mar 2023 23:09 UTC
192 points
13 comments · 1 min read · LW link

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
181 points
31 comments · 4 min read · LW link

Deep Deceptiveness

So8res · 21 Mar 2023 2:51 UTC
181 points
27 comments · 14 min read · LW link

Elements of Rationalist Discourse

Rob Bensinger · 12 Feb 2023 7:58 UTC
179 points
35 comments · 3 min read · LW link

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
178 points
38 comments · 2 min read · LW link
(www.anthropic.com)

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs · 15 Mar 2023 17:55 UTC
174 points
36 comments · 1 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · 21 Feb 2023 0:48 UTC
173 points
35 comments · 3 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
169 points
44 comments · 61 min read · LW link
(thezvi.wordpress.com)

Alexander and Yudkowsky on AGI goals

24 Jan 2023 21:09 UTC
166 points
52 comments · 26 min read · LW link

EigenKarma: trust at scale

Henrik Karlsson · 8 Feb 2023 18:52 UTC
161 points
45 comments · 5 min read · LW link

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
161 points
28 comments · 4 min read · LW link

Natural Abstractions: Key claims, Theorems, and Critiques

16 Mar 2023 16:37 UTC
161 points
13 comments · 45 min read · LW link

[Link] A community alert about Ziz

DanielFilan · 24 Feb 2023 0:06 UTC
158 points
120 comments · 2 min read · LW link
(medium.com)

$20 Million in NSF Grants for Safety Research

Dan H · 28 Feb 2023 4:44 UTC
153 points
12 comments · 1 min read · LW link

Big Mac Subsidy?

jefftk · 23 Feb 2023 4:00 UTC
151 points
24 comments · 2 min read · LW link
(www.jefftk.com)

GPT-4

nz · 14 Mar 2023 17:02 UTC
151 points
140 comments · 1 min read · LW link
(openai.com)

What a compute-centric framework says about AI takeoff speeds—draft report

Tom Davidson · 23 Jan 2023 4:02 UTC
149 points
24 comments · 16 min read · LW link

Why Are Bacteria So Simple?

aysja · 6 Feb 2023 3:00 UTC
146 points
28 comments · 10 min read · LW link

Gradient hacking is extremely difficult

beren · 24 Jan 2023 15:45 UTC
145 points
18 comments · 5 min read · LW link

Sapir-Whorf for Rationalists

Duncan_Sabien · 25 Jan 2023 7:58 UTC
145 points
47 comments · 19 min read · LW link

Comments on OpenAI’s “Planning for AGI and beyond”

So8res · 3 Mar 2023 23:01 UTC
145 points
2 comments · 14 min read · LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout · 18 Feb 2023 18:41 UTC
143 points
6 comments · 2 min read · LW link
(arxiv.org)

Acausal normalcy

Andrew_Critch · 3 Mar 2023 23:34 UTC
143 points
28 comments · 8 min read · LW link