SolidGoldMagikarp (plus, prompt generation)

5 Feb 2023 22:02 UTC
663 points
204 comments · 12 min read · LW link

Focus on the places where you feel shocked everyone’s dropping the ball

So8res · 2 Feb 2023 0:27 UTC
413 points
61 comments · 4 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
396 points
168 comments · 2 min read · LW link

Noting an error in Inadequate Equilibria

Matthew Barnett · 8 Feb 2023 1:33 UTC
359 points
56 comments · 2 min read · LW link

Please don’t throw your mind away

TsviBT · 15 Feb 2023 21:41 UTC
336 points
44 comments · 18 min read · LW link

Cyborgism

10 Feb 2023 14:47 UTC
333 points
45 comments · 35 min read · LW link

Childhoods of exceptional people

Henrik Karlsson · 6 Feb 2023 17:27 UTC
324 points
62 comments · 15 min read · LW link
(escapingflatland.substack.com)

Fucking Goddamn Basics of Rationalist Discourse

LoganStrohl · 4 Feb 2023 1:47 UTC
301 points
97 comments · 1 min read · LW link

I hired 5 people to sit behind me and make me productive for a month

Simon Berens · 5 Feb 2023 1:19 UTC
243 points
81 comments · 10 min read · LW link
(www.simonberens.com)

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

You Don’t Exist, Duncan

[DEACTIVATED] Duncan Sabien · 2 Feb 2023 8:37 UTC
227 points
107 comments · 9 min read · LW link

Elements of Rationalist Discourse

Rob Bensinger · 12 Feb 2023 7:58 UTC
215 points
47 comments · 3 min read · LW link

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
197 points
45 comments · 4 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · 21 Feb 2023 0:48 UTC
188 points
40 comments · 3 min read · LW link

EigenKarma: trust at scale

Henrik Karlsson · 8 Feb 2023 18:52 UTC
182 points
50 comments · 5 min read · LW link

Why Are Bacteria So Simple?

aysja · 6 Feb 2023 3:00 UTC
171 points
33 comments · 10 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
170 points
44 comments · 61 min read · LW link
(thezvi.wordpress.com)

Parametrically retargetable decision-makers tend to seek power

TurnTrout · 18 Feb 2023 18:41 UTC
166 points
9 comments · 2 min read · LW link
(arxiv.org)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
165 points
31 comments · 4 min read · LW link

[Link] A community alert about Ziz

DanielFilan · 24 Feb 2023 0:06 UTC
163 points
124 comments · 2 min read · LW link
(medium.com)

Big Mac Subsidy?

jefftk · 23 Feb 2023 4:00 UTC
154 points
24 comments · 2 min read · LW link
(www.jefftk.com)

We Found An Neuron in GPT-2

11 Feb 2023 18:27 UTC
141 points
22 comments · 7 min read · LW link
(clementneo.com)

Stop posting prompt injections on Twitter and calling it “misalignment”

lc · 19 Feb 2023 2:21 UTC
138 points
9 comments · 1 min read · LW link

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments · 75 min read · LW link

Anomalous tokens reveal the original identities of Instruct models

9 Feb 2023 1:30 UTC
136 points
16 comments · 9 min read · LW link
(generative.ink)

Modal Fixpoint Cooperation without Löb’s Theorem

Andrew_Critch · 5 Feb 2023 0:58 UTC
133 points
32 comments · 3 min read · LW link

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
133 points
18 comments · 11 min read · LW link

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis · 26 Feb 2023 5:58 UTC
131 points
152 comments · 9 min read · LW link

Evaluations (of new AI Safety researchers) can be noisy

LawrenceC · 5 Feb 2023 4:15 UTC
130 points
10 comments · 16 min read · LW link

Hashing out long-standing disagreements seems low-value to me

So8res · 16 Feb 2023 6:20 UTC
130 points
34 comments · 4 min read · LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver · 17 Feb 2023 20:11 UTC
124 points
11 comments · 2 min read · LW link

There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs

Taran · 19 Feb 2023 12:25 UTC
123 points
33 comments · 4 min read · LW link

In Defense of Chatbot Romance

Kaj_Sotala · 11 Feb 2023 14:30 UTC
123 points
52 comments · 11 min read · LW link
(kajsotala.fi)

A proposed method for forecasting transformative AI

Matthew Barnett · 10 Feb 2023 19:34 UTC
121 points
20 comments · 10 min read · LW link

There are no coherence theorems

20 Feb 2023 21:25 UTC
121 points
114 comments · 19 min read · LW link

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · 17 Feb 2023 17:26 UTC
119 points
10 comments · 7 min read · LW link

GPT-175bee

8 Feb 2023 18:58 UTC
119 points
13 comments · 1 min read · LW link

On Investigating Conspiracy Theories

Zvi · 20 Feb 2023 12:50 UTC
115 points
38 comments · 5 min read · LW link
(thezvi.wordpress.com)

The public supports regulating AI for safety

Zach Stein-Perlman · 17 Feb 2023 4:10 UTC
114 points
9 comments · 1 min read · LW link
(aiimpacts.org)

The Open Agency Model

Eric Drexler · 22 Feb 2023 10:35 UTC
113 points
18 comments · 4 min read · LW link

Bing chat is the AI fire alarm

Ratios · 17 Feb 2023 6:51 UTC
112 points
62 comments · 3 min read · LW link

GPT-4 Predictions

Stephen McAleese · 17 Feb 2023 23:20 UTC
109 points
27 comments · 11 min read · LW link

SolidGoldMagikarp II: technical details and more recent findings

6 Feb 2023 19:09 UTC
109 points
45 comments · 13 min read · LW link

A Way To Be Okay

[DEACTIVATED] Duncan Sabien · 19 Feb 2023 20:27 UTC
107 points
36 comments · 10 min read · LW link

Conflict Theory of Bounded Distrust

Zack_M_Davis · 12 Feb 2023 5:30 UTC
106 points
29 comments · 3 min read · LW link

I don’t think MIRI “gave up”

Raemon · 3 Feb 2023 0:26 UTC
105 points
64 comments · 4 min read · LW link

Sam Altman: “Planning for AGI and beyond”

LawrenceC · 24 Feb 2023 20:28 UTC
104 points
54 comments · 6 min read · LW link
(openai.com)

Cyborg Periods: There will be multiple AI transitions

22 Feb 2023 16:09 UTC
103 points
9 comments · 6 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
26 comments · 4 min read · LW link

H5N1

Zvi · 13 Feb 2023 12:50 UTC
100 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)