Mysteries of mode collapse

janus 8 Nov 2022 10:37 UTC
281 points
56 comments 14 min read LW link 1 review

I Converted Book I of The Sequences Into A Zoomer-Readable Format

dkirmani 10 Nov 2022 2:59 UTC
204 points
31 comments 2 min read LW link

What it’s like to dissect a cadaver

Alok Singh 10 Nov 2022 6:40 UTC
204 points
23 comments 5 min read LW link
(alok.github.io)

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
195 points
33 comments 31 min read LW link

Tyranny of the Epistemic Majority

Scott Garrabrant 22 Nov 2022 17:19 UTC
187 points
13 comments 9 min read LW link 1 review

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
185 points
9 comments 8 min read LW link

Planes are still decades away from displacing most bird jobs

guzey 25 Nov 2022 16:49 UTC
159 points
13 comments 3 min read LW link

Geometric Rationality is Not VNM Rational

Scott Garrabrant 27 Nov 2022 19:36 UTC
149 points
26 comments 3 min read LW link

The Geometric Expectation

Scott Garrabrant 23 Nov 2022 18:05 UTC
145 points
19 comments 4 min read LW link

The Alignment Community Is Culturally Broken

sudo 13 Nov 2022 18:53 UTC
136 points
68 comments 2 min read LW link

Sadly, FTX

Zvi 17 Nov 2022 14:30 UTC
133 points
18 comments 47 min read LW link
(thezvi.wordpress.com)

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

22 Nov 2022 18:57 UTC
133 points
98 comments 24 min read LW link

Mechanistic anomaly detection and ELK

paulfchristiano 25 Nov 2022 18:50 UTC
133 points
21 comments 21 min read LW link
(ai-alignment.com)

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments 4 min read LW link 1 review

On the Diplomacy AI

Zvi 28 Nov 2022 13:20 UTC
127 points
29 comments 11 min read LW link
(thezvi.wordpress.com)

Geometric Exploration, Arithmetic Exploitation

Scott Garrabrant 24 Nov 2022 15:36 UTC
118 points
4 comments 7 min read LW link

Utilitarianism Meets Egalitarianism

Scott Garrabrant 21 Nov 2022 19:00 UTC
116 points
16 comments 6 min read LW link 1 review

Speculation on Current Opportunities for Unusually High Impact in Global Health

johnswentworth 11 Nov 2022 20:47 UTC
114 points
31 comments 4 min read LW link

How could we know that an AGI system will have good consequences?

So8res 7 Nov 2022 22:42 UTC
109 points
25 comments 5 min read LW link

What I Learned Running Refine

adamShimi 24 Nov 2022 14:49 UTC
107 points
5 comments 4 min read LW link

Applying superintelligence without collusion

Eric Drexler 8 Nov 2022 18:08 UTC
107 points
63 comments 4 min read LW link

Caution when interpreting Deepmind’s In-context RL paper

Sam Marks 1 Nov 2022 2:42 UTC
104 points
6 comments 4 min read LW link

LW Beta Feature: Side-Comments

jimrandomh 24 Nov 2022 1:55 UTC
103 points
47 comments 1 min read LW link

LessWrong readers are invited to apply to the Lurkshop

22 Nov 2022 9:19 UTC
101 points
41 comments 3 min read LW link

Instead of technical research, more people should focus on buying time

5 Nov 2022 20:43 UTC
100 points
45 comments 14 min read LW link

Instrumental convergence is what makes general intelligence possible

tailcalled 11 Nov 2022 16:38 UTC
97 points
11 comments 4 min read LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner 20 Nov 2022 1:22 UTC
97 points
2 comments 2 min read LW link
(arxiv.org)

Trying to Make a Treacherous Mesa-Optimizer

MadHatter 9 Nov 2022 18:07 UTC
95 points
14 comments 4 min read LW link
(attentionspan.blog)

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese Anthis 22 Nov 2022 16:50 UTC
93 points
64 comments 1 min read LW link
(www.science.org)

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments 1 min read LW link

Searching for Search

28 Nov 2022 15:31 UTC
91 points
8 comments 14 min read LW link 1 review

Current themes in mechanistic interpretability research

16 Nov 2022 14:14 UTC
89 points
2 comments 12 min read LW link

By Default, GPTs Think In Plain Sight

Fabien Roger 19 Nov 2022 19:15 UTC
85 points
33 comments 9 min read LW link

Announcing the Progress Forum

jasoncrawford 17 Nov 2022 19:26 UTC
83 points
9 comments 1 min read LW link

When AI solves a game, focus on the game’s mechanics, not its theme.

Cleo Nardo 23 Nov 2022 19:16 UTC
82 points
7 comments 2 min read LW link

Results from the interpretability hackathon

17 Nov 2022 14:51 UTC
81 points
0 comments 6 min read LW link
(alignmentjam.com)

Exams-Only Universities

Mati_Roy 6 Nov 2022 22:05 UTC
80 points
40 comments 2 min read LW link

Always know where your abstractions break

lsusr 27 Nov 2022 6:32 UTC
78 points
6 comments 2 min read LW link

Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos 14 Nov 2022 16:42 UTC
75 points
12 comments 2 min read LW link
(epochai.org)

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn 16 Nov 2022 14:40 UTC
75 points
17 comments 7 min read LW link 1 review

Engineering Monosemanticity in Toy Models

18 Nov 2022 1:43 UTC
75 points
7 comments 3 min read LW link
(arxiv.org)

Follow up to medical miracle

Elizabeth 4 Nov 2022 18:00 UTC
75 points
5 comments 6 min read LW link
(acesounderglass.com)

Threat Model Literature Review

1 Nov 2022 11:03 UTC
75 points
4 comments 25 min read LW link

What is epigenetics?

Metacelsus 6 Nov 2022 1:24 UTC
74 points
4 comments 6 min read LW link
(denovo.substack.com)

Elastic Productivity Tools

Simon Berens 19 Nov 2022 21:59 UTC
74 points
8 comments 2 min read LW link
(simonberens.me)

Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility

22 Nov 2022 22:19 UTC
73 points
20 comments 4 min read LW link

Takeaways from a survey on AI alignment resources

DanielFilan 5 Nov 2022 23:40 UTC
73 points
10 comments 6 min read LW link 1 review
(danielfilan.com)

Respecting your Local Preferences

Scott Garrabrant 26 Nov 2022 19:04 UTC
73 points
1 comment 4 min read LW link

Distinguishing test from training

So8res 29 Nov 2022 21:41 UTC
72 points
11 comments 6 min read LW link

Update to Mysteries of mode collapse: text-davinci-002 not RLHF

janus 19 Nov 2022 23:51 UTC
71 points
8 comments 2 min read LW link