«Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch · Jul 26, 2022, 11:03 PM
158 points
33 comments · 7 min read · LW link

Why all the fuss about recursive self-improvement?

So8res · Jun 12, 2022, 8:53 PM
158 points
62 comments · 7 min read · LW link · 1 review

Limits to Legibility

Jan_Kulveit · Jun 29, 2022, 5:42 PM
157 points
11 comments · 5 min read · LW link · 1 review

Your posts should be on arXiv

JanB · Aug 25, 2022, 10:35 AM
156 points
44 comments · 3 min read · LW link

Nonprofit Boards are Weird

HoldenKarnofsky · Jun 23, 2022, 2:40 PM
156 points
26 comments · 20 min read · LW link · 1 review
(www.cold-takes.com)

What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?

johnswentworth · Aug 15, 2022, 10:48 PM
156 points
18 comments · 10 min read · LW link

Nate Soares’ Life Advice

CatGoddess · Aug 23, 2022, 2:46 AM
155 points
41 comments · 3 min read · LW link

LessWrong Has Agree/Disagree Voting On All New Comment Threads

Ben Pace · Jun 24, 2022, 12:43 AM
154 points
217 comments · 2 min read · LW link · 1 review

Emotionally Confronting a Probably-Doomed World: Against Motivation Via Dignity Points

TurnTrout · Apr 10, 2022, 6:45 PM
154 points
7 comments · 9 min read · LW link

Staying Split: Sabatini and Social Justice

Duncan Sabien (Deactivated) · Jun 8, 2022, 8:32 AM
153 points
28 comments · 21 min read · LW link

Learning By Writing

HoldenKarnofsky · Feb 22, 2022, 3:50 PM
151 points
25 comments · 10 min read · LW link · 3 reviews
(www.cold-takes.com)

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
150 points
23 comments · 22 min read · LW link · 2 reviews

Prizes for ELK proposals

paulfchristiano · Jan 3, 2022, 8:23 PM
150 points
152 comments · 7 min read · LW link

DeepMind is hiring for the Scalable Alignment and Alignment Teams

May 13, 2022, 12:17 PM
150 points
34 comments · 9 min read · LW link

Alignment research exercises

Richard_Ngo · Feb 21, 2022, 8:24 PM
150 points
17 comments · 8 min read · LW link

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC · Dec 19, 2022, 10:52 PM
150 points
30 comments · 18 min read · LW link

The metaphor you want is “color blindness,” not “blind spot.”

Duncan Sabien (Deactivated) · Feb 14, 2022, 12:28 AM
150 points
17 comments · 3 min read · LW link · 2 reviews

Steam

abramdemski · Jun 20, 2022, 5:38 PM
149 points
13 comments · 5 min read · LW link · 1 review

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout · Dec 2, 2022, 2:43 AM
149 points
22 comments · 47 min read · LW link · 3 reviews

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

Yitz · Sep 23, 2022, 5:08 AM
149 points
42 comments · 6 min read · LW link

Use Normal Predictions

Jan Christian Refsgaard · Jan 9, 2022, 3:01 PM
148 points
67 comments · 6 min read · LW link

A Year of AI Increasing AI Progress

TW123 · Dec 30, 2022, 2:09 AM
148 points
3 comments · 2 min read · LW link

AI coordination needs clear wins

evhub · Sep 1, 2022, 11:41 PM
147 points
16 comments · 2 min read · LW link · 1 review

Reshaping the AI Industry

Thane Ruthenis · May 29, 2022, 10:54 PM
147 points
35 comments · 21 min read · LW link

K-complexity is silly; use cross-entropy instead

So8res · Dec 20, 2022, 11:06 PM
147 points
54 comments · 14 min read · LW link · 2 reviews

[Question] why assume AGIs will optimize for fixed goals?

nostalgebraist · Jun 10, 2022, 1:28 AM
147 points
60 comments · 4 min read · LW link · 2 reviews

We’re already in AI takeoff

Valentine · Mar 8, 2022, 11:09 PM
146 points
119 comments · 7 min read · LW link

Updating my AI timelines

Matthew Barnett · Dec 5, 2022, 8:46 PM
145 points
50 comments · 2 min read · LW link

Supervise Process, not Outcomes

Apr 5, 2022, 10:18 PM
145 points
9 comments · 10 min read · LW link

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth · Aug 8, 2022, 6:05 PM
144 points
13 comments · 3 min read · LW link

Public beliefs vs. Private beliefs

Eli Tyre · Jun 1, 2022, 9:33 PM
144 points
30 comments · 5 min read · LW link

Interpreting Neural Networks through the Polytope Lens

Sep 23, 2022, 5:58 PM
144 points
29 comments · 33 min read · LW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi · Apr 15, 2022, 8:57 AM
144 points
13 comments · 4 min read · LW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz · Sep 17, 2022, 3:55 AM
143 points
12 comments · 6 min read · LW link · 1 review

Twitter thread on postrationalists

Eli Tyre · Feb 17, 2022, 9:02 AM
143 points
32 comments · 5 min read · LW link

High-stakes alignment via adversarial training [Redwood Research report]

May 5, 2022, 12:59 AM
142 points
29 comments · 9 min read · LW link

Age changes what you care about

Dentin · Oct 16, 2022, 3:36 PM
141 points
37 comments · 2 min read · LW link

[Question] How to Convince my Son that Drugs are Bad

concerned_dad · Dec 17, 2022, 6:47 PM
140 points
84 comments · 2 min read · LW link

The Parable of the Boy Who Cried 5% Chance of Wolf

KatWoods · Aug 15, 2022, 2:33 PM
140 points
24 comments · 2 min read · LW link

How might we align transformative AI if it’s developed very soon?

HoldenKarnofsky · Aug 29, 2022, 3:42 PM
140 points
55 comments · 45 min read · LW link · 1 review

Understanding Infra-Bayesianism: A Beginner-Friendly Video Series

Sep 22, 2022, 1:25 PM
140 points
6 comments · 2 min read · LW link

More Is Different for AI

jsteinhardt · Jan 4, 2022, 7:30 PM
140 points
24 comments · 3 min read · LW link · 1 review
(bounded-regret.ghost.io)

Resolve Cycles

CFAR!Duncan · Jul 16, 2022, 11:17 PM
140 points
8 comments · 10 min read · LW link

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

Andrew_Critch · Apr 19, 2022, 8:25 PM
139 points
55 comments · 7 min read · LW link · 1 review

Takeoff speeds have a huge effect on what it means to work on AI x-risk

Buck · Apr 13, 2022, 5:38 PM
139 points
27 comments · 2 min read · LW link · 2 reviews

A descriptive, not prescriptive, overview of current AI Alignment Research

Jun 6, 2022, 9:59 PM
139 points
21 comments · 7 min read · LW link

ELK prize results

Mar 9, 2022, 12:01 AM
138 points
50 comments · 21 min read · LW link

Mechanistic anomaly detection and ELK

paulfchristiano · Nov 25, 2022, 6:50 PM
138 points
22 comments · 21 min read · LW link
(ai-alignment.com)

AI Timelines via Cumulative Optimization Power: Less Long, More Short

jacob_cannell · Oct 6, 2022, 12:21 AM
138 points
33 comments · 6 min read · LW link

Contra EY: Can AGI destroy us without trial & error?

nsokolsky · Jun 13, 2022, 6:26 PM
137 points
72 comments · 15 min read · LW link