Simulators

janus · 2 Sep 2022 12:45 UTC
594 points
161 comments · 41 min read · LW link · 8 reviews
(generative.ink)

The Redaction Machine

Ben · 20 Sep 2022 22:03 UTC
494 points
46 comments · 27 min read · LW link · 1 review

Losing the root for the tree

Adam Zerner · 20 Sep 2022 4:53 UTC
465 points
30 comments · 9 min read · LW link · 1 review

You Are Not Measuring What You Think You Are Measuring

johnswentworth · 20 Sep 2022 20:04 UTC
368 points
44 comments · 8 min read · LW link · 2 reviews

Why I think strong general AI is coming soon

porby · 28 Sep 2022 5:40 UTC
325 points
139 comments · 34 min read · LW link · 1 review

The shard theory of human values

4 Sep 2022 4:28 UTC
235 points
66 comments · 24 min read · LW link · 2 reviews

Announcing Balsa Research

Zvi · 25 Sep 2022 22:50 UTC
235 points
64 comments · 2 min read · LW link · 1 review
(thezvi.wordpress.com)

How I buy things when Lightcone wants them fast

jacobjacob · 26 Sep 2022 5:02 UTC
218 points
21 comments · 8 min read · LW link

How my team at Lightcone sometimes gets stuff done

jacobjacob · 19 Sep 2022 5:47 UTC
191 points
43 comments · 7 min read · LW link · 1 review

7 traps that (we think) new alignment researchers often fall into

27 Sep 2022 23:13 UTC
174 points
10 comments · 4 min read · LW link

Do bamboos set themselves on fire?

Malmesbury · 19 Sep 2022 15:34 UTC
170 points
14 comments · 6 min read · LW link · 1 review

Most People Start With The Same Few Bad Ideas

johnswentworth · 9 Sep 2022 0:29 UTC
162 points
30 comments · 3 min read · LW link

The Onion Test for Personal and Institutional Honesty

27 Sep 2022 15:26 UTC
154 points
31 comments · 3 min read · LW link · 3 reviews

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

Yitz · 23 Sep 2022 5:08 UTC
149 points
42 comments · 6 min read · LW link

AI coordination needs clear wins

evhub · 1 Sep 2022 23:41 UTC
146 points
16 comments · 2 min read · LW link · 1 review

Takeaways from our robust injury classifier project [Redwood Research]

dmz · 17 Sep 2022 3:55 UTC
143 points
12 comments · 6 min read · LW link · 1 review

Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Diffractor · 28 Sep 2022 1:20 UTC
141 points
19 comments · 53 min read · LW link · 2 reviews

Understanding Infra-Bayesianism: A Beginner-Friendly Video Series

22 Sep 2022 13:25 UTC
140 points
6 comments · 2 min read · LW link

Interpreting Neural Networks through the Polytope Lens

23 Sep 2022 17:58 UTC
136 points
29 comments · 33 min read · LW link

Monitoring for deceptive alignment

evhub · 8 Sep 2022 23:07 UTC
135 points
8 comments · 9 min read · LW link

Orexin and the quest for more waking hours

ChristianKl · 24 Sep 2022 19:54 UTC
129 points
39 comments · 5 min read · LW link

An Update on Academia vs. Industry (one year into my faculty job)

David Scott Krueger (formerly: capybaralet) · 3 Sep 2022 20:43 UTC
121 points
18 comments · 4 min read · LW link

LW Petrov Day 2022 (Monday, 9/26)

Ruby · 22 Sep 2022 2:56 UTC
121 points
111 comments · 5 min read · LW link

Gene drives: why the wait?

Metacelsus · 19 Sep 2022 23:37 UTC
120 points
50 comments · 3 min read · LW link
(denovo.substack.com)

Quintin’s alignment papers roundup - week 1

Quintin Pope · 10 Sep 2022 6:39 UTC
120 points
6 comments · 9 min read · LW link

Announcing $5,000 bounty for (responsibly) ending malaria

lc · 24 Sep 2022 4:28 UTC
116 points
40 comments · 4 min read · LW link

Rejected Early Drafts of Newcomb’s Problem

zahmahkibo · 6 Sep 2022 19:04 UTC
112 points
5 comments · 3 min read · LW link

Petrov Day Retrospective: 2022

Ruby · 28 Sep 2022 22:16 UTC
107 points
41 comments · 4 min read · LW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash · 15 Sep 2022 18:37 UTC
106 points
23 comments · 15 min read · LW link

My emotional reaction to the current funding situation

Sam F. Brown · 9 Sep 2022 22:02 UTC
105 points
36 comments · 5 min read · LW link
(sambrown.eu)

Ukraine Post #12

Zvi · 22 Sep 2022 14:40 UTC
104 points
3 comments · 16 min read · LW link
(thezvi.wordpress.com)

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes · 9 Sep 2022 22:46 UTC
99 points
7 comments · 10 min read · LW link

Funding is All You Need: Getting into Grad School by Hacking the NSF GRFP Fellowship

hapanin · 22 Sep 2022 21:39 UTC
98 points
9 comments · 12 min read · LW link

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper · 12 Sep 2022 19:07 UTC
97 points
7 comments · 2 min read · LW link
(arxiv.org)

Inverse Scaling Prize: Round 1 Winners

26 Sep 2022 19:57 UTC
93 points
16 comments · 4 min read · LW link
(irmckenzie.co.uk)

The ethics of reclining airplane seats

braces · 4 Sep 2022 17:59 UTC
92 points
70 comments · 1 min read · LW link

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · 8 Sep 2022 4:41 UTC
88 points
4 comments · 1 min read · LW link
(github.blog)

Why we’re not founding a human-data-for-alignment org

27 Sep 2022 20:14 UTC
88 points
5 comments · 29 min read · LW link
(forum.effectivealtruism.org)

Let’s Terraform West Texas

blackstampede · 4 Sep 2022 16:24 UTC
87 points
33 comments · 5 min read · LW link

Nearcast-based “deployment problem” analysis

HoldenKarnofsky · 21 Sep 2022 18:52 UTC
85 points
2 comments · 26 min read · LW link

Towards deconfusing wireheading and reward maximization

leogao · 21 Sep 2022 0:36 UTC
81 points
7 comments · 4 min read · LW link

Dath Ilan’s Views on Stopgap Corrigibility

David Udell · 22 Sep 2022 16:16 UTC
77 points
19 comments · 13 min read · LW link
(www.glowfic.com)

AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022

Sam Bowman · 1 Sep 2022 19:15 UTC
76 points
2 comments · 7 min read · LW link

Bugs or Features?

qbolec · 3 Sep 2022 7:04 UTC
72 points
9 comments · 2 min read · LW link

Builder/Breaker for Deconfusion

abramdemski · 29 Sep 2022 17:36 UTC
72 points
9 comments · 9 min read · LW link

Solar Blackout Resistance

jefftk · 8 Sep 2022 13:30 UTC
69 points
32 comments · 3 min read · LW link
(www.jefftk.com)

Ambiguity in Prediction Market Resolution is Harmful

aphyer · 26 Sep 2022 16:22 UTC
69 points
17 comments · 5 min read · LW link

Path dependence in ML inductive biases

10 Sep 2022 1:38 UTC
68 points
13 comments · 10 min read · LW link

Stop Discouraging Microwave Formula Preparation

jefftk · 2 Sep 2022 2:10 UTC
68 points
12 comments · 2 min read · LW link
(www.jefftk.com)

Alignment Org Cheat Sheet

20 Sep 2022 17:36 UTC
68 points
8 comments · 4 min read · LW link