My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda · 25 Dec 2021 23:07 UTC
52 points
3 comments · 28 min read · LW link

[Question] What is a probabilistic physical theory?

Ege Erdil · 25 Dec 2021 16:30 UTC
15 points
36 comments · 2 min read · LW link

Belief-conditional things—things that only exist when you believe in them

Jan · 25 Dec 2021 10:49 UTC
7 points
3 comments · 5 min read · LW link
(universalprior.substack.com)

Tough Choices and Disappointment

maralorn · 24 Dec 2021 21:59 UTC
2 points
6 comments · 1 min read · LW link

Converging toward a Million Worlds

Joe Kwon · 24 Dec 2021 21:33 UTC
11 points
1 comment · 3 min read · LW link

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum · 24 Dec 2021 18:05 UTC
16 points
2 comments · 3 min read · LW link

[Question] How to select a long-term goal and align my mind towards it?

Alexander · 24 Dec 2021 11:40 UTC
19 points
8 comments · 2 min read · LW link

Prerequisite Skills

lsusr · 24 Dec 2021 10:11 UTC
17 points
3 comments · 1 min read · LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter · 24 Dec 2021 7:24 UTC
11 points
2 comments · 1 min read · LW link
(www.youtube.com)

Risks from AI persuasion

Beth Barnes · 24 Dec 2021 1:48 UTC
75 points
15 comments · 31 min read · LW link

Prioritizing Information

jsteinhardt · 24 Dec 2021 0:00 UTC
18 points
0 comments · 7 min read · LW link
(bounded-regret.ghost.io)

Omicron Post #9

Zvi · 23 Dec 2021 21:50 UTC
89 points
11 comments · 19 min read · LW link
(thezvi.wordpress.com)

Reply to Eliezer on Biological Anchors

HoldenKarnofsky · 23 Dec 2021 16:15 UTC
150 points
46 comments · 15 min read · LW link

Get Set, Also Go

Zvi · 23 Dec 2021 15:00 UTC
62 points
21 comments · 16 min read · LW link
(thezvi.wordpress.com)

2021 AI Alignment Literature Review and Charity Comparison

Larks · 23 Dec 2021 14:06 UTC
165 points
28 comments · 73 min read · LW link

Testing, Testing, Hopefully

Zvi · 23 Dec 2021 12:30 UTC
41 points
8 comments · 4 min read · LW link
(thezvi.wordpress.com)

Physics Erotica

lsusr · 23 Dec 2021 11:01 UTC
7 points
12 comments · 1 min read · LW link

[Book Review] “The Most Powerful Idea in the World” by William Rosen

lsusr · 23 Dec 2021 8:27 UTC
41 points
4 comments · 8 min read · LW link

Specialization

DirectedEvolution · 23 Dec 2021 3:23 UTC
13 points
1 comment · 5 min read · LW link

Worst-case thinking in AI alignment

Buck · 23 Dec 2021 1:29 UTC
162 points
18 comments · 6 min read · LW link · 2 reviews

[Question] Hedging the Possibility of Russia invading Ukraine

Annapurna · 23 Dec 2021 1:13 UTC
27 points
8 comments · 1 min read · LW link

Gifts

George3d6 · 22 Dec 2021 23:50 UTC
13 points
1 comment · 9 min read · LW link
(www.epistem.ink)

A spreadsheet/template for doing an annual review

peterslattery · 22 Dec 2021 23:29 UTC
12 points
1 comment · 2 min read · LW link

[Question] What time in your life were you the most productive at learning and/or thinking and why?

Jack R · 22 Dec 2021 22:56 UTC
11 points
2 comments · 1 min read · LW link

Transformer Circuits

evhub · 22 Dec 2021 21:09 UTC
144 points
4 comments · 3 min read · LW link
(transformer-circuits.pub)

[Question] Help figuring out my sexuality?

Centhart · 22 Dec 2021 20:28 UTC
13 points
13 comments · 2 min read · LW link

DnD.Sci GURPS Evaluation and Ruleset

J Bostock · 22 Dec 2021 19:05 UTC
17 points
2 comments · 6 min read · LW link

Potential gears level explanations of smooth progress

ryan_greenblatt · 22 Dec 2021 18:05 UTC
4 points
2 comments · 2 min read · LW link

Random facts can come back to bite you

tailcalled · 22 Dec 2021 17:33 UTC
66 points
7 comments · 2 min read · LW link · 1 review

What’s Up With the CDC Nowcast?

Zvi · 22 Dec 2021 13:00 UTC
61 points
4 comments · 5 min read · LW link
(thezvi.wordpress.com)

Morality and constrained maximization, part 1

Joe Carlsmith · 22 Dec 2021 8:47 UTC
20 points
5 comments · 11 min read · LW link

Six Specializations Makes You World-Class

lsusr · 22 Dec 2021 8:03 UTC
53 points
23 comments · 1 min read · LW link

Worldbuilding exercise: The Highwayverse.

Yair Halberstadt · 22 Dec 2021 6:47 UTC
13 points
13 comments · 11 min read · LW link

Two (very different) kinds of donors

[DEACTIVATED] Duncan Sabien · 22 Dec 2021 1:43 UTC
101 points
19 comments · 3 min read · LW link

[Question] Confusion about Sequences and Review Sequences

Alex_Altair · 21 Dec 2021 18:13 UTC
14 points
3 comments · 1 min read · LW link

Working through D&D.Sci, problem 1 (solution)

Pablo Repetto · 21 Dec 2021 17:42 UTC
9 points
2 comments · 1 min read · LW link
(pabloernesto.github.io)

Demanding and Designing Aligned Cognitive Architectures

Koen.Holtman · 21 Dec 2021 17:32 UTC
8 points
5 comments · 5 min read · LW link

Experiences raising children in shared housing

juliawise · 21 Dec 2021 17:09 UTC
116 points
4 comments · 6 min read · LW link

[Question] What questions do you have about doing work on AI safety?

peterbarnett · 21 Dec 2021 16:36 UTC
13 points
8 comments · 1 min read · LW link

Perpetual Dickensian Poverty?

jefftk · 21 Dec 2021 13:30 UTC
119 points
18 comments · 1 min read · LW link
(www.jefftk.com)

On (Not) Reading Papers

Jan · 21 Dec 2021 9:57 UTC
52 points
10 comments · 7 min read · LW link
(universalprior.substack.com)

Quick Poll: Booster Reactions

Elizabeth · 21 Dec 2021 7:40 UTC
40 points
2 comments · 2 min read · LW link
(acesounderglass.com)

Book Launch: The Engines of Cognition

Ben Pace · 21 Dec 2021 7:24 UTC
174 points
55 comments · 5 min read · LW link

Researcher incentives cause smoother progress on benchmarks

ryan_greenblatt · 21 Dec 2021 4:13 UTC
20 points
4 comments · 1 min read · LW link

[Question] Manipulation resistance of futarchy

harper owen · 21 Dec 2021 0:33 UTC
4 points
4 comments · 1 min read · LW link

Omicron Post #8

Zvi · 20 Dec 2021 23:10 UTC
96 points
33 comments · 16 min read · LW link
(thezvi.wordpress.com)

[Question] Good complete views on motivation

Valdes · 20 Dec 2021 22:10 UTC
6 points
4 comments · 1 min read · LW link

Prizes for last year’s 2019 Review

Raemon · 20 Dec 2021 21:58 UTC
40 points
0 comments · 3 min read · LW link

Omicron Paths

jefftk · 20 Dec 2021 18:30 UTC
14 points
8 comments · 2 min read · LW link
(www.jefftk.com)

[Question] Is there a term / better way of phrasing the general case where an intervention helps certain individuals do better at zero-sum games but doesn’t provide any external value?

freedomandutility · 20 Dec 2021 17:35 UTC
4 points
8 comments · 1 min read · LW link