AI

Core Tag · Last edit: 11 Mar 2021 12:14 UTC by plex

Artificial Intelligence is the study of creating intelligence in algorithms. On LessWrong, the primary focus of AI discussion is to ensure that as humanity builds increasingly powerful AI systems, the outcome will be good. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything in the broad cluster of work on understanding AI and its future impact on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get the AI to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you’re trying to have the AI do that thing rather than, say, make a lot of paperclips.

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Decision Theory
Embedded Agency
Fixed Point Theorems
Goodhart’s Law
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Logical Induction
Mesa-Optimization
Myopia
Newcomb’s Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Solomonoff Induction
Utility Functions

Engineering Alignment

AI Boxing (Containment)
Debate (AI safety technique)
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Tool AI
Transparency / Interpretability
Value Learning

Strategy

AI Governance
AI Risk
AI Services (CAIS)
AI Takeoff
AI Timelines

Organizations

Centre for Human-Compatible AI
DeepMind
Future of Humanity Institute
Machine Intelligence Research Institute
OpenAI
Ought

Other
GPT
Research Agendas

There’s No Fire Alarm for Artificial General Intelligence

Eliezer Yudkowsky · 13 Oct 2017 21:38 UTC
97 points
67 comments · 25 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · 29 May 2020 20:38 UTC
147 points
30 comments · 38 min read · LW link

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
140 points
40 comments · 12 min read · LW link · 3 nominations · 3 reviews

Embedded Agents

29 Oct 2018 19:53 UTC
185 points
41 comments · 1 min read · LW link

Superintelligence FAQ

Scott Alexander · 20 Sep 2016 19:00 UTC
45 points
7 comments · 27 min read · LW link

What failure looks like

paulfchristiano · 17 Mar 2019 20:18 UTC
240 points
48 comments · 8 min read · LW link · 2 nominations · 2 reviews

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky · 19 May 2018 18:18 UTC
97 points
53 comments · 23 min read · LW link

The Rocket Alignment Problem

Eliezer Yudkowsky · 4 Oct 2018 0:38 UTC
165 points
41 comments · 15 min read · LW link

Embedded Agency (full-text version)

15 Nov 2018 19:49 UTC
114 points
11 comments · 54 min read · LW link

A space of proposals for building safe advanced AI

Richard_Ngo · 10 Jul 2020 16:58 UTC
44 points
3 comments · 4 min read · LW link

Goodhart Taxonomy

Scott Garrabrant · 30 Dec 2017 16:38 UTC
154 points
33 comments · 10 min read · LW link

AI Alignment 2018-19 Review

rohinmshah · 28 Jan 2020 2:19 UTC
115 points
6 comments · 35 min read · LW link

Some AI research areas and their relevance to existential safety

Andrew_Critch · 19 Nov 2020 3:18 UTC
167 points
37 comments · 50 min read · LW link

That Alien Message

Eliezer Yudkowsky · 22 May 2008 5:55 UTC
217 points
171 comments · 10 min read · LW link

Epistemological Framing for AI Alignment Research

adamShimi · 8 Mar 2021 22:05 UTC
50 points
6 comments · 9 min read · LW link

Robustness to Scale

Scott Garrabrant · 21 Feb 2018 22:55 UTC
99 points
21 comments · 2 min read · LW link

Chris Olah’s views on AGI safety

evhub · 1 Nov 2019 20:13 UTC
140 points
38 comments · 12 min read · LW link · 2 nominations · 2 reviews

[AN #96]: Buck and I discuss/argue about AI Alignment

rohinmshah · 22 Apr 2020 17:20 UTC
17 points
4 comments · 10 min read · LW link
(mailchi.mp)

Matt Botvinick on the spontaneous emergence of learning algorithms

Adam Scholl · 12 Aug 2020 7:47 UTC
138 points
90 comments · 5 min read · LW link

Coherence arguments do not imply goal-directed behavior

rohinmshah · 3 Dec 2018 3:26 UTC
75 points
65 comments · 7 min read · LW link

Alignment By Default

johnswentworth · 12 Aug 2020 18:54 UTC
97 points
84 comments · 11 min read · LW link

Book review: “A Thousand Brains” by Jeff Hawkins

Steven Byrnes · 4 Mar 2021 5:10 UTC
99 points
14 comments · 19 min read · LW link

AlphaGo Zero and the Foom Debate

Eliezer Yudkowsky · 21 Oct 2017 2:18 UTC
78 points
16 comments · 3 min read · LW link

Tradeoff between desirable properties for baseline choices in impact measures

Vika · 4 Jul 2020 11:56 UTC
37 points
24 comments · 5 min read · LW link

Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns

stuhlmueller · 21 Jul 2020 20:06 UTC
80 points
40 comments · 3 min read · LW link

the scaling “inconsistency”: openAI’s new insight

nostalgebraist · 7 Nov 2020 7:40 UTC
126 points
11 comments · 9 min read · LW link
(nostalgebraist.tumblr.com)

2019 Review Rewrite: Seeking Power is Often Robustly Instrumental in MDPs

TurnTrout · 23 Dec 2020 17:16 UTC
35 points
0 comments · 4 min read · LW link
(www.lesswrong.com)

Bootstrapped Alignment

G Gordon Worley III · 27 Feb 2021 15:46 UTC
14 points
12 comments · 2 min read · LW link

Multimodal Neurons in Artificial Neural Networks

Kaj_Sotala · 5 Mar 2021 9:01 UTC
56 points
2 comments · 2 min read · LW link
(distill.pub)

Review of “Fun with +12 OOMs of Compute”

28 Mar 2021 14:55 UTC
52 points
18 comments · 8 min read · LW link

Dis­con­tin­u­ous progress in his­tory: an update

KatjaGrace14 Apr 2020 0:00 UTC
163 points
23 comments31 min readLW link
(aiimpacts.org)

Repli­ca­tion Dy­nam­ics Bridge to RL in Ther­mo­dy­namic Limit

Zachary Robertson18 May 2020 1:02 UTC
6 points
1 comment2 min readLW link

The ground of optimization

alexflint20 Jun 2020 0:38 UTC
158 points
64 comments27 min readLW link

Model­ling Con­tin­u­ous Progress

SDM23 Jun 2020 18:06 UTC
29 points
3 comments7 min readLW link

Clas­sifi­ca­tion of AI al­ign­ment re­search: de­con­fu­sion, “good enough” non-su­per­in­tel­li­gent AI al­ign­ment, su­per­in­tel­li­gent AI alignment

philip_b14 Jul 2020 22:48 UTC
35 points
25 comments3 min readLW link

Col­lec­tion of GPT-3 results

Kaj_Sotala18 Jul 2020 20:04 UTC
83 points
24 comments1 min readLW link
(twitter.com)

Hiring en­g­ineers and re­searchers to help al­ign GPT-3

paulfchristiano1 Oct 2020 18:54 UTC
205 points
13 comments3 min readLW link

The date of AI Takeover is not the day the AI takes over

Daniel Kokotajlo22 Oct 2020 10:41 UTC
96 points
23 comments2 min readLW link

[Question] What could one do with truly un­limited com­pu­ta­tional power?

Yitz11 Nov 2020 10:03 UTC
29 points
22 comments2 min readLW link

AGI Predictions

21 Nov 2020 3:46 UTC
103 points
35 comments4 min readLW link

[Question] What are the best prece­dents for in­dus­tries failing to in­vest in valuable AI re­search?

Daniel Kokotajlo14 Dec 2020 23:57 UTC
18 points
17 comments1 min readLW link

Ex­trap­o­lat­ing GPT-N performance

Lanrian18 Dec 2020 21:41 UTC
75 points
29 comments25 min readLW link

De­bate up­date: Obfus­cated ar­gu­ments problem

Beth Barnes23 Dec 2020 3:24 UTC
105 points
20 comments16 min readLW link

Liter­a­ture Re­view on Goal-Directedness

18 Jan 2021 11:15 UTC
58 points
21 comments31 min readLW link

An Un­trol­lable Math­e­mat­i­cian Illustrated

abramdemski20 Mar 2018 0:00 UTC
150 points
38 comments1 min readLW link

Con­di­tions for Mesa-Optimization

1 Jun 2019 20:52 UTC
62 points
47 comments12 min readLW link

Thoughts on Hu­man Models

21 Feb 2019 9:10 UTC
111 points
31 comments10 min readLW link2 nominations1 review

In­ner al­ign­ment in the brain

Steven Byrnes22 Apr 2020 13:14 UTC
74 points
16 comments15 min readLW link

Prob­lem re­lax­ation as a tactic

TurnTrout22 Apr 2020 23:44 UTC
97 points
8 comments7 min readLW link

[Question] How should po­ten­tial AI al­ign­ment re­searchers gauge whether the field is right for them?

TurnTrout6 May 2020 12:24 UTC
20 points
5 comments1 min readLW link

Speci­fi­ca­tion gam­ing: the flip side of AI ingenuity

6 May 2020 23:51 UTC
45 points
8 comments6 min readLW link

Les­sons from Isaac: Pit­falls of Reason

adamShimi8 May 2020 20:44 UTC
9 points
0 comments8 min readLW link

Cor­rigi­bil­ity as out­side view

TurnTrout8 May 2020 21:56 UTC
36 points
11 comments4 min readLW link

[Question] How to choose a PhD with AI Safety in mind

Ariel Kwiatkowski15 May 2020 22:19 UTC
9 points
1 comment1 min readLW link

Re­ward func­tions and up­dat­ing as­sump­tions can hide a mul­ti­tude of sins

Stuart_Armstrong18 May 2020 15:18 UTC
16 points
2 comments9 min readLW link

Pos­si­ble take­aways from the coro­n­avirus pan­demic for slow AI takeoff

Vika31 May 2020 17:51 UTC
128 points
35 comments3 min readLW link

Fo­cus: you are al­lowed to be bad at ac­com­plish­ing your goals

adamShimi3 Jun 2020 21:04 UTC
19 points
17 comments3 min readLW link

Re­ply to Paul Chris­ti­ano on Inac­cessible Information

alexflint5 Jun 2020 9:10 UTC
76 points
15 comments6 min readLW link

Our take on CHAI’s re­search agenda in un­der 1500 words

alexflint17 Jun 2020 12:24 UTC
95 points
19 comments5 min readLW link

[Question] Ques­tion on GPT-3 Ex­cel Demo

Zhitao Hou22 Jun 2020 20:31 UTC
0 points
2 comments1 min readLW link

Dy­namic in­con­sis­tency of the in­ac­tion and ini­tial state baseline

Stuart_Armstrong7 Jul 2020 12:02 UTC
30 points
8 comments2 min readLW link

Cortés, Pizarro, and Afonso as Prece­dents for Takeover

Daniel Kokotajlo1 Mar 2020 3:49 UTC
114 points
70 comments11 min readLW link

[Question] What prob­lem would you like to see Re­in­force­ment Learn­ing ap­plied to?

Julian Schrittwieser8 Jul 2020 2:40 UTC
45 points
4 comments1 min readLW link

Refram­ing Su­per­in­tel­li­gence: Com­pre­hen­sive AI Ser­vices as Gen­eral Intelligence

rohinmshah8 Jan 2019 7:12 UTC
93 points
74 comments5 min readLW link2 nominations2 reviews
(www.fhi.ox.ac.uk)

My cur­rent frame­work for think­ing about AGI timelines

zhukeepa30 Mar 2020 1:23 UTC
101 points
5 comments3 min readLW link

[Question] To what ex­tent is GPT-3 ca­pa­ble of rea­son­ing?

TurnTrout20 Jul 2020 17:10 UTC
70 points
74 comments16 min readLW link

Repli­cat­ing the repli­ca­tion crisis with GPT-3?

skybrian22 Jul 2020 21:20 UTC
29 points
10 comments1 min readLW link

Can you get AGI from a Trans­former?

Steven Byrnes23 Jul 2020 15:27 UTC
86 points
28 comments11 min readLW link

Writ­ing with GPT-3

Jacob Falkovich24 Jul 2020 15:22 UTC
41 points
0 comments4 min readLW link

In­ner Align­ment: Ex­plain like I’m 12 Edition

Rafael Harth1 Aug 2020 15:24 UTC
122 points
13 comments12 min readLW link

Devel­op­men­tal Stages of GPTs

orthonormal26 Jul 2020 22:03 UTC
127 points
73 comments7 min readLW link

Gen­er­al­iz­ing the Power-Seek­ing Theorems

TurnTrout27 Jul 2020 0:28 UTC
40 points
6 comments4 min readLW link

Are we in an AI over­hang?

Andy Jones27 Jul 2020 12:48 UTC
229 points
92 comments4 min readLW link

[Question] What spe­cific dan­gers arise when ask­ing GPT-N to write an Align­ment Fo­rum post?

Matthew Barnett28 Jul 2020 2:56 UTC
43 points
14 comments1 min readLW link

[Question] Prob­a­bil­ity that other ar­chi­tec­tures will scale as well as Trans­form­ers?

Daniel Kokotajlo28 Jul 2020 19:36 UTC
22 points
4 comments1 min readLW link

What a 20-year-lead in mil­i­tary tech might look like

Daniel Kokotajlo29 Jul 2020 20:10 UTC
62 points
44 comments16 min readLW link

[Question] What if memes are com­mon in highly ca­pa­ble minds?

Daniel Kokotajlo30 Jul 2020 20:45 UTC
32 points
8 comments2 min readLW link

Three men­tal images from think­ing about AGI de­bate & corrigibility

Steven Byrnes3 Aug 2020 14:29 UTC
50 points
35 comments4 min readLW link

Solv­ing Key Align­ment Prob­lems Group

elriggs3 Aug 2020 19:30 UTC
19 points
7 comments2 min readLW link

How eas­ily can we sep­a­rate a friendly AI in de­sign space from one which would bring about a hy­per­ex­is­ten­tial catas­tro­phe?

Anirandis10 Sep 2020 0:40 UTC
18 points
20 comments2 min readLW link

My com­pu­ta­tional frame­work for the brain

Steven Byrnes14 Sep 2020 14:19 UTC
124 points
25 comments12 min readLW link

[Question] Where is hu­man level on text pre­dic­tion? (GPTs task)

Daniel Kokotajlo20 Sep 2020 9:00 UTC
24 points
18 comments1 min readLW link

Needed: AI in­fo­haz­ard policy

Vanessa Kosoy21 Sep 2020 15:26 UTC
49 points
17 comments2 min readLW link

The Col­lid­ing Ex­po­nen­tials of AI

VermillionStuka14 Oct 2020 23:31 UTC
27 points
14 comments5 min readLW link

“Lit­tle glimpses of em­pa­thy” as the foun­da­tion for so­cial emotions

Steven Byrnes22 Oct 2020 11:02 UTC
25 points
0 comments5 min readLW link

In­tro­duc­tion to Carte­sian Frames

Scott Garrabrant22 Oct 2020 13:00 UTC
139 points
26 comments22 min readLW link

“Carte­sian Frames” Talk #2 this Sun­day at 2pm (PT)

Rob Bensinger28 Oct 2020 13:59 UTC
30 points
0 comments1 min readLW link

Does SGD Pro­duce De­cep­tive Align­ment?

Mark Xu6 Nov 2020 23:48 UTC
54 points
2 comments16 min readLW link

[Question] How can I bet on short timelines?

Daniel Kokotajlo7 Nov 2020 12:44 UTC
41 points
16 comments2 min readLW link

Non-Ob­struc­tion: A Sim­ple Con­cept Mo­ti­vat­ing Corrigibility

TurnTrout21 Nov 2020 19:35 UTC
63 points
19 comments19 min readLW link

Carte­sian Frames Definitions

Rob Bensinger8 Nov 2020 12:44 UTC
24 points
0 comments4 min readLW link

Com­mu­ni­ca­tion Prior as Align­ment Strategy

johnswentworth12 Nov 2020 22:06 UTC
36 points
7 comments6 min readLW link

How Rood­man’s GWP model trans­lates to TAI timelines

Daniel Kokotajlo16 Nov 2020 14:05 UTC
20 points
5 comments3 min readLW link

Normativity

abramdemski18 Nov 2020 16:52 UTC
46 points
11 comments9 min readLW link

In­ner Align­ment in Salt-Starved Rats

Steven Byrnes19 Nov 2020 2:40 UTC
111 points
31 comments11 min readLW link

Con­tin­u­ing the take­offs debate

Richard_Ngo23 Nov 2020 15:58 UTC
65 points
13 comments9 min readLW link

The next AI win­ter will be due to en­ergy costs

hippke24 Nov 2020 16:53 UTC
45 points
6 comments2 min readLW link

Re­cur­sive Quan­tiliz­ers II

abramdemski2 Dec 2020 15:26 UTC
25 points
15 comments13 min readLW link

Su­per­vised learn­ing in the brain, part 4: com­pres­sion /​ filtering

Steven Byrnes5 Dec 2020 17:06 UTC
12 points
0 comments5 min readLW link

Con­ser­vatism in neo­cor­tex-like AGIs

Steven Byrnes8 Dec 2020 16:37 UTC
21 points
4 comments8 min readLW link

Avoid­ing Side Effects in Com­plex Environments

12 Dec 2020 0:34 UTC
61 points
9 comments2 min readLW link
(avoiding-side-effects.github.io)

The Power of Annealing

meanderingmoose14 Dec 2020 11:02 UTC
20 points
6 comments5 min readLW link

[link] The AI Gir­lfriend Se­duc­ing China’s Lonely Men

Kaj_Sotala14 Dec 2020 20:18 UTC
32 points
11 comments1 min readLW link
(www.sixthtone.com)

Oper­a­tional­iz­ing com­pat­i­bil­ity with strat­egy-stealing

evhub24 Dec 2020 22:36 UTC
41 points
6 comments4 min readLW link

De­fus­ing AGI Danger

Mark Xu24 Dec 2020 22:58 UTC
45 points
9 comments9 min readLW link

Multi-di­men­sional re­wards for AGI in­ter­pretabil­ity and control

Steven Byrnes4 Jan 2021 3:08 UTC
10 points
5 comments10 min readLW link

DALL-E by OpenAI

Daniel Kokotajlo5 Jan 2021 20:05 UTC
96 points
22 comments1 min readLW link

Re­view of ‘But ex­actly how com­plex and frag­ile?’

TurnTrout6 Jan 2021 18:39 UTC
49 points
0 comments8 min readLW link

The Case for a Jour­nal of AI Alignment

adamShimi9 Jan 2021 18:13 UTC
42 points
29 comments4 min readLW link

Trans­parency and AGI safety

jylin0411 Jan 2021 18:51 UTC
50 points
12 comments30 min readLW link

Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain

Daniel Kokotajlo18 Jan 2021 12:08 UTC
166 points
74 comments14 min readLW link

In­fra-Bayesi­anism Unwrapped

adamShimi20 Jan 2021 13:35 UTC
19 points
0 comments24 min readLW link

Op­ti­mal play in hu­man-judged De­bate usu­ally won’t an­swer your question

Joe_Collman27 Jan 2021 7:34 UTC
32 points
8 comments12 min readLW link

Creat­ing AGI Safety Interlocks

Koen.Holtman5 Feb 2021 12:01 UTC
7 points
4 comments8 min readLW link

Timeline of AI safety

riceissa7 Feb 2021 22:29 UTC
58 points
6 comments2 min readLW link
(timelines.issarice.com)

Tour­ne­sol, YouTube and AI Risk

adamShimi12 Feb 2021 18:56 UTC
35 points
13 comments4 min readLW link

In­ter­net En­cy­clo­pe­dia of Philos­o­phy on Ethics of Ar­tifi­cial Intelligence

Kaj_Sotala20 Feb 2021 13:54 UTC
15 points
1 comment4 min readLW link
(iep.utm.edu)

Be­hav­ioral Suffi­cient Statis­tics for Goal-Directedness

adamShimi11 Mar 2021 15:01 UTC
21 points
12 comments9 min readLW link

A sim­ple way to make GPT-3 fol­low instructions

Quintin Pope8 Mar 2021 2:57 UTC
6 points
5 comments4 min readLW link

Towards a Mechanis­tic Un­der­stand­ing of Goal-Directedness

Mark Xu9 Mar 2021 20:17 UTC
39 points
1 comment5 min readLW link

AXRP Epi­sode 5 - In­fra-Bayesi­anism with Vanessa Kosoy

DanielFilan10 Mar 2021 4:30 UTC
26 points
11 comments35 min readLW link

Com­ments on “The Sin­gu­lar­ity is Nowhere Near”

Steven Byrnes16 Mar 2021 23:59 UTC
47 points
5 comments8 min readLW link

Is RL in­volved in sen­sory pro­cess­ing?

Steven Byrnes18 Mar 2021 13:57 UTC
14 points
4 comments5 min readLW link

Against evolu­tion as an anal­ogy for how hu­mans will cre­ate AGI

Steven Byrnes23 Mar 2021 12:29 UTC
38 points
25 comments25 min readLW link

My AGI Threat Model: Misal­igned Model-Based RL Agent

Steven Byrnes25 Mar 2021 13:45 UTC
62 points
29 comments16 min readLW link

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

KatjaGrace26 Mar 2021 16:10 UTC
66 points
13 comments14 min readLW link
(aiimpacts.org)

Trans­parency Trichotomy

Mark Xu28 Mar 2021 20:26 UTC
20 points
2 comments7 min readLW link

Hard­ware is already ready for the sin­gu­lar­ity. Al­gorithm knowl­edge is the only bar­rier.

Andrew Vlahos30 Mar 2021 22:48 UTC
14 points
3 comments3 min readLW link

Ben Go­ertzel’s “Kinds of Minds”

JoshuaFox11 Apr 2021 12:41 UTC
12 points
4 comments1 min readLW link

Up­dat­ing the Lot­tery Ticket Hypothesis

johnswentworth18 Apr 2021 21:45 UTC
40 points
5 comments2 min readLW link

[AN #94]: AI al­ign­ment as trans­la­tion be­tween hu­mans and machines

rohinmshah8 Apr 2020 17:10 UTC
11 points
0 comments7 min readLW link
(mailchi.mp)

[Question] What are the rel­a­tive speeds of AI ca­pa­bil­ities and AI safety?

NunoSempere24 Apr 2020 18:21 UTC
8 points
2 comments1 min readLW link

Seek­ing Power is Often Ro­bustly In­stru­men­tal in MDPs

5 Dec 2019 2:33 UTC
133 points
34 comments17 min readLW link2 nominations2 reviews
(arxiv.org)

“Don’t even think about hell”

emmab2 May 2020 8:06 UTC
6 points
2 comments1 min readLW link

[Question] AI Box­ing for Hard­ware-bound agents (aka the China al­ign­ment prob­lem)

Logan Zoellner8 May 2020 15:50 UTC
11 points
27 comments10 min readLW link

Could We Give an AI a Solu­tion?

Liam Goddard15 May 2020 21:38 UTC
3 points
2 comments2 min readLW link

Point­ing to a Flower

johnswentworth18 May 2020 18:54 UTC
54 points
18 comments9 min readLW link

Learn­ing and ma­nipu­lat­ing learning

Stuart_Armstrong19 May 2020 13:02 UTC
39 points
4 comments10 min readLW link

[Question] Why aren’t we test­ing gen­eral in­tel­li­gence dis­tri­bu­tion?

Bob Jacobs26 May 2020 16:07 UTC
25 points
7 comments1 min readLW link

OpenAI an­nounces GPT-3

gwern29 May 2020 1:49 UTC
67 points
23 comments1 min readLW link
(arxiv.org)

GPT-3: a dis­ap­point­ing paper

nostalgebraist29 May 2020 19:06 UTC
59 points
37 comments8 min readLW link

In­tro­duc­tion to Ex­is­ten­tial Risks from Ar­tifi­cial In­tel­li­gence, for an EA audience

JoshuaFox2 Jun 2020 8:30 UTC
10 points
1 comment1 min readLW link

Prepar­ing for “The Talk” with AI projects

Daniel Kokotajlo13 Jun 2020 23:01 UTC
62 points
16 comments3 min readLW link

[Question] What are the high-level ap­proaches to AI al­ign­ment?

G Gordon Worley III16 Jun 2020 17:10 UTC
12 points
13 comments1 min readLW link

Re­sults of $1,000 Or­a­cle con­test!

Stuart_Armstrong17 Jun 2020 17:44 UTC
55 points
2 comments1 min readLW link

[Question] Like­li­hood of hy­per­ex­is­ten­tial catas­tro­phe from a bug?

Anirandis18 Jun 2020 16:23 UTC
11 points
27 comments1 min readLW link

AI Benefits Post 1: In­tro­duc­ing “AI Benefits”

Cullen_OKeefe22 Jun 2020 16:59 UTC
11 points
3 comments3 min readLW link

Goals and short descriptions

Michele Campolo2 Jul 2020 17:41 UTC
14 points
8 comments5 min readLW link

Re­search ideas to study hu­mans with AI Safety in mind

Riccardo Volpato3 Jul 2020 16:01 UTC
21 points
2 comments5 min readLW link

AI Benefits Post 3: Direct and Indi­rect Ap­proaches to AI Benefits

Cullen_OKeefe6 Jul 2020 18:48 UTC
8 points
0 comments2 min readLW link

An­titrust-Com­pli­ant AI In­dus­try Self-Regulation

Cullen_OKeefe7 Jul 2020 20:53 UTC
9 points
3 comments1 min readLW link
(cullenokeefe.com)

Should AI Be Open?

Scott Alexander17 Dec 2015 8:25 UTC
16 points
2 comments13 min readLW link

Meta Pro­gram­ming GPT: A route to Su­per­in­tel­li­gence?

dmtea11 Jul 2020 14:51 UTC
10 points
7 comments4 min readLW link

The Dilemma of Worse Than Death Scenarios

arkaeik10 Jul 2018 9:18 UTC
6 points
17 comments4 min readLW link

[Question] What are the mostly likely ways AGI will emerge?

Craig Quiter14 Jul 2020 0:58 UTC
3 points
7 comments1 min readLW link

AI Benefits Post 4: Out­stand­ing Ques­tions on Select­ing Benefits

Cullen_OKeefe14 Jul 2020 17:26 UTC
4 points
4 comments5 min readLW link

Solv­ing Math Prob­lems by Relay

17 Jul 2020 15:32 UTC
88 points
26 comments7 min readLW link

AI Benefits Post 5: Out­stand­ing Ques­tions on Govern­ing Benefits

Cullen_OKeefe21 Jul 2020 16:46 UTC
4 points
0 comments4 min readLW link

[Question] Why is pseudo-al­ign­ment “worse” than other ways ML can fail to gen­er­al­ize?

nostalgebraist18 Jul 2020 22:54 UTC
43 points
9 comments2 min readLW link

[Question] “Do Noth­ing” util­ity func­tion, 3½ years later?

niplav20 Jul 2020 11:09 UTC
5 points
3 comments1 min readLW link

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

rohinmshah2 Jan 2020 18:20 UTC
35 points
93 comments10 min readLW link
(mailchi.mp)

Ac­cess to AI: a hu­man right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

The Rise of Com­mon­sense Reasoning

DragonGod27 Jul 2020 19:01 UTC
8 points
0 comments1 min readLW link
(www.reddit.com)

AI and Efficiency

DragonGod27 Jul 2020 20:58 UTC
9 points
1 comment1 min readLW link
(openai.com)

FHI Re­port: How Will Na­tional Se­cu­rity Con­sid­er­a­tions Affect An­titrust De­ci­sions in AI? An Ex­am­i­na­tion of His­tor­i­cal Precedents

Cullen_OKeefe28 Jul 2020 18:34 UTC
2 points
0 comments1 min readLW link
(www.fhi.ox.ac.uk)

The “best pre­dic­tor is mal­i­cious op­ti­miser” problem

Donald Hobson29 Jul 2020 11:49 UTC
14 points
10 comments2 min readLW link

Suffi­ciently Ad­vanced Lan­guage Models Can Do Re­in­force­ment Learning

Zachary Robertson2 Aug 2020 15:32 UTC
23 points
7 comments7 min readLW link

[Question] What are the most im­por­tant pa­pers/​post/​re­sources to read to un­der­stand more of GPT-3?

adamShimi2 Aug 2020 20:53 UTC
22 points
4 comments1 min readLW link

[Question] What should an Ein­stein-like figure in Ma­chine Learn­ing do?

Razied5 Aug 2020 23:52 UTC
3 points
3 comments1 min readLW link

Book re­view: Ar­chi­tects of In­tel­li­gence by Martin Ford (2018)

ofer11 Aug 2020 17:30 UTC
15 points
0 comments2 min readLW link

[Question] Will OpenAI’s work un­in­ten­tion­ally in­crease ex­is­ten­tial risks re­lated to AI?

adamShimi11 Aug 2020 18:16 UTC
48 points
54 comments1 min readLW link

Blog post: A tale of two re­search communities

alenglander12 Aug 2020 20:41 UTC
14 points
0 comments4 min readLW link

Map­ping Out Alignment

15 Aug 2020 1:02 UTC
42 points
0 comments5 min readLW link

My Un­der­stand­ing of Paul Chris­ti­ano’s Iter­ated Am­plifi­ca­tion AI Safety Re­search Agenda

Chi Nguyen15 Aug 2020 20:02 UTC
113 points
21 comments39 min readLW link

GPT-3, be­lief, and consistency

skybrian16 Aug 2020 23:12 UTC
18 points
7 comments2 min readLW link

[Question] What pre­cisely do we mean by AI al­ign­ment?

G Gordon Worley III9 Dec 2018 2:23 UTC
27 points
8 comments1 min readLW link

Thoughts on the Fea­si­bil­ity of Pro­saic AGI Align­ment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

[Question] Fore­cast­ing Thread: AI Timelines

22 Aug 2020 2:33 UTC
114 points
87 comments2 min readLW link

Learn­ing hu­man prefer­ences: black-box, white-box, and struc­tured white-box access

Stuart_Armstrong24 Aug 2020 11:42 UTC
23 points
9 comments6 min readLW link

Proofs Sec­tion 2.3 (Up­dates, De­ci­sion The­ory)

Diffractor27 Aug 2020 7:49 UTC
7 points
0 comments31 min readLW link

Proofs Sec­tion 2.2 (Iso­mor­phism to Ex­pec­ta­tions)

Diffractor27 Aug 2020 7:52 UTC
7 points
0 comments46 min readLW link

Proofs Sec­tion 2.1 (The­o­rem 1, Lem­mas)

Diffractor27 Aug 2020 7:54 UTC
7 points
0 comments36 min readLW link

Proofs Sec­tion 1.1 (Ini­tial re­sults to LF-du­al­ity)

Diffractor27 Aug 2020 7:59 UTC
6 points
0 comments20 min readLW link

Proofs Sec­tion 1.2 (Mix­tures, Up­dates, Push­for­wards)

Diffractor27 Aug 2020 7:57 UTC
7 points
0 comments14 min readLW link

Ba­sic In­framea­sure Theory

Diffractor27 Aug 2020 8:02 UTC
20 points
10 comments25 min readLW link

Belief Func­tions And De­ci­sion Theory

Diffractor27 Aug 2020 8:00 UTC
12 points
8 comments39 min readLW link

Tech­ni­cal model re­fine­ment formalism

Stuart_Armstrong27 Aug 2020 11:54 UTC
9 points
0 comments6 min readLW link

Pong from pix­els with­out read­ing “Pong from Pix­els”

naimenz29 Aug 2020 17:26 UTC
15 points
1 comment7 min readLW link

Reflec­tions on AI Timelines Fore­cast­ing Thread

Amandango1 Sep 2020 1:42 UTC
53 points
7 comments5 min readLW link

on “learn­ing to sum­ma­rize”

nostalgebraist12 Sep 2020 3:20 UTC
22 points
13 comments8 min readLW link
(nostalgebraist.tumblr.com)

[Question] The uni­ver­sal­ity of com­pu­ta­tion and mind de­sign space

alanf12 Sep 2020 14:58 UTC
1 point
7 comments1 min readLW link

Clar­ify­ing “What failure looks like” (part 1)

Sam Clarke20 Sep 2020 20:40 UTC
69 points
13 comments17 min readLW link

Hu­man Bi­ases that Ob­scure AI Progress

Phylliida Dev25 Sep 2020 0:24 UTC
42 points
2 comments4 min readLW link

[Question] Com­pe­tence vs Alignment

Ariel Kwiatkowski30 Sep 2020 21:03 UTC
6 points
4 comments1 min readLW link

[Question] GPT-3 + GAN

stick10917 Oct 2020 7:58 UTC
4 points
2 comments1 min readLW link

Book Re­view: Re­in­force­ment Learn­ing by Sut­ton and Barto

billmei20 Oct 2020 19:40 UTC
47 points
3 comments10 min readLW link

GPT-X, Paper­clip Max­i­mizer? An­a­lyz­ing AGI and Fi­nal Goals

meanderingmoose22 Oct 2020 14:33 UTC
8 points
1 comment6 min readLW link

Con­tain­ing the AI… In­side a Si­mu­lated Reality

HumaneAutomation31 Oct 2020 16:16 UTC
1 point
5 comments2 min readLW link

Why those who care about catas­trophic and ex­is­ten­tial risk should care about au­tonomous weapons

aaguirre11 Nov 2020 15:22 UTC
48 points
20 comments19 min readLW link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AI14 Nov 2020 15:51 UTC
25 points
8 comments1 min readLW link

Should we post­pone AGI un­til we reach safety?

otto.barten18 Nov 2020 15:43 UTC
23 points
36 comments3 min readLW link

Com­mit­ment and cred­i­bil­ity in mul­ti­po­lar AI scenarios

anni_leskela4 Dec 2020 18:48 UTC
25 points
3 comments18 min readLW link

[Question] AI Win­ter Is Com­ing—How to profit from it?

maximkazhenkov5 Dec 2020 20:23 UTC
10 points
7 comments1 min readLW link

An­nounc­ing the Tech­ni­cal AI Safety Podcast

Quinn7 Dec 2020 18:51 UTC
42 points
4 comments2 min readLW link
(technical-ai-safety.libsyn.com)

All GPT skills are translation

p.b.13 Dec 2020 20:06 UTC
4 points
0 comments2 min readLW link

[Question] Judg­ing AGI Output

meredev14 Dec 2020 12:43 UTC
3 points
0 comments2 min readLW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
24 points
3 comments8 min readLW link

AI Align­ment, Philo­soph­i­cal Plu­ral­ism, and the Rele­vance of Non-Western Philosophy

xuan1 Jan 2021 0:08 UTC
28 points
19 comments20 min readLW link

Are we all mis­al­igned?

Mateusz Mazurkiewicz3 Jan 2021 2:42 UTC
10 points
0 comments5 min readLW link

[Question] What do we *re­ally* ex­pect from a well-al­igned AI?

jan betley4 Jan 2021 20:57 UTC
8 points
10 comments1 min readLW link

Eight claims about multi-agent AGI safety

Richard_Ngo7 Jan 2021 13:34 UTC
69 points
18 comments4 min readLW link

Imi­ta­tive Gen­er­al­i­sa­tion (AKA ‘Learn­ing the Prior’)

Beth Barnes10 Jan 2021 0:30 UTC
74 points
12 comments12 min readLW link

Pre­dic­tion can be Outer Aligned at Optimum

Lanrian10 Jan 2021 18:48 UTC
13 points
11 comments11 min readLW link

[Question] Poll: Which vari­ables are most strate­gi­cally rele­vant?

22 Jan 2021 17:17 UTC
32 points
34 comments1 min readLW link

AISU 2021

Linda Linsefors30 Jan 2021 17:40 UTC
27 points
2 comments1 min readLW link

Deep­mind has made a gen­eral in­duc­tor (“Mak­ing sense of sen­sory in­put”)

MakoYass2 Feb 2021 2:54 UTC
46 points
10 comments1 min readLW link
(www.sciencedirect.com)

Coun­ter­fac­tual Plan­ning in AGI Systems

Koen.Holtman3 Feb 2021 13:54 UTC
5 points
0 comments5 min readLW link

[AN #136]: How well will GPT-N perform on down­stream tasks?

rohinmshah3 Feb 2021 18:10 UTC
21 points
2 comments9 min readLW link
(mailchi.mp)

For­mal Solu­tion to the In­ner Align­ment Problem

michaelcohen18 Feb 2021 14:51 UTC
46 points
122 comments2 min readLW link

TASP Ep 3 - Op­ti­mal Poli­cies Tend to Seek Power

Quinn11 Mar 2021 1:44 UTC
24 points
0 comments1 min readLW link
(technical-ai-safety.libsyn.com)

Phy­lac­tery De­ci­sion Theory

Bunthut2 Apr 2021 20:55 UTC
14 points
6 comments2 min readLW link

Pre­dic­tive Cod­ing has been Unified with Backpropagation

lsusr2 Apr 2021 21:42 UTC
142 points
42 comments2 min readLW link

[Question] What if we could use the the­ory of Mechanism De­sign from Game The­ory as a medium achieve AI Align­ment?

farari74 Apr 2021 12:56 UTC
4 points
0 comments1 min readLW link

A Sys­tem For Evolv­ing In­creas­ingly Gen­eral Ar­tifi­cial In­tel­li­gence From Cur­rent Technologies

Tsang Chung Shu8 Apr 2021 21:37 UTC
1 point
3 comments11 min readLW link

An Ortho­dox Case Against Utility Functions

abramdemski7 Apr 2020 19:18 UTC
113 points
49 comments8 min readLW link

2018 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks18 Dec 2018 4:46 UTC
190 points
26 comments62 min readLW link

Real­ism about rationality

Richard_Ngo16 Sep 2018 10:46 UTC
172 points
139 comments4 min readLW link
(thinkingcomplete.blogspot.com)

De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
177 points
54 comments15 min readLW link2 nominations2 reviews

The Parable of Pre­dict-O-Matic

abramdemski15 Oct 2019 0:49 UTC
245 points
41 comments14 min readLW link5 nominations4 reviews

“How con­ser­va­tive” should the par­tial max­imisers be?

Stuart_Armstrong13 Apr 2020 15:50 UTC
21 points
8 comments2 min readLW link

[AN #95]: A frame­work for think­ing about how to make AI go well

rohinmshah15 Apr 2020 17:10 UTC
20 points
2 comments10 min readLW link
(mailchi.mp)

AI Align­ment Pod­cast: An Overview of Tech­ni­cal AI Align­ment in 2018 and 2019 with Buck Sh­legeris and Ro­hin Shah

Palus Astra16 Apr 2020 0:50 UTC
46 points
27 comments89 min readLW link

Open ques­tion: are min­i­mal cir­cuits dae­mon-free?

paulfchristiano5 May 2018 22:40 UTC
79 points
69 comments2 min readLW link

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

Richard_Ngo21 Jan 2019 12:41 UTC
123 points
23 comments8 min readLW link

[AI Align­ment Fo­rum] Database Main­te­nance Today

habryka16 Apr 2020 19:11 UTC
8 points
0 comments1 min readLW link

In­te­grat­ing Hid­den Vari­ables Im­proves Approximation

johnswentworth16 Apr 2020 21:43 UTC
15 points
4 comments1 min readLW link

AI Ser­vices as a Re­search Paradigm

VojtaKovarik20 Apr 2020 13:00 UTC
30 points
12 comments4 min readLW link
(docs.google.com)

Databases of hu­man be­havi­our and prefer­ences?

Stuart_Armstrong21 Apr 2020 18:06 UTC
10 points
9 comments1 min readLW link

Critch on ca­reer ad­vice for ju­nior AI-x-risk-con­cerned researchers

Rob Bensinger12 May 2018 2:13 UTC
110 points
25 comments4 min readLW link

Refram­ing Impact

TurnTrout20 Sep 2019 19:03 UTC
84 points
14 comments3 min readLW link2 nominations1 review

De­scrip­tion vs simu­lated prediction

Richard Korzekwa 22 Apr 2020 16:40 UTC
26 points
0 comments5 min readLW link
(aiimpacts.org)

Deep­Mind team on speci­fi­ca­tion gaming

JoshuaFox23 Apr 2020 8:01 UTC
30 points
2 comments1 min readLW link
(deepmind.com)

[Question] Does Agent-like Be­hav­ior Im­ply Agent-like Ar­chi­tec­ture?

Scott Garrabrant23 Aug 2019 2:01 UTC
38 points
7 comments1 min readLW link

Risks from Learned Op­ti­miza­tion: Con­clu­sion and Re­lated Work

7 Jun 2019 19:53 UTC
70 points
4 comments6 min readLW link

De­cep­tive Alignment

5 Jun 2019 20:16 UTC
69 points
11 comments17 min readLW link

The In­ner Align­ment Problem

4 Jun 2019 1:20 UTC
76 points
17 comments13 min readLW link

How the MtG Color Wheel Ex­plains AI Safety

Scott Garrabrant15 Feb 2019 23:42 UTC
56 points
4 comments6 min readLW link

[Question] How does Gra­di­ent Des­cent In­ter­act with Good­hart?

Scott Garrabrant2 Feb 2019 0:14 UTC
68 points
19 comments4 min readLW link

For­mal Open Prob­lem in De­ci­sion Theory

Scott Garrabrant29 Nov 2018 3:25 UTC
34 points
11 comments4 min readLW link

The Ubiquitous Con­verse Law­vere Problem

Scott Garrabrant29 Nov 2018 3:16 UTC
21 points
0 comments2 min readLW link

Embed­ded Curiosities

8 Nov 2018 14:19 UTC
85 points
1 comment2 min readLW link

Sub­sys­tem Alignment

6 Nov 2018 16:16 UTC
99 points
12 comments1 min readLW link

Ro­bust Delegation

4 Nov 2018 16:38 UTC
108 points
10 comments1 min readLW link

Embed­ded World-Models

2 Nov 2018 16:07 UTC
85 points
16 comments1 min readLW link

De­ci­sion Theory

31 Oct 2018 18:41 UTC
105 points
38 comments1 min readLW link

(A → B) → A

Scott Garrabrant11 Sep 2018 22:38 UTC
45 points
10 comments2 min readLW link

His­tory of the Devel­op­ment of Log­i­cal Induction

Scott Garrabrant29 Aug 2018 3:15 UTC
87 points
4 comments5 min readLW link

Op­ti­miza­tion Amplifies

Scott Garrabrant27 Jun 2018 1:51 UTC
86 points
12 comments4 min readLW link

What makes coun­ter­fac­tu­als com­pa­rable?

Chris_Leong24 Apr 2020 22:47 UTC
11 points
6 comments3 min readLW link

New Paper Ex­pand­ing on the Good­hart Taxonomy

Scott Garrabrant14 Mar 2018 9:01 UTC
17 points
4 comments1 min readLW link
(arxiv.org)

Sources of in­tu­itions and data on AGI

Scott Garrabrant31 Jan 2018 23:30 UTC
80 points
26 comments3 min readLW link

Corrigibility

paulfchristiano27 Nov 2018 21:50 UTC
40 points
4 comments6 min readLW link

AI pre­dic­tion case study 5: Omo­hun­dro’s AI drives

Stuart_Armstrong15 Mar 2013 9:09 UTC
10 points
5 comments8 min readLW link

Toy model: con­ver­gent in­stru­men­tal goals

Stuart_Armstrong25 Feb 2016 14:03 UTC
15 points
2 comments4 min readLW link

AI-cre­ated pseudo-deontology

Stuart_Armstrong12 Feb 2015 21:11 UTC
10 points
35 comments1 min readLW link

Eth­i­cal Injunctions

Eliezer Yudkowsky20 Oct 2008 23:00 UTC
47 points
76 comments9 min readLW link

Mo­ti­vat­ing Ab­strac­tion-First De­ci­sion Theory

johnswentworth29 Apr 2020 17:47 UTC
39 points
16 comments5 min readLW link

[AN #97]: Are there his­tor­i­cal ex­am­ples of large, ro­bust dis­con­ti­nu­ities?

rohinmshah29 Apr 2020 17:30 UTC
15 points
0 comments10 min readLW link
(mailchi.mp)

My Up­dat­ing Thoughts on AI policy

Ben Pace1 Mar 2020 7:06 UTC
20 points
1 comment9 min readLW link

Use­ful Does Not Mean Secure

Ben Pace30 Nov 2019 2:05 UTC
44 points
12 comments11 min readLW link

[Question] What is the al­ter­na­tive to in­tent al­ign­ment called?

Richard_Ngo30 Apr 2020 2:16 UTC
10 points
6 comments1 min readLW link

Op­ti­mis­ing So­ciety to Con­strain Risk of War from an Ar­tifi­cial Su­per­in­tel­li­gence

JohnCDraper30 Apr 2020 10:47 UTC
3 points
0 comments51 min readLW link

[Question] Juke­box: how to up­date from AI imi­tat­ing hu­mans?

Michaël Trazzi30 Apr 2020 20:50 UTC
9 points
0 comments1 min readLW link

Stan­ford En­cy­clo­pe­dia of Philos­o­phy on AI ethics and superintelligence

Kaj_Sotala2 May 2020 7:35 UTC
41 points
19 comments7 min readLW link
(plato.stanford.edu)

[Question] How does iter­ated am­plifi­ca­tion ex­ceed hu­man abil­ities?

riceissa2 May 2020 23:44 UTC
19 points
9 comments2 min readLW link

How uniform is the neo­cor­tex?

zhukeepa4 May 2020 2:16 UTC
70 points
22 comments11 min readLW link

Scott Garrabrant’s prob­lem on re­cov­er­ing Brouwer as a corol­lary of Lawvere

Rupert4 May 2020 10:01 UTC
25 points
2 comments2 min readLW link

“AI and Effi­ciency”, OA (44✕ im­prove­ment in CNNs since 2012)

gwern5 May 2020 16:32 UTC
47 points
0 comments1 min readLW link
(openai.com)

Com­pet­i­tive safety via gra­dated curricula

Richard_Ngo5 May 2020 18:11 UTC
35 points
5 comments5 min readLW link

Model­ing nat­u­ral­ized de­ci­sion prob­lems in lin­ear logic

jessicata6 May 2020 0:15 UTC
14 points
2 comments6 min readLW link
(unstableontology.com)

[AN #98]: Un­der­stand­ing neu­ral net train­ing by see­ing which gra­di­ents were helpful

rohinmshah6 May 2020 17:10 UTC
22 points
3 comments9 min readLW link
(mailchi.mp)

[Question] Is AI safety re­search less par­alleliz­able than AI re­search?

Mati_Roy10 May 2020 20:43 UTC
9 points
5 comments1 min readLW link

Thoughts on im­ple­ment­ing cor­rigible ro­bust alignment

Steven Byrnes26 Nov 2019 14:06 UTC
26 points
2 comments6 min readLW link

Wire­head­ing is in the eye of the beholder

Stuart_Armstrong30 Jan 2019 18:23 UTC
26 points
10 comments1 min readLW link

Wire­head­ing as a po­ten­tial prob­lem with the new im­pact measure

Stuart_Armstrong25 Sep 2018 14:15 UTC
25 points
20 comments4 min readLW link

Wire­head­ing and discontinuity

Michele Campolo18 Feb 2020 10:49 UTC
21 points
4 comments3 min readLW link

[AN #99]: Dou­bling times for the effi­ciency of AI algorithms

rohinmshah13 May 2020 17:20 UTC
29 points
0 comments10 min readLW link
(mailchi.mp)

How should AIs up­date a prior over hu­man prefer­ences?

Stuart_Armstrong15 May 2020 13:14 UTC
17 points
9 comments2 min readLW link

Con­jec­ture Workshop

johnswentworth15 May 2020 22:41 UTC
34 points
2 comments2 min readLW link

Multi-agent safety

Richard_Ngo16 May 2020 1:59 UTC
24 points
8 comments5 min readLW link

The Mechanis­tic and Nor­ma­tive Struc­ture of Agency

G Gordon Worley III18 May 2020 16:03 UTC
14 points
4 comments1 min readLW link
(philpapers.org)

“Star­wink” by Alicorn

Zack_M_Davis18 May 2020 8:17 UTC
41 points
1 comment1 min readLW link
(alicorn.elcenia.com)

[AN #100]: What might go wrong if you learn a re­ward func­tion while acting

rohinmshah20 May 2020 17:30 UTC
33 points
2 comments12 min readLW link
(mailchi.mp)

Prob­a­bil­ities, weights, sums: pretty much the same for re­ward functions

Stuart_Armstrong20 May 2020 15:19 UTC
11 points
1 comment2 min readLW link

[Question] Source code size vs learned model size in ML and in hu­mans?

riceissa20 May 2020 8:47 UTC
11 points
6 comments1 min readLW link

Com­par­ing re­ward learn­ing/​re­ward tam­per­ing formalisms

Stuart_Armstrong21 May 2020 12:03 UTC
9 points
3 comments3 min readLW link

AGIs as collectives

Richard_Ngo22 May 2020 20:36 UTC
21 points
23 comments4 min readLW link

[AN #101]: Why we should rigor­ously mea­sure and fore­cast AI progress

rohinmshah27 May 2020 17:20 UTC
15 points
0 comments10 min readLW link
(mailchi.mp)

AI Safety Dis­cus­sion Days

Linda Linsefors27 May 2020 16:54 UTC
12 points
1 comment3 min readLW link

Build­ing brain-in­spired AGI is in­finitely eas­ier than un­der­stand­ing the brain

Steven Byrnes2 Jun 2020 14:13 UTC
42 points
7 comments7 min readLW link

Spar­sity and in­ter­pretabil­ity?

1 Jun 2020 13:25 UTC
40 points
3 comments7 min readLW link

GPT-3: A Summary

leogao2 Jun 2020 18:14 UTC
19 points
0 comments1 min readLW link
(leogao.dev)

Inac­cessible information

paulfchristiano3 Jun 2020 5:10 UTC
82 points
15 comments14 min readLW link
(ai-alignment.com)

[AN #102]: Meta learn­ing by GPT-3, and a list of full pro­pos­als for AI alignment

rohinmshah3 Jun 2020 17:20 UTC
38 points
6 comments10 min readLW link
(mailchi.mp)

Feed­back is cen­tral to agency

alexflint1 Jun 2020 12:56 UTC
28 points
0 comments3 min readLW link

Think­ing About Su­per-Hu­man AI: An Ex­am­i­na­tion of Likely Paths and Ul­ti­mate Constitution

meanderingmoose4 Jun 2020 23:22 UTC
−3 points
0 comments7 min readLW link

Emer­gence and Con­trol: An ex­am­i­na­tion of our abil­ity to gov­ern the be­hav­ior of in­tel­li­gent systems

meanderingmoose5 Jun 2020 17:10 UTC
1 point
0 comments6 min readLW link

GAN Discrim­i­na­tors Don’t Gen­er­al­ize?

tryactions8 Jun 2020 20:36 UTC
18 points
7 comments2 min readLW link

More on dis­am­biguat­ing “dis­con­ti­nu­ity”

alenglander9 Jun 2020 15:16 UTC
16 points
1 comment3 min readLW link

[AN #103]: ARCHES: an agenda for ex­is­ten­tial safety, and com­bin­ing nat­u­ral lan­guage with deep RL

rohinmshah10 Jun 2020 17:20 UTC
27 points
1 comment10 min readLW link
(mailchi.mp)

Dutch-Book­ing CDT: Re­vised Argument

abramdemski27 Oct 2020 4:31 UTC
47 points
20 comments16 min readLW link

[Question] List of pub­lic pre­dic­tions of what GPT-X can or can’t do?

Daniel Kokotajlo14 Jun 2020 14:25 UTC
20 points
9 comments1 min readLW link

Achiev­ing AI al­ign­ment through de­liber­ate un­cer­tainty in mul­ti­a­gent systems

Florian Dietz15 Jun 2020 12:19 UTC
3 points
10 comments7 min readLW link

Su­per­ex­po­nen­tial His­toric Growth, by David Roodman

Ben Pace15 Jun 2020 21:49 UTC
43 points
6 comments5 min readLW link
(www.openphilanthropy.org)

Re­lat­ing HCH and Log­i­cal Induction

abramdemski16 Jun 2020 22:08 UTC
49 points
4 comments5 min readLW link

Image GPT

Daniel Kokotajlo18 Jun 2020 11:41 UTC
29 points
27 comments1 min readLW link
(openai.com)

[AN #104]: The per­ils of in­ac­cessible in­for­ma­tion, and what we can learn about AI al­ign­ment from COVID

rohinmshah18 Jun 2020 17:10 UTC
19 points
5 comments8 min readLW link
(mailchi.mp)

[Question] If AI is based on GPT, how to en­sure its safety?

avturchin18 Jun 2020 20:33 UTC
20 points
11 comments1 min readLW link

What’s Your Cog­ni­tive Al­gorithm?

Raemon18 Jun 2020 22:16 UTC
69 points
23 comments13 min readLW link

Rele­vant pre-AGI possibilities

Daniel Kokotajlo20 Jun 2020 10:52 UTC
30 points
7 comments19 min readLW link
(aiimpacts.org)

Plau­si­ble cases for HRAD work, and lo­cat­ing the crux in the “re­al­ism about ra­tio­nal­ity” debate

riceissa22 Jun 2020 1:10 UTC
80 points
14 comments10 min readLW link

The In­dex­ing Problem

johnswentworth22 Jun 2020 19:11 UTC
34 points
2 comments4 min readLW link

[Question] Re­quest­ing feed­back/​ad­vice: what Type The­ory to study for AI safety?

rvnnt23 Jun 2020 17:03 UTC
7 points
4 comments3 min readLW link

Lo­cal­ity of goals

adamShimi22 Jun 2020 21:56 UTC
16 points
8 comments6 min readLW link

[Question] What is “In­stru­men­tal Cor­rigi­bil­ity”?

joebernstein23 Jun 2020 20:24 UTC
3 points
1 comment1 min readLW link

Models, myths, dreams, and Cheshire cat grins

Stuart_Armstrong24 Jun 2020 10:50 UTC
21 points
7 comments2 min readLW link

[AN #105]: The eco­nomic tra­jec­tory of hu­man­ity, and what we might mean by optimization

rohinmshah24 Jun 2020 17:30 UTC
24 points
3 comments11 min readLW link
(mailchi.mp)

There’s an Awe­some AI Ethics List and it’s a lit­tle thin

AABoyles25 Jun 2020 13:43 UTC
13 points
1 comment1 min readLW link
(github.com)

GPT-3 Fic­tion Samples

gwern25 Jun 2020 16:12 UTC
61 points
18 comments1 min readLW link
(www.gwern.net)

Walk­through: The Trans­former Ar­chi­tec­ture [Part 1/​2]

Matthew Barnett30 Jul 2019 13:54 UTC
34 points
0 comments6 min readLW link

Ro­bust­ness as a Path to AI Alignment

abramdemski10 Oct 2017 8:14 UTC
45 points
9 comments9 min readLW link

Rad­i­cal Prob­a­bil­ism [Tran­script]

26 Jun 2020 22:14 UTC
45 points
12 comments6 min readLW link

AI safety via mar­ket making

evhub26 Jun 2020 23:07 UTC
49 points
40 comments11 min readLW link

[Question] Have gen­eral de­com­posers been for­mal­ized?

Quinn27 Jun 2020 18:09 UTC
8 points
5 comments1 min readLW link

Gary Mar­cus vs Cor­ti­cal Uniformity

Steven Byrnes28 Jun 2020 18:18 UTC
22 points
0 comments8 min readLW link

Web AI dis­cus­sion Groups

Donald Hobson30 Jun 2020 11:22 UTC
10 points
0 comments2 min readLW link

Com­par­ing AI Align­ment Ap­proaches to Min­i­mize False Pos­i­tive Risk

G Gordon Worley III30 Jun 2020 19:34 UTC
5 points
0 comments9 min readLW link

AvE: As­sis­tance via Empowerment

FactorialCode30 Jun 2020 22:07 UTC
12 points
1 comment1 min readLW link
(arxiv.org)

Evan Hub­inger on In­ner Align­ment, Outer Align­ment, and Pro­pos­als for Build­ing Safe Ad­vanced AI

Palus Astra1 Jul 2020 17:30 UTC
34 points
4 comments67 min readLW link

[AN #106]: Eval­u­at­ing gen­er­al­iza­tion abil­ity of learned re­ward models

rohinmshah1 Jul 2020 17:20 UTC
14 points
2 comments11 min readLW link
(mailchi.mp)

The “AI De­bate” Debate

michaelcohen2 Jul 2020 10:16 UTC
20 points
20 comments3 min readLW link

Idea: Imi­ta­tion/​Value Learn­ing AIXI

Zachary Robertson3 Jul 2020 17:10 UTC
3 points
6 comments1 min readLW link

Split­ting De­bate up into Two Subsystems

Nandi3 Jul 2020 20:11 UTC
13 points
5 comments4 min readLW link

AI Un­safety via Non-Zero-Sum Debate

VojtaKovarik3 Jul 2020 22:03 UTC
25 points
10 comments5 min readLW link

Clas­sify­ing games like the Pri­soner’s Dilemma

philh4 Jul 2020 17:10 UTC
78 points
23 comments6 min readLW link
(reasonableapproximation.net)

AI-Feyn­man as a bench­mark for what we should be aiming for

Faustus24 Jul 2020 9:24 UTC
8 points
1 comment2 min readLW link

Learn­ing the prior

paulfchristiano5 Jul 2020 21:00 UTC
78 points
26 comments8 min readLW link
(ai-alignment.com)

Bet­ter pri­ors as a safety problem

paulfchristiano5 Jul 2020 21:20 UTC
63 points
7 comments5 min readLW link
(ai-alignment.com)

[Question] How far is AGI?

Roko Jelavić5 Jul 2020 17:58 UTC
6 points
5 comments1 min readLW link

Clas­sify­ing speci­fi­ca­tion prob­lems as var­i­ants of Good­hart’s Law

Vika19 Aug 2019 20:40 UTC
67 points
5 comments5 min readLW link2 nominations1 review

New safety re­search agenda: scal­able agent al­ign­ment via re­ward modeling

Vika20 Nov 2018 17:29 UTC
34 points
13 comments1 min readLW link
(medium.com)

De­sign­ing agent in­cen­tives to avoid side effects

11 Mar 2019 20:55 UTC
29 points
0 comments2 min readLW link
(medium.com)

Dis­cus­sion on the ma­chine learn­ing ap­proach to AI safety

Vika1 Nov 2018 20:54 UTC
25 points
3 comments4 min readLW link

Speci­fi­ca­tion gam­ing ex­am­ples in AI

Vika3 Apr 2018 12:30 UTC
39 points
9 comments1 min readLW link

[Question] (an­swered: yes) Has any­one writ­ten up a con­sid­er­a­tion of Downs’s “Para­dox of Vot­ing” from the per­spec­tive of MIRI-ish de­ci­sion the­o­ries (UDT, FDT, or even just EDT)?

Jameson Quinn6 Jul 2020 18:26 UTC
9 points
24 comments1 min readLW link

New Deep­Mind AI Safety Re­search Blog

Vika27 Sep 2018 16:28 UTC
43 points
0 comments1 min readLW link
(medium.com)

Con­test: $1,000 for good ques­tions to ask to an Or­a­cle AI

Stuart_Armstrong31 Jul 2019 18:48 UTC
56 points
156 comments3 min readLW link

De­con­fus­ing Hu­man Values Re­search Agenda v1

G Gordon Worley III23 Mar 2020 16:25 UTC
23 points
12 comments4 min readLW link

[Question] How “hon­est” is GPT-3?

abramdemski8 Jul 2020 19:38 UTC
72 points
18 comments5 min readLW link

What does it mean to ap­ply de­ci­sion the­ory?

abramdemski8 Jul 2020 20:31 UTC
40 points
5 comments8 min readLW link

AI Re­search Con­sid­er­a­tions for Hu­man Ex­is­ten­tial Safety (ARCHES)

habryka9 Jul 2020 2:49 UTC
60 points
8 comments1 min readLW link
(arxiv.org)

The Un­rea­son­able Effec­tive­ness of Deep Learning

Richard_Ngo30 Sep 2018 15:48 UTC
81 points
5 comments13 min readLW link
(thinkingcomplete.blogspot.com)

mAIry’s room: AI rea­son­ing to solve philo­soph­i­cal problems

Stuart_Armstrong5 Mar 2019 20:24 UTC
91 points
41 comments6 min readLW link2 nominations2 reviews

Failures of an em­bod­ied AIXI

So8res15 Jun 2014 18:29 UTC
46 points
46 comments12 min readLW link

The Prob­lem with AIXI

Rob Bensinger18 Mar 2014 1:55 UTC
43 points
78 comments23 min readLW link

Ver­sions of AIXI can be ar­bi­trar­ily stupid

Stuart_Armstrong10 Aug 2015 13:23 UTC
29 points
59 comments1 min readLW link

Reflec­tive AIXI and Anthropics

Diffractor24 Sep 2018 2:15 UTC
17 points
13 comments8 min readLW link

AIXI and Ex­is­ten­tial Despair

paulfchristiano8 Dec 2011 20:03 UTC
23 points
38 comments6 min readLW link

How to make AIXI-tl in­ca­pable of learning

itaibn027 Jan 2014 0:05 UTC
7 points
5 comments2 min readLW link

Help re­quest: What is the Kol­mogorov com­plex­ity of com­putable ap­prox­i­ma­tions to AIXI?

AnnaSalamon5 Dec 2010 10:23 UTC
7 points
9 comments1 min readLW link

“AIXIjs: A Soft­ware Demo for Gen­eral Re­in­force­ment Learn­ing”, As­lanides 2017

gwern29 May 2017 21:09 UTC
7 points
1 comment1 min readLW link
(arxiv.org)

Can AIXI be trained to do any­thing a hu­man can?

Stuart_Armstrong20 Oct 2014 13:12 UTC
5 points
9 comments2 min readLW link

Shap­ing eco­nomic in­cen­tives for col­lab­o­ra­tive AGI

Kaj_Sotala29 Jun 2018 16:26 UTC
45 points
15 comments4 min readLW link

Is the Star Trek Fed­er­a­tion re­ally in­ca­pable of build­ing AI?

Kaj_Sotala18 Mar 2018 10:30 UTC
10 points
4 comments2 min readLW link
(kajsotala.fi)

Some con­cep­tual high­lights from “Disjunc­tive Sce­nar­ios of Catas­trophic AI Risk”

Kaj_Sotala12 Feb 2018 12:30 UTC
29 points
4 comments6 min readLW link
(kajsotala.fi)

Mis­con­cep­tions about con­tin­u­ous takeoff

Matthew Barnett8 Oct 2019 21:31 UTC
71 points
38 comments4 min readLW link1 nomination

Dist­in­guish­ing defi­ni­tions of takeoff

Matthew Barnett14 Feb 2020 0:16 UTC
53 points
6 comments6 min readLW link

Book re­view: Ar­tifi­cial In­tel­li­gence Safety and Security

PeterMcCluskey8 Dec 2018 3:47 UTC
27 points
3 comments8 min readLW link
(www.bayesianinvestor.com)

Why AI may not foom

John_Maxwell24 Mar 2013 8:11 UTC
28 points
81 comments12 min readLW link

Hu­mans Who Are Not Con­cen­trat­ing Are Not Gen­eral Intelligences

sarahconstantin25 Feb 2019 20:40 UTC
156 points
34 comments6 min readLW link4 nominations1 review
(srconstantin.wordpress.com)

The Hacker Learns to Trust

Ben Pace22 Jun 2019 0:27 UTC
78 points
18 comments8 min readLW link
(medium.com)

Book Re­view: Hu­man Compatible

Scott Alexander31 Jan 2020 5:20 UTC
75 points
6 comments16 min readLW link
(slatestarcodex.com)

SSC Jour­nal Club: AI Timelines

Scott Alexander8 Jun 2017 19:00 UTC
9 points
2 comments8 min readLW link

Ar­gu­ments against my­opic training

Richard_Ngo9 Jul 2020 16:07 UTC
51 points
37 comments12 min readLW link

On mo­ti­va­tions for MIRI’s highly re­li­able agent de­sign research

jessicata29 Jan 2017 19:34 UTC
22 points
1 comment5 min readLW link

Why is the im­pact penalty time-in­con­sis­tent?

Stuart_Armstrong9 Jul 2020 17:26 UTC
16 points
1 comment2 min readLW link

My cur­rent take on the Paul-MIRI dis­agree­ment on al­ignabil­ity of messy AI

jessicata29 Jan 2017 20:52 UTC
20 points
0 comments10 min readLW link

Ben Go­ertzel: The Sin­gu­lar­ity In­sti­tute’s Scary Idea (and Why I Don’t Buy It)

Paul Crowley30 Oct 2010 9:31 UTC
42 points
442 comments1 min readLW link

An An­a­lytic Per­spec­tive on AI Alignment

DanielFilan1 Mar 2020 4:10 UTC
53 points
45 comments8 min readLW link
(danielfilan.com)

Mechanis­tic Trans­parency for Ma­chine Learning

DanielFilan11 Jul 2018 0:34 UTC
55 points
9 comments4 min readLW link

A model I use when mak­ing plans to re­duce AI x-risk

Ben Pace19 Jan 2018 0:21 UTC
66 points
41 comments6 min readLW link

AI Re­searchers On AI Risk

Scott Alexander22 May 2015 11:16 UTC
14 points
0 comments16 min readLW link

Mini ad­vent cal­en­dar of Xrisks: Ar­tifi­cial Intelligence

Stuart_Armstrong7 Dec 2012 11:26 UTC
5 points
5 comments1 min readLW link

For FAI: Is “Molec­u­lar Nan­otech­nol­ogy” putting our best foot for­ward?

leplen22 Jun 2013 4:44 UTC
78 points
118 comments3 min readLW link

UFAI can­not be the Great Filter

Thrasymachus22 Dec 2012 11:26 UTC
59 points
92 comments3 min readLW link

Don’t Fear The Filter

Scott Alexander29 May 2014 0:45 UTC
7 points
17 comments6 min readLW link

The Great Filter is early, or AI is hard

Stuart_Armstrong29 Aug 2014 16:17 UTC
32 points
76 comments1 min readLW link

Talk: Key Is­sues In Near-Term AI Safety Research

alenglander10 Jul 2020 18:36 UTC
22 points
1 comment1 min readLW link

Mesa-Op­ti­miz­ers vs “Steered Op­ti­miz­ers”

Steven Byrnes10 Jul 2020 16:49 UTC
40 points
5 comments8 min readLW link

AlphaS­tar: Im­pres­sive for RL progress, not for AGI progress

orthonormal2 Nov 2019 1:50 UTC
111 points
58 comments2 min readLW link2 nominations1 review

The Catas­trophic Con­ver­gence Conjecture

TurnTrout14 Feb 2020 21:16 UTC
39 points
15 comments8 min readLW link

[Question] How well can the GPT ar­chi­tec­ture solve the par­ity task?

FactorialCode11 Jul 2020 19:02 UTC
18 points
3 comments1 min readLW link

Sun­day July 12 — talks by Scott Garrabrant, Alexflint, alexei, Stu­art_Armstrong

8 Jul 2020 0:27 UTC
19 points
2 comments1 min readLW link

[Link] Word-vec­tor based DL sys­tem achieves hu­man par­ity in ver­bal IQ tests

jacob_cannell13 Jun 2015 23:38 UTC
17 points
8 comments1 min readLW link

The Power of Intelligence

Eliezer Yudkowsky1 Jan 2007 20:00 UTC
42 points
3 comments4 min readLW link

Com­ments on CAIS

Richard_Ngo12 Jan 2019 15:20 UTC
64 points
12 comments7 min readLW link

[Question] What are CAIS’ bold­est near/​medium-term pre­dic­tions?

jacobjacob28 Mar 2019 13:14 UTC
31 points
17 comments1 min readLW link

Drexler on AI Risk

PeterMcCluskey1 Feb 2019 5:11 UTC
34 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

Six AI Risk/​Strat­egy Ideas

Wei_Dai27 Aug 2019 0:40 UTC
62 points
18 comments4 min readLW link2 nominations1 review

New re­port: In­tel­li­gence Ex­plo­sion Microeconomics

Eliezer Yudkowsky29 Apr 2013 23:14 UTC
72 points
251 comments3 min readLW link

Book re­view: Hu­man Compatible

PeterMcCluskey19 Jan 2020 3:32 UTC
37 points
2 comments5 min readLW link
(www.bayesianinvestor.com)

Thoughts on “Hu­man-Com­pat­i­ble”

TurnTrout10 Oct 2019 5:24 UTC
58 points
35 comments5 min readLW link

Book Re­view: The AI Does Not Hate You

PeterMcCluskey28 Oct 2019 17:45 UTC
25 points
0 comments5 min readLW link
(www.bayesianinvestor.com)

[Link] Book Re­view: ‘The AI Does Not Hate You’ by Tom Chivers (Scott Aaron­son)

eigen7 Oct 2019 18:16 UTC
18 points
0 comments1 min readLW link

Book Re­view: Life 3.0: Be­ing Hu­man in the Age of Ar­tifi­cial Intelligence

J_Thomas_Moros18 Jan 2018 17:18 UTC
6 points
0 comments1 min readLW link
(ferocioustruth.com)

Book Re­view: Weapons of Math Destruction

Zvi4 Jun 2017 21:20 UTC
1 point
0 comments16 min readLW link

DARPA Digi­tal Tu­tor: Four Months to To­tal Tech­ni­cal Ex­per­tise?

JohnBuridan6 Jul 2020 23:34 UTC
145 points
15 comments7 min readLW link

Paper: Su­per­in­tel­li­gence as a Cause or Cure for Risks of Astro­nom­i­cal Suffering

Kaj_Sotala3 Jan 2018 14:39 UTC
1 point
6 comments1 min readLW link
(www.informatica.si)

Prevent­ing s-risks via in­dex­i­cal un­cer­tainty, acausal trade and dom­i­na­tion in the multiverse

avturchin27 Sep 2018 10:09 UTC
7 points
6 comments4 min readLW link

Pre­face to CLR’s Re­search Agenda on Co­op­er­a­tion, Con­flict, and TAI

JesseClifton13 Dec 2019 21:02 UTC
54 points
8 comments2 min readLW link

Sec­tions 1 & 2: In­tro­duc­tion, Strat­egy and Governance

JesseClifton17 Dec 2019 21:27 UTC
33 points
5 comments14 min readLW link

Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms

JesseClifton17 Dec 2019 21:46 UTC
19 points
2 comments12 min readLW link

Sections 5 & 6: Contemporary Architectures, Humans in the Loop

JesseClifton20 Dec 2019 3:52 UTC
27 points
4 comments10 min readLW link

Section 7: Foundations of Rational Agency

JesseClifton22 Dec 2019 2:05 UTC
14 points
3 comments8 min readLW link

What counts as defection?

TurnTrout12 Jul 2020 22:03 UTC
80 points
20 comments5 min readLW link

The “Commitment Races” problem

Daniel Kokotajlo23 Aug 2019 1:58 UTC
91 points
34 comments5 min readLW link1 nomination

Alignment Newsletter #36

rohinmshah12 Dec 2018 1:10 UTC
21 points
0 comments11 min readLW link
(mailchi.mp)

Alignment Newsletter #47

rohinmshah4 Mar 2019 4:30 UTC
18 points
0 comments8 min readLW link
(mailchi.mp)

Understanding “Deep Double Descent”

evhub6 Dec 2019 0:00 UTC
130 points
40 comments5 min readLW link3 nominations4 reviews

[LINK] Strong AI Startup Raises $15M

olalonde21 Aug 2012 20:47 UTC
24 points
13 comments1 min readLW link

Announcing the AI Alignment Prize

cousin_it3 Nov 2017 15:47 UTC
89 points
78 comments1 min readLW link

I’m leaving AI alignment – you better stay

rmoehn12 Mar 2020 5:58 UTC
139 points
19 comments5 min readLW link

New paper: AGI Agent Safety by Iteratively Improving the Utility Function

Koen.Holtman15 Jul 2020 14:05 UTC
21 points
2 comments6 min readLW link

[Question] How should AI debate be judged?

abramdemski15 Jul 2020 22:20 UTC
48 points
27 comments6 min readLW link

Alignment proposals and complexity classes

evhub16 Jul 2020 0:27 UTC
31 points
26 comments13 min readLW link

[AN #107]: The convergent instrumental subgoals of goal-directed agents

rohinmshah16 Jul 2020 6:47 UTC
13 points
1 comment8 min readLW link
(mailchi.mp)

[AN #108]: Why we should scrutinize arguments for AI risk

rohinmshah16 Jul 2020 6:47 UTC
19 points
6 comments12 min readLW link
(mailchi.mp)

Environments as a bottleneck in AGI development

Richard_Ngo17 Jul 2020 5:02 UTC
25 points
19 comments6 min readLW link

[Question] Can an agent use interactive proofs to check the alignment of successors?

PabloAMC17 Jul 2020 19:07 UTC
7 points
2 comments1 min readLW link

Lessons on AI Takeover from the conquistadors

17 Jul 2020 22:35 UTC
56 points
30 comments5 min readLW link

What Would I Do? Self-prediction in Simple Algorithms

Scott Garrabrant20 Jul 2020 4:27 UTC
51 points
13 comments5 min readLW link

Writeup: Progress on AI Safety via Debate

5 Feb 2020 21:04 UTC
88 points
17 comments33 min readLW link

Operationalizing Interpretability

lifelonglearner20 Jul 2020 5:22 UTC
20 points
0 comments4 min readLW link

Learning Values in Practice

Stuart_Armstrong20 Jul 2020 18:38 UTC
23 points
0 comments5 min readLW link

Parallels Between AI Safety by Debate and Evidence Law

Cullen_OKeefe20 Jul 2020 22:52 UTC
10 points
1 comment2 min readLW link
(cullenokeefe.com)

The Rediscovery of Interiority in Machine Learning

DanB21 Jul 2020 5:02 UTC
5 points
4 comments1 min readLW link
(danburfoot.net)

The “AI Dungeons” Dragon Model is heavily path dependent (testing GPT-3 on ethics)

Rafael Harth21 Jul 2020 12:14 UTC
44 points
9 comments6 min readLW link

How good is humanity at coordination?

Buck21 Jul 2020 20:01 UTC
72 points
43 comments3 min readLW link

Alignment As A Bottleneck To Usefulness Of GPT-3

johnswentworth21 Jul 2020 20:02 UTC
97 points
57 comments3 min readLW link

$1000 bounty for OpenAI to show whether GPT3 was “deliberately” pretending to be stupider than it is

jacobjacob21 Jul 2020 18:42 UTC
52 points
40 comments2 min readLW link
(twitter.com)

[Preprint] The Computational Limits of Deep Learning

G Gordon Worley III21 Jul 2020 21:25 UTC
8 points
1 comment1 min readLW link
(arxiv.org)

[AN #109]: Teaching neural nets to generalize the way humans would

rohinmshah22 Jul 2020 17:10 UTC
17 points
3 comments9 min readLW link
(mailchi.mp)

Research agenda for AI safety and a better civilization

agilecaveman22 Jul 2020 6:35 UTC
12 points
2 comments16 min readLW link

Weak HCH accesses EXP

evhub22 Jul 2020 22:36 UTC
14 points
0 comments3 min readLW link

GPT-3 Gems

TurnTrout23 Jul 2020 0:46 UTC
30 points
7 comments41 min readLW link

Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide)

Donald Hobson23 Jul 2020 21:37 UTC
3 points
2 comments2 min readLW link

[Question] Construct a portfolio to profit from AI progress.

deluks91725 Jul 2020 8:18 UTC
29 points
13 comments1 min readLW link

Thinking soberly about the context and consequences of Friendly AI

Mitchell_Porter16 Oct 2012 4:33 UTC
20 points
39 comments1 min readLW link

Goal retention discussion with Eliezer

MaxTegmark4 Sep 2014 22:23 UTC
92 points
26 comments6 min readLW link

[Question] Where do people discuss doing things with GPT-3?

skybrian26 Jul 2020 14:31 UTC
2 points
7 comments1 min readLW link

You Can Probably Amplify GPT3 Directly

Zachary Robertson26 Jul 2020 21:58 UTC
34 points
14 comments6 min readLW link

[updated] how does gpt2’s training corpus capture internet discussion? not well

nostalgebraist27 Jul 2020 22:30 UTC
24 points
3 comments2 min readLW link
(nostalgebraist.tumblr.com)

Agentic Language Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

A community-curated repository of interesting GPT-3 stuff

Rudi C28 Jul 2020 14:16 UTC
8 points
0 comments1 min readLW link
(github.com)

[Question] Does the lottery ticket hypothesis suggest the scaling hypothesis?

Daniel Kokotajlo28 Jul 2020 19:52 UTC
12 points
2 comments1 min readLW link

[Question] To what extent are the scaling properties of Transformer networks exceptional?

abramdemski28 Jul 2020 20:06 UTC
29 points
1 comment1 min readLW link

[Question] What happens to variance as neural network training is scaled? What does it imply about “lottery tickets”?

abramdemski28 Jul 2020 20:22 UTC
25 points
4 comments1 min readLW link

[Question] How will internet forums like LW be able to defend against GPT-style spam?

ChristianKl28 Jul 2020 20:12 UTC
14 points
18 comments1 min readLW link

Predictions for GPT-N

hippke29 Jul 2020 1:16 UTC
34 points
31 comments1 min readLW link

Announcement: AI alignment prize winners and next round

cousin_it15 Jan 2018 14:33 UTC
80 points
68 comments2 min readLW link

Jeff Hawkins on neuromorphic AGI within 20 years

Steven Byrnes15 Jul 2019 19:16 UTC
158 points
24 comments12 min readLW link1 nomination

Cascades, Cycles, Insight...

Eliezer Yudkowsky24 Nov 2008 9:33 UTC
23 points
31 comments8 min readLW link

...Recursion, Magic

Eliezer Yudkowsky25 Nov 2008 9:10 UTC
22 points
28 comments5 min readLW link

References & Resources for LessWrong

XiXiDu10 Oct 2010 14:54 UTC
146 points
106 comments20 min readLW link

[Question] A game designed to beat AI?

Long try17 Mar 2020 3:51 UTC
13 points
29 comments1 min readLW link

Truly Part Of You

Eliezer Yudkowsky21 Nov 2007 2:18 UTC
108 points
58 comments4 min readLW link

[AN #110]: Learning features from human feedback to enable reward learning

rohinmshah29 Jul 2020 17:20 UTC
13 points
2 comments10 min readLW link
(mailchi.mp)

Structured Tasks for Language Models

Zachary Robertson29 Jul 2020 14:17 UTC
5 points
0 comments1 min readLW link

Engaging Seriously with Short Timelines

deluks91729 Jul 2020 19:21 UTC
43 points
23 comments3 min readLW link

What Failure Looks Like: Distilling the Discussion

Ben Pace29 Jul 2020 21:49 UTC
71 points
11 comments7 min readLW link

Learning the prior and generalization

evhub29 Jul 2020 22:49 UTC
34 points
16 comments4 min readLW link

[Question] Is the work on AI alignment relevant to GPT?

Richard_Kennaway30 Jul 2020 12:23 UTC
12 points
5 comments1 min readLW link

Verification and Transparency

DanielFilan8 Aug 2019 1:50 UTC
34 points
6 comments2 min readLW link
(danielfilan.com)

Robin Hanson on Lumpiness of AI Services

DanielFilan17 Feb 2019 23:08 UTC
15 points
2 comments2 min readLW link
(www.overcomingbias.com)

One Way to Think About ML Transparency

Matthew Barnett2 Sep 2019 23:27 UTC
26 points
28 comments5 min readLW link

What is Interpretability?

17 Mar 2020 20:23 UTC
33 points
0 comments11 min readLW link

Relaxed adversarial training for inner alignment

evhub10 Sep 2019 23:03 UTC
54 points
10 comments27 min readLW link

Conclusion to ‘Reframing Impact’

TurnTrout28 Feb 2020 16:05 UTC
38 points
17 comments2 min readLW link

Bayesian Evolving-to-Extinction

abramdemski14 Feb 2020 23:55 UTC
37 points
13 comments5 min readLW link

Do Sufficiently Advanced Agents Use Logic?

abramdemski13 Sep 2019 19:53 UTC
38 points
11 comments9 min readLW link

World State is the Wrong Abstraction for Impact

TurnTrout1 Oct 2019 21:03 UTC
61 points
19 comments2 min readLW link

Attainable Utility Preservation: Concepts

TurnTrout17 Feb 2020 5:20 UTC
38 points
18 comments1 min readLW link

Attainable Utility Preservation: Empirical Results

22 Feb 2020 0:38 UTC
48 points
7 comments9 min readLW link

How Low Should Fruit Hang Before We Pick It?

TurnTrout25 Feb 2020 2:08 UTC
26 points
9 comments12 min readLW link

Attainable Utility Preservation: Scaling to Superhuman

TurnTrout27 Feb 2020 0:52 UTC
26 points
20 comments8 min readLW link

Reasons for Excitement about Impact of Impact Measure Research

TurnTrout27 Feb 2020 21:42 UTC
31 points
8 comments4 min readLW link

Power as Easily Exploitable Opportunities

TurnTrout1 Aug 2020 2:14 UTC
24 points
5 comments6 min readLW link

[Question] Would AGIs parent young AGIs?

Vishrut Arya2 Aug 2020 0:57 UTC
3 points
6 comments1 min readLW link

If I were a well-intentioned AI… I: Image classifier

Stuart_Armstrong26 Feb 2020 12:39 UTC
35 points
4 comments5 min readLW link

Non-Consequentialist Cooperation?

abramdemski11 Jan 2019 9:15 UTC
47 points
15 comments7 min readLW link

Curiosity Killed the Cat and the Asymptotically Optimal Agent

michaelcohen20 Feb 2020 17:28 UTC
27 points
15 comments1 min readLW link

If I were a well-intentioned AI… IV: Mesa-optimising

Stuart_Armstrong2 Mar 2020 12:16 UTC
26 points
2 comments6 min readLW link

Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization”

Daniel Kokotajlo27 Feb 2020 18:10 UTC
27 points
5 comments8 min readLW link

Clarifying Power-Seeking and Instrumental Convergence

TurnTrout20 Dec 2019 19:59 UTC
41 points
7 comments3 min readLW link

How important are MDPs for AGI (Safety)?

michaelcohen26 Mar 2020 20:32 UTC
14 points
8 comments2 min readLW link

Synthesizing amplification and debate

evhub5 Feb 2020 22:53 UTC
32 points
10 comments4 min readLW link

is gpt-3 few-shot ready for real applications?

nostalgebraist3 Aug 2020 19:50 UTC
31 points
5 comments9 min readLW link
(nostalgebraist.tumblr.com)

Interpretability in ML: A Broad Overview

lifelonglearner4 Aug 2020 19:03 UTC
41 points
5 comments15 min readLW link

Infinite Data/Compute Arguments in Alignment

johnswentworth4 Aug 2020 20:21 UTC
42 points
6 comments2 min readLW link

Four Ways An Impact Measure Could Help Alignment

Matthew Barnett8 Aug 2019 0:10 UTC
21 points
1 comment8 min readLW link

Understanding Recent Impact Measures

Matthew Barnett7 Aug 2019 4:57 UTC
16 points
6 comments7 min readLW link

A Survey of Early Impact Measures

Matthew Barnett6 Aug 2019 1:22 UTC
23 points
0 comments8 min readLW link

Optimization Regularization through Time Penalty

Linda Linsefors1 Jan 2019 13:05 UTC
11 points
4 comments3 min readLW link

Stable Pointers to Value III: Recursive Quantilization

abramdemski21 Jul 2018 8:06 UTC
18 points
4 comments4 min readLW link

Thoughts on Quantilizers

Stuart_Armstrong2 Jun 2017 16:24 UTC
2 points
0 comments2 min readLW link

Quantilizers maximize expected utility subject to a conservative cost constraint

jessicata28 Sep 2015 2:17 UTC
12 points
0 comments5 min readLW link

Quantilal control for finite MDPs

Vanessa Kosoy12 Apr 2018 9:21 UTC
4 points
0 comments13 min readLW link

The limits of corrigibility

Stuart_Armstrong10 Apr 2018 10:49 UTC
25 points
9 comments4 min readLW link

Alignment Newsletter #16: 07/23/18

rohinmshah23 Jul 2018 16:20 UTC
42 points
0 comments12 min readLW link
(mailchi.mp)

Measuring hardware overhang

hippke5 Aug 2020 19:59 UTC
43 points
6 comments4 min readLW link

[AN #111]: The Circuits hypotheses for deep learning

rohinmshah5 Aug 2020 17:40 UTC
22 points
0 comments9 min readLW link
(mailchi.mp)

Self-Fulfilling Prophecies Aren’t Always About Self-Awareness

John_Maxwell18 Nov 2019 23:11 UTC
14 points
7 comments4 min readLW link

The Goodhart Game

John_Maxwell18 Nov 2019 23:22 UTC
13 points
5 comments5 min readLW link

Why don’t singularitarians bet on the creation of AGI by buying stocks?

John_Maxwell11 Mar 2020 16:27 UTC
36 points
19 comments4 min readLW link

The Dualist Predict-O-Matic ($100 prize)

John_Maxwell17 Oct 2019 6:45 UTC
16 points
35 comments5 min readLW link

[Question] What AI safety problems need solving for safe AI research assistants?

John_Maxwell5 Nov 2019 2:09 UTC
14 points
13 comments1 min readLW link

Refining the Evolutionary Analogy to AI

brglnd7 Aug 2020 23:13 UTC
9 points
2 comments4 min readLW link

The Fusion Power Generator Scenario

johnswentworth8 Aug 2020 18:31 UTC
104 points
25 comments3 min readLW link

[Question] How much is known about the “inference rules” of logical induction?

Eigil Rischel8 Aug 2020 10:45 UTC
11 points
7 comments1 min readLW link

If I were a well-intentioned AI… II: Acting in a world

Stuart_Armstrong27 Feb 2020 11:58 UTC
20 points
0 comments3 min readLW link

If I were a well-intentioned AI… III: Extremal Goodhart

Stuart_Armstrong28 Feb 2020 11:24 UTC
21 points
0 comments5 min readLW link

Towards a Formalisation of Logical Counterfactuals

Bunthut8 Aug 2020 22:14 UTC
6 points
2 comments2 min readLW link

[Question] 10/50/90% chance of GPT-N Transformative AI?

human_generated_text9 Aug 2020 0:10 UTC
24 points
8 comments1 min readLW link

[Question] Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe?

Maxime Riché9 Aug 2020 17:17 UTC
2 points
5 comments1 min readLW link

In defense of Oracle (“Tool”) AI research

Steven Byrnes7 Aug 2019 19:14 UTC
20 points
11 comments4 min readLW link

How GPT-N will escape from its AI-box

hippke12 Aug 2020 19:34 UTC
7 points
9 comments1 min readLW link

Strong implication of preference uncertainty

Stuart_Armstrong12 Aug 2020 19:02 UTC
20 points
3 comments2 min readLW link

[AN #112]: Engineering a Safer World

rohinmshah13 Aug 2020 17:20 UTC
25 points
1 comment12 min readLW link
(mailchi.mp)

Room and Board for People Self-Learning ML or Doing Independent ML Research

SamuelKnoche14 Aug 2020 17:19 UTC
7 points
1 comment1 min readLW link

Talk and Q&A—Dan Hendrycks—Paper: Aligning AI With Shared Human Values. On Discord at Aug 28, 2020 8:00-10:00 AM GMT+8.

wassname14 Aug 2020 23:57 UTC
1 point
0 comments1 min readLW link

Search versus design

alexflint16 Aug 2020 16:53 UTC
83 points
39 comments36 min readLW link

Work on Security Instead of Friendliness?

Wei_Dai21 Jul 2012 18:28 UTC
49 points
107 comments2 min readLW link

Goal-Directedness: What Success Looks Like

adamShimi16 Aug 2020 18:33 UTC
9 points
0 comments2 min readLW link

[Question] A way to beat superrational/EDT agents?

Abhimanyu Pallavi Sudhir17 Aug 2020 14:33 UTC
5 points
13 comments1 min readLW link

Learning human preferences: optimistic and pessimistic scenarios

Stuart_Armstrong18 Aug 2020 13:05 UTC
27 points
6 comments6 min readLW link

Mesa-Search vs Mesa-Control

abramdemski18 Aug 2020 18:51 UTC
53 points
45 comments7 min readLW link

Why we want unbiased learning processes

Stuart_Armstrong20 Feb 2018 14:48 UTC
13 points
3 comments3 min readLW link

Intuitive examples of reward function learning?

Stuart_Armstrong6 Mar 2018 16:54 UTC
7 points
3 comments2 min readLW link

Open-Category Classification

TurnTrout28 Mar 2018 14:49 UTC
11 points
6 comments10 min readLW link

Looking for adversarial collaborators to test our Debate protocol

Beth Barnes19 Aug 2020 3:15 UTC
52 points
5 comments1 min readLW link

Walkthrough of ‘Formalizing Convergent Instrumental Goals’

TurnTrout26 Feb 2018 2:20 UTC
10 points
2 comments10 min readLW link

Ambiguity Detection

TurnTrout1 Mar 2018 4:23 UTC
11 points
9 comments4 min readLW link

Penalizing Impact via Attainable Utility Preservation

TurnTrout28 Dec 2018 21:46 UTC
24 points
0 comments3 min readLW link
(arxiv.org)

What You See Isn’t Always What You Want

TurnTrout13 Sep 2019 4:17 UTC
30 points
12 comments3 min readLW link

[Question] Instrumental Occam?

abramdemski31 Jan 2020 19:27 UTC
30 points
15 comments1 min readLW link

Compact vs. Wide Models

Vaniver16 Jul 2018 4:09 UTC
30 points
5 comments3 min readLW link

Alex Irpan: “My AI Timelines Have Sped Up”

Vaniver19 Aug 2020 16:23 UTC
43 points
20 comments1 min readLW link
(www.alexirpan.com)

[AN #113]: Checking the ethical intuitions of large language models

rohinmshah19 Aug 2020 17:10 UTC
23 points
0 comments9 min readLW link
(mailchi.mp)

AI safety as featherless bipeds *with broad flat nails*

Stuart_Armstrong19 Aug 2020 10:22 UTC
35 points
1 comment1 min readLW link

Time Magazine has an article about the Singularity...

Raemon11 Feb 2011 2:20 UTC
40 points
13 comments1 min readLW link

How rapidly are GPUs improving in price performance?

gallabytes25 Nov 2018 19:54 UTC
31 points
9 comments1 min readLW link
(mediangroup.org)

Our values are underdefined, changeable, and manipulable

Stuart_Armstrong2 Nov 2017 11:09 UTC
20 points
6 comments3 min readLW link

[Question] What funding sources exist for technical AI safety research?

johnswentworth1 Oct 2019 15:30 UTC
26 points
5 comments1 min readLW link

Humans can drive cars

Apprentice30 Jan 2014 11:55 UTC
52 points
89 comments2 min readLW link

A Less Wrong singularity article?

Kaj_Sotala17 Nov 2009 14:15 UTC
31 points
215 comments1 min readLW link

The Bayesian Tyrant

abramdemski20 Aug 2020 0:08 UTC
116 points
14 comments6 min readLW link

Concept Safety: Producing similar AI-human concept spaces

Kaj_Sotala14 Apr 2015 20:39 UTC
49 points
45 comments8 min readLW link

[LINK] What should a reasonable person believe about the Singularity?

Kaj_Sotala13 Jan 2011 9:32 UTC
38 points
14 comments2 min readLW link

The many ways AIs behave badly

Stuart_Armstrong24 Apr 2018 11:40 UTC
10 points
3 comments2 min readLW link

July 2020 gwern.net newsletter

gwern20 Aug 2020 16:39 UTC
29 points
0 comments1 min readLW link
(www.gwern.net)

Do what we mean vs. do what we say

rohinmshah30 Aug 2018 22:03 UTC
34 points
14 comments1 min readLW link

[Question] What’s a Decomposable Alignment Topic?

elriggs21 Aug 2020 22:57 UTC
26 points
16 comments1 min readLW link

Tools versus agents

Stuart_Armstrong16 May 2012 13:00 UTC
42 points
39 comments5 min readLW link

An unaligned benchmark

paulfchristiano17 Nov 2018 15:51 UTC
27 points
0 comments9 min readLW link

Following human norms

rohinmshah20 Jan 2019 23:59 UTC
27 points
10 comments5 min readLW link

nostalgebraist: Recursive Goodhart’s Law

Kaj_Sotala26 Aug 2020 11:07 UTC
52 points
27 comments1 min readLW link
(nostalgebraist.tumblr.com)

[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents

rohinmshah26 Aug 2020 17:20 UTC
21 points
3 comments8 min readLW link
(mailchi.mp)

[Question] How hard would it be to change GPT-3 in a way that allows audio?

ChristianKl28 Aug 2020 14:42 UTC
8 points
5 comments1 min readLW link

Safe Scrambling?

Hoagy29 Aug 2020 14:31 UTC
3 points
1 comment2 min readLW link

(Humor) AI Alignment Critical Failure Table

Kaj_Sotala31 Aug 2020 19:51 UTC
24 points
2 comments1 min readLW link
(sl4.org)

What is ambitious value learning?

rohinmshah1 Nov 2018 16:20 UTC
42 points
28 comments2 min readLW link

The easy goal inference problem is still hard

paulfchristiano3 Nov 2018 14:41 UTC
41 points
17 comments4 min readLW link

[AN #115]: AI safety research problems in the AI-GA framework

rohinmshah2 Sep 2020 17:10 UTC
19 points
16 comments6 min readLW link
(mailchi.mp)

Emotional valence vs RL reward: a video game analogy

Steven Byrnes3 Sep 2020 15:28 UTC
11 points
6 comments4 min readLW link

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

3 Sep 2020 18:27 UTC
60 points
11 comments2 min readLW link

“Learning to Summarize with Human Feedback”—OpenAI

Rekrul7 Sep 2020 17:59 UTC
57 points
2 comments1 min readLW link

[AN #116]: How to make explanations of neurons compositional

rohinmshah9 Sep 2020 17:20 UTC
21 points
2 comments9 min readLW link
(mailchi.mp)

Safer sandboxing via collective separation

Richard_Ngo9 Sep 2020 19:49 UTC
21 points
6 comments4 min readLW link

[Question] Do mesa-optimizer risk arguments rely on the train-test paradigm?

Ben Cottier10 Sep 2020 15:36 UTC
12 points
7 comments1 min readLW link

Safety via selection for obedience

Richard_Ngo10 Sep 2020 10:04 UTC
29 points
1 comment5 min readLW link

How Much Computational Power Does It Take to Match the Human Brain?

habryka12 Sep 2020 6:38 UTC
41 points
1 comment1 min readLW link
(www.openphilanthropy.org)

Decision Theory is multifaceted

Michele Campolo13 Sep 2020 22:30 UTC
6 points
12 comments8 min readLW link

AI Safety Discussion Day

Linda Linsefors15 Sep 2020 14:40 UTC
20 points
0 comments1 min readLW link

[AN #117]: How neural nets would fare under the TEVV framework

rohinmshah16 Sep 2020 17:20 UTC
27 points
0 comments7 min readLW link
(mailchi.mp)

Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty

Chris_Leong16 Sep 2020 10:34 UTC
9 points
5 comments2 min readLW link

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem

Zack_M_Davis17 Sep 2020 2:23 UTC
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

The “Backchaining to Local Search” Technique in AI Alignment

adamShimi18 Sep 2020 15:05 UTC
24 points
1 comment2 min readLW link

Draft report on AI timelines

Ajeya Cotra18 Sep 2020 23:47 UTC
142 points
49 comments1 min readLW link

Why GPT wants to mesa-optimize & how we might change this

John_Maxwell19 Sep 2020 13:48 UTC
53 points
32 comments9 min readLW link

My (Mis)Adventures With Algorithmic Machine Learning

AHartNtkn20 Sep 2020 5:31 UTC
14 points
4 comments41 min readLW link

[Question] What AI companies would be most likely to have a positive long-term impact on the world as a result of investing in them?

MikkW21 Sep 2020 23:41 UTC
7 points
2 comments2 min readLW link

Anthropomorphisation vs value learning: type 1 vs type 2 errors

Stuart_Armstrong22 Sep 2020 10:46 UTC
16 points
10 comments1 min readLW link

AI Advantages [Gems from the Wiki]

22 Sep 2020 22:44 UTC
22 points
7 comments2 min readLW link
(www.lesswrong.com)

A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments

Søren Elverlin27 Sep 2020 17:51 UTC
16 points
6 comments1 min readLW link

Dehumanisation *errors*

Stuart_Armstrong23 Sep 2020 9:51 UTC
13 points
0 comments1 min readLW link

[AN #118]: Risks, solutions, and prioritization in a world with many AI systems

rohinmshah23 Sep 2020 18:20 UTC
15 points
6 comments10 min readLW link
(mailchi.mp)

[Question] David Deutsch on Universal Explainers and AI

alanf24 Sep 2020 7:50 UTC
1 point
8 comments2 min readLW link

KL Divergence as Code Patching Efficiency

Zachary Robertson27 Sep 2020 16:06 UTC
15 points
0 comments8 min readLW link

[Question] What to do with imitation humans, other than asking them what the right thing to do is?

Charlie Steiner27 Sep 2020 21:51 UTC
10 points
6 comments1 min readLW link

[Question] What Decision Theory is Implied By Predictive Processing?

johnswentworth28 Sep 2020 17:20 UTC
52 points
17 comments1 min readLW link

AGI safety from first principles: Superintelligence

Richard_Ngo28 Sep 2020 19:53 UTC
64 points
2 comments9 min readLW link

AGI safety from first principles: Introduction

Richard_Ngo28 Sep 2020 19:53 UTC
91 points
14 comments2 min readLW link

[Question] Examples of self-governance to reduce technology risk?

Jia29 Sep 2020 19:31 UTC
10 points
4 comments1 min readLW link

AGI safety from first principles: Goals and Agency

Richard_Ngo29 Sep 2020 19:06 UTC
51 points
14 comments15 min readLW link

“Unsupervised” translation as an (intent) alignment problem

paulfchristiano30 Sep 2020 0:50 UTC
60 points
15 comments4 min readLW link
(ai-alignment.com)

[AN #119]: AI safety when agents are shaped by environments, not rewards

rohinmshah30 Sep 2020 17:10 UTC
11 points
0 comments11 min readLW link
(mailchi.mp)

AGI safety from first principles: Alignment

Richard_Ngo1 Oct 2020 3:13 UTC
48 points
2 comments13 min readLW link

AGI safety from first principles: Control

Richard_Ngo2 Oct 2020 21:51 UTC
48 points
3 comments9 min readLW link

AI race considerations in a report by the U.S. House Committee on Armed Services

NunoSempere4 Oct 2020 12:11 UTC
41 points
4 comments13 min readLW link

[Question] Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI?

capybaralet4 Oct 2020 8:10 UTC
7 points
7 comments1 min readLW link

AGI safety from first principles: Conclusion

Richard_Ngo4 Oct 2020 23:06 UTC
51 points
2 comments3 min readLW link

Universal Eudaimonia

hg005 Oct 2020 13:45 UTC
17 points
6 comments2 min readLW link

The Alignment Problem: Machine Learning and Human Values

rohinmshah6 Oct 2020 17:41 UTC
109 points
5 comments6 min readLW link
(www.amazon.com)

[AN #120]: Tracing the intellectual roots of AI and AI alignment

rohinmshah7 Oct 2020 17:10 UTC
13 points
4 comments10 min readLW link
(mailchi.mp)

[Question] Brainstorming positive visions of AI

jungofthewon7 Oct 2020 16:09 UTC
48 points
25 comments1 min readLW link

[Question] How can an AI demonstrate purely through chat that it is an AI, and not a human?

hugh.mann7 Oct 2020 17:53 UTC
3 points
4 comments1 min readLW link

[Question] Why isn’t JS a popular language for deep learning?

Will Clark8 Oct 2020 14:36 UTC
12 points
21 comments1 min readLW link

[Question] If GPT-6 is human-level AGI but costs $200 per page of output, what would happen?

Daniel Kokotajlo9 Oct 2020 12:00 UTC
28 points
30 comments1 min readLW link

[Question] Shouldn’t there be a Chinese translation of Human Compatible?

MakoYass9 Oct 2020 8:47 UTC
18 points
13 comments1 min readLW link

Idealized Factored Cognition

Rafael Harth30 Nov 2020 18:49 UTC
33 points
6 comments11 min readLW link

[Question] Reviews of the book ‘The Alignment Problem’

Mati_Roy11 Oct 2020 7:41 UTC
8 points
3 comments1 min readLW link

[Question] Reviews of TV show NeXt (about AI safety)

Mati_Roy11 Oct 2020 4:31 UTC
25 points
4 comments1 min readLW link

The Achilles Heel Hypothesis for AI

scasper13 Oct 2020 14:35 UTC
20 points
6 comments1 min readLW link

Toy Problem: Detective Story Alignment

johnswentworth13 Oct 2020 21:02 UTC
34 points
4 comments2 min readLW link

[Question] Does anyone worry about A.I. forums like this where they reinforce each other’s biases/ are led by big tech?

misabella1613 Oct 2020 15:14 UTC
4 points
3 comments1 min readLW link

[AN #121]: Forecasting transformative AI timelines using biological anchors

rohinmshah14 Oct 2020 17:20 UTC
22 points
5 comments14 min readLW link
(mailchi.mp)

Gradient hacking

evhub16 Oct 2019 0:53 UTC
74 points
34 comments3 min readLW link2 nominations2 reviews

Impact measurement and value-neutrality verification

evhub15 Oct 2019 0:06 UTC
31 points
13 comments6 min readLW link

Outer alignment and imitative amplification

evhub10 Jan 2020 0:26 UTC
29 points
11 comments9 min readLW link

Safe exploration and corrigibility

evhub28 Dec 2019 23:12 UTC
17 points
4 comments4 min readLW link

[Question] What are some non-purely-sampling ways to do deep RL?

evhub5 Dec 2019 0:09 UTC
15 points
9 comments2 min readLW link

More variations on pseudo-alignment

evhub4 Nov 2019 23:24 UTC
25 points
8 comments3 min readLW link

Towards an empirical investigation of inner alignment

evhub23 Sep 2019 20:43 UTC
43 points
9 comments6 min readLW link

Are minimal circuits deceptive?

evhub7 Sep 2019 18:11 UTC
51 points
8 comments8 min readLW link

Concrete experiments in inner alignment

evhub6 Sep 2019 22:16 UTC
60 points
12 comments6 min readLW link

Towards a mechanistic understanding of corrigibility

evhub22 Aug 2019 23:20 UTC
39 points
26 comments6 min readLW link

A Concrete Proposal for Adversarial IDA

evhub26 Mar 2019 19:50 UTC
16 points
5 comments5 min readLW link

Nuances with ascription universality

evhub12 Feb 2019 23:38 UTC
20 points
1 comment2 min readLW link

Box inversion hypothesis

Jan Kulveit20 Oct 2020 16:20 UTC
50 points
4 comments3 min readLW link

[Question] Has anyone researched specification gaming with biological animals?

capybaralet21 Oct 2020 0:20 UTC
11 points
3 comments1 min readLW link

Sunday October 25, 12:00PM (PT) — Scott Garrabrant on “Cartesian Frames”

Ben Pace21 Oct 2020 3:27 UTC
48 points
3 comments2 min readLW link

[Question] Could we use recommender systems to figure out human values?

Olga Babeeva20 Oct 2020 21:35 UTC
7 points
0 comments1 min readLW link

[Question] When was the term “AI alignment” coined?

capybaralet21 Oct 2020 18:27 UTC
11 points
8 comments1 min readLW link

[AN #122]: Arguing for AGI-driven existential risk from first principles

rohinmshah21 Oct 2020 17:10 UTC
28 points
0 comments9 min readLW link
(mailchi.mp)

[Question] What’s the difference between GAI and a government?

AllAmericanBreakfast21 Oct 2020 23:04 UTC
11 points
5 comments1 min readLW link

Moral AI: Options

Manfred11 Jul 2015 21:46 UTC
14 points
6 comments4 min readLW link

Can few-shot learning teach AI right from wrong?

Charlie Steiner20 Jul 2018 7:45 UTC
13 points
3 comments6 min readLW link

Some Comments on Stuart Armstrong’s “Research Agenda v0.9”

Charlie Steiner8 Jul 2019 19:03 UTC
20 points
11 comments4 min readLW link

The Artificial Intentional Stance

Charlie Steiner27 Jul 2019 7:00 UTC
12 points
0 comments4 min readLW link

What’s the dream for giving natural language commands to AI?

Charlie Steiner8 Oct 2019 13:42 UTC
8 points
8 comments7 min readLW link

Supervised learning of outputs in the brain

Steven Byrnes26 Oct 2020 14:32 UTC
26 points
8 comments10 min readLW link

Humans are stunningly rational and stunningly irrational

Stuart_Armstrong23 Oct 2020 14:13 UTC
21 points
4 comments2 min readLW link

Reply to Jebari and Lundborg on Artificial Superintelligence

Richard_Ngo25 Oct 2020 13:50 UTC
31 points
4 comments5 min readLW link
(thinkingcomplete.blogspot.com)

Additive Operations on Cartesian Frames

Scott Garrabrant26 Oct 2020 15:12 UTC
60 points
6 comments11 min readLW link

Security Mindset and Takeoff Speeds

DanielFilan27 Oct 2020 3:20 UTC
53 points
23 comments8 min readLW link
(danielfilan.com)

Biextensional Equivalence

Scott Garrabrant28 Oct 2020 14:07 UTC
42 points
13 comments10 min readLW link

Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato28 Oct 2020 16:01 UTC
46 points
2 comments1 min readLW link

[AN #123]: Inferring what is valuable in order to align recommender systems

rohinmshah28 Oct 2020 17:00 UTC
20 points
1 comment8 min readLW link
(mailchi.mp)

“Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020 {OA}

gwern29 Oct 2020 1:45 UTC
25 points
11 comments1 min readLW link
(arxiv.org)

Controllables and Observables, Revisited

Scott Garrabrant29 Oct 2020 16:38 UTC
33 points
5 comments8 min readLW link

AI risk hub in Singapore?

Daniel Kokotajlo29 Oct 2020 11:45 UTC
50 points
18 comments4 min readLW link

Functors and Coarse Worlds

Scott Garrabrant30 Oct 2020 15:19 UTC
48 points
4 comments8 min readLW link

[Question] Responses to Christiano on takeoff speeds?

Richard_Ngo30 Oct 2020 15:16 UTC
28 points
7 comments1 min readLW link

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern30 Oct 2020 20:50 UTC
19 points
0 comments1 min readLW link
(www.reddit.com)

“Inner Alignment Failures” Which Are Actually Outer Alignment Failures

johnswentworth31 Oct 2020 20:18 UTC
51 points
38 comments5 min readLW link

Automated intelligence is not AI

KatjaGrace1 Nov 2020 23:30 UTC
53 points
10 comments2 min readLW link
(meteuphoric.com)

Confucianism in AI Alignment

johnswentworth2 Nov 2020 21:16 UTC
33 points
28 comments6 min readLW link

[AN #124]: Provably safe exploration through shielding

rohinmshah4 Nov 2020 18:20 UTC
13 points
0 comments9 min readLW link
(mailchi.mp)

Defining capability and alignment in gradient descent

Edouard Harris5 Nov 2020 14:36 UTC
21 points
6 comments10 min readLW link

Sub-Sums and Sub-Tensors

Scott Garrabrant5 Nov 2020 18:06 UTC
33 points
4 comments8 min readLW link

Multiplicative Operations on Cartesian Frames

Scott Garrabrant3 Nov 2020 19:27 UTC
33 points
23 comments12 min readLW link

Subagents of Cartesian Frames

Scott Garrabrant2 Nov 2020 22:02 UTC
47 points
4 comments8 min readLW link

[Question] What considerations influence whether I have more influence over short or long timelines?

Daniel Kokotajlo5 Nov 2020 19:56 UTC
24 points
30 comments1 min readLW link

Additive and Multiplicative Subagents

Scott Garrabrant6 Nov 2020 14:26 UTC
19 points
7 comments12 min readLW link

Committing, Assuming, Externalizing, and Internalizing

Scott Garrabrant9 Nov 2020 16:59 UTC
30 points
25 comments10 min readLW link

Building AGI Using Language Models

leogao9 Nov 2020 16:33 UTC
11 points
1 comment1 min readLW link
(leogao.dev)

Why You Should Care About Goal-Directedness

adamShimi9 Nov 2020 12:48 UTC
31 points
15 comments9 min readLW link

Clarifying inner alignment terminology

evhub9 Nov 2020 20:40 UTC
69 points
15 comments3 min readLW link

Eight Definitions of Observability

Scott Garrabrant10 Nov 2020 23:37 UTC
33 points
26 comments12 min readLW link

[AN #125]: Neural network scaling laws across multiple modalities

rohinmshah11 Nov 2020 18:20 UTC
25 points
7 comments9 min readLW link
(mailchi.mp)

Time in Cartesian Frames

Scott Garrabrant11 Nov 2020 20:25 UTC
46 points
16 comments7 min readLW link

Learning Normativity: A Research Agenda

abramdemski11 Nov 2020 21:59 UTC
70 points
18 comments19 min readLW link

[Question] Any work on honeypots (to detect treacherous turn attempts)?

capybaralet12 Nov 2020 5:41 UTC
16 points
4 comments1 min readLW link

Misalignment and misuse: whose values are manifest?

KatjaGrace13 Nov 2020 10:10 UTC
37 points
7 comments2 min readLW link
(meteuphoric.com)

A Self-Embedded Probabilistic Model

johnswentworth13 Nov 2020 20:36 UTC
30 points
2 comments5 min readLW link

TU Darmstadt, Computer Science Master’s with a focus on Machine Learning

Master Programs ML/AI14 Nov 2020 15:50 UTC
6 points
0 comments8 min readLW link

EPF Lausanne, ML related MSc programs

Master Programs ML/AI14 Nov 2020 15:51 UTC
2 points
0 comments4 min readLW link

ETH Zurich, ML related MSc programs

Master Programs ML/AI14 Nov 2020 15:49 UTC
2 points
0 comments10 min readLW link

University of Oxford, Master’s Statistical Science

Master Programs ML/AI14 Nov 2020 15:51 UTC
2 points
0 comments3 min readLW link

University of Edinburgh, Master’s Artificial Intelligence

Master Programs ML/AI14 Nov 2020 15:49 UTC
3 points
0 comments12 min readLW link

University of Amsterdam (UvA), Master’s Artificial Intelligence

Master Programs ML/AI14 Nov 2020 15:49 UTC
8 points
4 comments21 min readLW link

University of Tübingen, Master’s Machine Learning

Master Programs ML/AI14 Nov 2020 15:50 UTC
9 points
0 comments7 min readLW link

A guide to Iterated Amplification & Debate

Rafael Harth15 Nov 2020 17:14 UTC
58 points
8 comments15 min readLW link

Solomonoff Induction and Sleeping Beauty

ike17 Nov 2020 2:28 UTC
7 points
0 comments2 min readLW link

The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables

johnswentworth18 Nov 2020 17:47 UTC
45 points
35 comments11 min readLW link

The ethics of AI for the Routledge Encyclopedia of Philosophy

Stuart_Armstrong18 Nov 2020 17:55 UTC
45 points
8 comments1 min readLW link

Persuasion Tools: AI takeover without AGI or agency?

Daniel Kokotajlo20 Nov 2020 16:54 UTC
49 points
14 comments11 min readLW link

UDT might not pay a Counterfactual Mugger

winwonce21 Nov 2020 23:27 UTC
5 points
18 comments2 min readLW link

Changing the AI race payoff matrix

Gurkenglas22 Nov 2020 22:25 UTC
7 points
2 comments1 min readLW link

Syntax, semantics, and symbol grounding, simplified

Stuart_Armstrong23 Nov 2020 16:12 UTC
25 points
4 comments9 min readLW link

Commentary on AGI Safety from First Principles

Richard_Ngo23 Nov 2020 21:37 UTC
74 points
3 comments54 min readLW link

[Question] Critiques of the Agent Foundations agenda?

Jsevillamol24 Nov 2020 16:11 UTC
15 points
3 comments1 min readLW link

[Question] How should OpenAI communicate about the commercial performances of the GPT-3 API?

Maxime Riché24 Nov 2020 8:34 UTC
2 points
0 comments1 min readLW link

[AN #126]: Avoiding wireheading by decoupling action feedback from action effects

rohinmshah26 Nov 2020 23:20 UTC
24 points
1 comment10 min readLW link
(mailchi.mp)

[Question] Is this a good way to bet on short timelines?

Daniel Kokotajlo28 Nov 2020 12:51 UTC
16 points
8 comments1 min readLW link

Preface to the Sequence on Factored Cognition

Rafael Harth30 Nov 2020 18:49 UTC
35 points
7 comments2 min readLW link

[Linkpost] AlphaFold: a solution to a 50-year-old grand challenge in biology

adamShimi30 Nov 2020 17:33 UTC
54 points
22 comments1 min readLW link
(deepmind.com)

What is “protein folding”? A brief explanation

jasoncrawford1 Dec 2020 2:46 UTC
63 points
9 comments4 min readLW link
(rootsofprogress.org)

[Question] In a multipolar scenario, how do people expect systems to be trained to interact with systems developed by other labs?

JesseClifton1 Dec 2020 20:04 UTC
11 points
6 comments1 min readLW link

[AN #127]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment

rohinmshah2 Dec 2020 18:20 UTC
46 points
0 comments13 min readLW link
(mailchi.mp)

Beyond 175 billion parameters: Can we anticipate future GPT-X Capabilities?

bakztfuture4 Dec 2020 23:42 UTC
1 point
1 comment2 min readLW link

Thoughts on Robin Hanson’s AI Impacts interview

Steven Byrnes24 Nov 2019 1:40 UTC
25 points
3 comments7 min readLW link

[RXN#7] Russian x-risks newsletter fall 2020

avturchin5 Dec 2020 16:28 UTC
12 points
0 comments3 min readLW link

The AI Safety Game (UPDATED)

Daniel Kokotajlo5 Dec 2020 10:27 UTC
38 points
5 comments3 min readLW link

Values Form a Shifting Landscape (and why you might care)

VojtaKovarik5 Dec 2020 23:56 UTC
24 points
5 comments4 min readLW link

AI Problems Shared by Non-AI Systems

VojtaKovarik5 Dec 2020 22:15 UTC
7 points
2 comments4 min readLW link

Chance that “AI safety basically [doesn’t need] to be solved, we’ll just solve it by default unless we’re completely completely careless”

8 Dec 2020 21:08 UTC
27 points
0 comments5 min readLW link

Minimal Maps, Semi-Decisions, and Neural Representations

Zachary Robertson6 Dec 2020 15:15 UTC
30 points
2 comments4 min readLW link

Launching the Forecasting AI Progress Tournament

Tamay7 Dec 2020 14:08 UTC
18 points
0 comments1 min readLW link
(www.metaculus.com)

[AN #128]: Prioritizing research on AI existential safety based on its application to governance demands

rohinmshah9 Dec 2020 18:20 UTC
16 points
2 comments10 min readLW link
(mailchi.mp)

Summary of AI Research Considerations for Human Existential Safety (ARCHES)

peterbarnett9 Dec 2020 23:28 UTC
3 points
0 comments13 min readLW link

Clarifying Factored Cognition

Rafael Harth13 Dec 2020 20:02 UTC
23 points
2 comments3 min readLW link

Homogeneity vs. heterogeneity in AI takeoff scenarios

evhub16 Dec 2020 1:37 UTC
82 points
48 comments4 min readLW link

LBIT Proofs 8: Propositions 53-58

Diffractor16 Dec 2020 3:29 UTC
7 points
0 comments18 min readLW link

LBIT Proofs 6: Propositions 39-47

Diffractor16 Dec 2020 3:33 UTC
7 points
0 comments23 min readLW link

LBIT Proofs 5: Propositions 29-38

Diffractor16 Dec 2020 3:35 UTC
7 points
0 comments21 min readLW link

LBIT Proofs 3: Propositions 19-22

Diffractor16 Dec 2020 3:40 UTC
7 points
0 comments17 min readLW link

LBIT Proofs 2: Propositions 10-18

Diffractor16 Dec 2020 3:45 UTC
7 points
0 comments20 min readLW link

LBIT Proofs 1: Propositions 1-9

Diffractor16 Dec 2020 3:48 UTC
7 points
0 comments25 min readLW link

LBIT Proofs 4: Propositions 22-28

Diffractor16 Dec 2020 3:38 UTC
7 points
0 comments17 min readLW link

LBIT Proofs 7: Propositions 48-52

Diffractor16 Dec 2020 3:31 UTC
7 points
0 comments20 min readLW link

Less Basic Inframeasure Theory

Diffractor16 Dec 2020 3:52 UTC
22 points
1 comment61 min readLW link

[AN #129]: Explaining double descent by measuring bias and variance

rohinmshah16 Dec 2020 18:10 UTC
14 points
1 comment7 min readLW link
(mailchi.mp)

Machine learning could be fundamentally unexplainable

George16 Dec 2020 13:32 UTC
25 points
15 comments15 min readLW link
(cerebralab.com)

Beta test GPT-3 based research assistant

jungofthewon16 Dec 2020 13:42 UTC
33 points
2 comments1 min readLW link

[Question] How long till Inverse AlphaFold?

Daniel Kokotajlo17 Dec 2020 19:56 UTC
41 points
18 comments1 min readLW link

Hierarchical planning: context agents

Charlie Steiner19 Dec 2020 11:24 UTC
13 points
6 comments9 min readLW link

[Question] Is there a community aligned with the idea of creating species of AGI systems for them to become our successors?

iamhefesto20 Dec 2020 19:06 UTC
−2 points
7 comments1 min readLW link

Intuition

Rafael Harth20 Dec 2020 21:49 UTC
26 points
1 comment6 min readLW link

2020 AI Alignment Literature Review and Charity Comparison

Larks21 Dec 2020 15:27 UTC
133 points
14 comments68 min readLW link

TAI Safety Bibliographic Database

JessRiedel22 Dec 2020 17:42 UTC
60 points
10 comments17 min readLW link

Announcing AXRP, the AI X-risk Research Podcast

DanielFilan23 Dec 2020 20:00 UTC
52 points
6 comments1 min readLW link
(danielfilan.com)

[AN #130]: A new AI x-risk podcast, and reviews of the field

rohinmshah24 Dec 2020 18:20 UTC
8 points
0 comments7 min readLW link
(mailchi.mp)

Can we model technological singularity as the phase transition?

Just Learning26 Dec 2020 3:20 UTC
3 points
0 comments4 min readLW link

AGI Alignment Should Solve Corporate Alignment

magfrump27 Dec 2020 2:23 UTC
19 points
6 comments6 min readLW link

Against GDP as a metric for timelines and takeoff speeds

Daniel Kokotajlo29 Dec 2020 17:42 UTC
112 points
13 comments14 min readLW link

AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch

DanielFilan29 Dec 2020 20:45 UTC
26 points
0 comments27 min readLW link

AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan29 Dec 2020 20:41 UTC
10 points
5 comments33 min readLW link

AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan29 Dec 2020 20:43 UTC
11 points
0 comments35 min readLW link

Dario Amodei leaves OpenAI

Daniel Kokotajlo29 Dec 2020 19:31 UTC
64 points
10 comments1 min readLW link

[Question] What Are Some Alternative Approaches to Understanding Agency/Intelligence?

interstice29 Dec 2020 23:21 UTC
15 points
12 comments1 min readLW link

Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian

Joar Skalse29 Dec 2020 13:33 UTC
53 points
53 comments1 min readLW link

Debate Minus Factored Cognition

abramdemski29 Dec 2020 22:59 UTC
37 points
42 comments11 min readLW link

[AN #131]: Formalizing the argument of ignored attributes in a utility function

rohinmshah31 Dec 2020 18:20 UTC
9 points
2 comments19 min readLW link
(mailchi.mp)

Reflections on Larks’ 2020 AI alignment literature review

alexflint1 Jan 2021 22:53 UTC
77 points
8 comments6 min readLW link

Mental subagent implications for AI Safety

moridinamael3 Jan 2021 18:59 UTC
11 points
0 comments3 min readLW link

The National Defense Authorization Act Contains AI Provisions

ryan_b5 Jan 2021 15:51 UTC
24 points
24 comments1 min readLW link

The Pointers Problem: Clarifications/Variations

abramdemski5 Jan 2021 17:29 UTC
46 points
6 comments18 min readLW link

[AN #132]: Complex and subtly incorrect arguments as an obstacle to debate

rohinmshah6 Jan 2021 18:20 UTC
18 points
1 comment19 min readLW link
(mailchi.mp)

Out-of-body reasoning (OOBR)

Jon Zero9 Jan 2021 16:10 UTC
4 points
0 comments4 min readLW link

Review of Soft Takeoff Can Still Lead to DSA

Daniel Kokotajlo10 Jan 2021 18:10 UTC
72 points
13 comments6 min readLW link

Review of ‘Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More’

TurnTrout12 Jan 2021 3:57 UTC
37 points
1 comment2 min readLW link

[AN #133]: Building machines that can cooperate (with humans, institutions, or other machines)

rohinmshah13 Jan 2021 18:10 UTC
14 points
0 comments9 min readLW link
(mailchi.mp)

An Exploratory Toy AI Takeoff Model

niplav13 Jan 2021 18:13 UTC
8 points
3 comments12 min readLW link

Some recent survey papers on (mostly near-term) AI safety, security, and assurance

alenglander13 Jan 2021 21:50 UTC
11 points
0 comments3 min readLW link

A potential problem with reduced impact

Chantiel14 Jan 2021 0:59 UTC
1 point
0 comments2 min readLW link

Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment

alexflint14 Jan 2021 12:58 UTC
34 points
14 comments4 min readLW link

Why I’m excited about Debate

Richard_Ngo15 Jan 2021 23:37 UTC
66 points
12 comments7 min readLW link

Excerpt from Arbital Solomonoff induction dialogue

Richard_Ngo17 Jan 2021 3:49 UTC
36 points
6 comments5 min readLW link
(arbital.com)

Short summary of mAIry’s room

Stuart_Armstrong18 Jan 2021 18:11 UTC
26 points
2 comments4 min readLW link

DALL-E does symbol grounding

p.b.17 Jan 2021 21:20 UTC
5 points
0 comments1 min readLW link

Some thoughts on risks from narrow, non-agentic AI

Richard_Ngo19 Jan 2021 0:04 UTC
31 points
18 comments16 min readLW link

Against the Backward Approach to Goal-Directedness

adamShimi19 Jan 2021 18:46 UTC
19 points
6 comments4 min readLW link

[AN #134]: Underspecification as a cause of fragility to distribution shift

rohinmshah21 Jan 2021 18:10 UTC
13 points
0 comments7 min readLW link
(mailchi.mp)

Counterfactual control incentives

Stuart_Armstrong21 Jan 2021 16:54 UTC
20 points
10 comments9 min readLW link

Policy restrictions and Secret keeping AI

Donald Hobson24 Jan 2021 20:59 UTC
6 points
3 comments3 min readLW link

FC final: Can Factored Cognition schemes scale?

Rafael Harth24 Jan 2021 22:18 UTC
14 points
0 comments17 min readLW link

[AN #135]: Five properties of goal-directed systems

rohinmshah27 Jan 2021 18:10 UTC
33 points
0 comments8 min readLW link
(mailchi.mp)

AMA on EA Forum: Ajeya Cotra, researcher at Open Phil

Ajeya Cotra29 Jan 2021 23:05 UTC
16 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Play with neural net

KatjaGrace30 Jan 2021 10:50 UTC
15 points
0 comments1 min readLW link
(worldspiritsockpuppet.com)

A Critique of Non-Obstruction

Joe_Collman3 Feb 2021 8:45 UTC
13 points
10 comments4 min readLW link

Distinguishing claims about training vs deployment

Richard_Ngo3 Feb 2021 11:30 UTC
50 points
30 comments9 min readLW link

Graphical World Models, Counterfactuals, and Machine Learning Agents

Koen.Holtman17 Feb 2021 11:07 UTC
6 points
2 comments10 min readLW link

OpenAI: “Scaling Laws for Transfer”, Hernandez et al.

Lanrian4 Feb 2021 12:49 UTC
13 points
3 comments1 min readLW link
(arxiv.org)

Evolutions Building Evolutions: Layers of Generate and Test

plex5 Feb 2021 18:21 UTC
11 points
1 comment6 min readLW link

Epistemology of HCH

adamShimi9 Feb 2021 11:46 UTC
15 points
2 comments10 min readLW link

[Question] Mathematical Models of Progress?

abramdemski16 Feb 2021 0:21 UTC
28 points
8 comments2 min readLW link

[Question] Suggestions of posts on the AF to review

adamShimi16 Feb 2021 12:40 UTC
50 points
17 comments1 min readLW link

Disentangling Corrigibility: 2015-2021

Koen.Holtman16 Feb 2021 18:01 UTC
15 points
20 comments9 min readLW link

Cartesian frames as generalised models

Stuart_Armstrong16 Feb 2021 16:09 UTC
20 points
0 comments5 min readLW link

[AN #138]: Why AI governance should find problems rather than just solving them

rohinmshah17 Feb 2021 18:50 UTC
12 points
0 comments9 min readLW link
(mailchi.mp)

Safely controlling the AGI agent reward function

Koen.Holtman17 Feb 2021 14:47 UTC
7 points
0 comments5 min readLW link

AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger

DanielFilan18 Feb 2021 0:03 UTC
41 points
10 comments86 min readLW link

Utility Maximization = Description Length Minimization

johnswentworth18 Feb 2021 18:04 UTC
142 points
29 comments5 min readLW link

Google’s Ethical AI team and AI Safety

magfrump20 Feb 2021 9:42 UTC
12 points
15 comments7 min readLW link

AI Safety Beginners Meetup (European Time)

Linda Linsefors20 Feb 2021 13:20 UTC
8 points
2 comments1 min readLW link

Minimal Map Constraints

Zachary Robertson21 Feb 2021 17:49 UTC
6 points
0 comments3 min readLW link

[AN #139]: How the simplicity of reality explains the success of neural nets

rohinmshah24 Feb 2021 18:30 UTC
26 points
3 comments12 min readLW link
(mailchi.mp)

My Thoughts on the Apperception Engine

Jemist25 Feb 2021 19:43 UTC
3 points
1 comment3 min readLW link

The Case for Privacy Optimism

bmgarfinkel10 Mar 2020 20:30 UTC
43 points
1 comment32 min readLW link
(benmgarfinkel.wordpress.com)

[Question] How might cryptocurrencies affect AGI timelines?

Telofy28 Feb 2021 19:16 UTC
7 points
38 comments2 min readLW link

Fun with +12 OOMs of Compute

Daniel Kokotajlo1 Mar 2021 13:30 UTC
123 points
62 comments12 min readLW link

Links for Feb 2021

ike1 Mar 2021 5:13 UTC
6 points
0 comments6 min readLW link
(misinfounderload.substack.com)

Introduction to Reinforcement Learning

Dr. Birdbrain28 Feb 2021 23:03 UTC
4 points
1 comment3 min readLW link

Curiosity about Aligning Values

esweet3 Mar 2021 0:22 UTC
3 points
7 comments1 min readLW link

How does bee learning compare with machine learning?

guicosta4 Mar 2021 1:59 UTC
50 points
11 comments24 min readLW link

Some recent interviews with AI/math luminaries.

fowlertm4 Mar 2021 1:26 UTC
0 points
0 comments1 min readLW link

A Semitechnical Introductory Dialogue on Solomonoff Induction

Eliezer Yudkowsky4 Mar 2021 17:27 UTC
92 points
16 comments54 min readLW link

Connecting the good regulator theorem with semantics and symbol grounding

Stuart_Armstrong4 Mar 2021 14:35 UTC
11 points
0 comments2 min readLW link

[AN #140]: Theoretical models that predict scaling laws

rohinmshah4 Mar 2021 18:10 UTC
45 points
0 comments10 min readLW link
(mailchi.mp)

Takeaways from the Intelligence Rising RPG

5 Mar 2021 10:27 UTC
47 points
8 comments12 min readLW link

GPT-3 and the future of knowledge work

fowlertm5 Mar 2021 17:40 UTC
16 points
0 comments2 min readLW link

The case for aligning narrowly superhuman models

Ajeya Cotra5 Mar 2021 22:29 UTC
169 points
72 comments38 min readLW link

MIRI comments on Cotra’s “Case for Aligning Narrowly Superhuman Models”

Rob Bensinger5 Mar 2021 23:43 UTC
124 points
13 comments26 min readLW link

[Question] What are the biggest current impacts of AI?

Sam Clarke7 Mar 2021 21:44 UTC
15 points
4 comments1 min readLW link

CLR’s recent work on multi-agent systems

JesseClifton9 Mar 2021 2:28 UTC
50 points
0 comments13 min readLW link

De-confusing myself about Pascal’s Mugging and Newcomb’s Problem

AllAmericanBreakfast9 Mar 2021 20:45 UTC
7 points
1 comment3 min readLW link

Open Problems with Myopia

10 Mar 2021 18:38 UTC
42 points
13 comments8 min readLW link

[AN #141]: The case for practicing alignment work on GPT-3 and other large models

rohinmshah10 Mar 2021 18:30 UTC
26 points
4 comments8 min readLW link
(mailchi.mp)

[Link] Whittlestone et al., The Societal Implications of Deep Reinforcement Learning

alenglander10 Mar 2021 18:13 UTC
11 points
1 comment1 min readLW link
(jair.org)

Four Motivations for Learning Normativity

abramdemski11 Mar 2021 20:13 UTC
42 points
7 comments5 min readLW link

[Question] What’s a good way to test basic machine learning code?

Kenny11 Mar 2021 21:27 UTC
5 points
9 comments1 min readLW link

[Video] Intelligence and Stupidity: The Orthogonality Thesis

plex13 Mar 2021 0:32 UTC
5 points
1 comment1 min readLW link
(www.youtube.com)

AI x-risk reduction: why I chose academia over industry

capybaralet14 Mar 2021 17:25 UTC
51 points
13 comments3 min readLW link

[Question] Partial-Consciousness as semantic/symbolic representational language model trained on NN

Joe Kwon16 Mar 2021 18:51 UTC
2 points
3 comments1 min readLW link

[AN #142]: The quest to understand a network well enough to reimplement it by hand

rohinmshah17 Mar 2021 17:10 UTC
34 points
4 comments8 min readLW link
(mailchi.mp)

Intermittent Distillations #1

Mark Xu17 Mar 2021 5:15 UTC
25 points
1 comment10 min readLW link

HCH Speculation Post #2A

Charlie Steiner17 Mar 2021 13:26 UTC
39 points
7 comments9 min readLW link

The Age of Imaginative Machines

Yuli_Ban18 Mar 2021 0:35 UTC
10 points
1 comment11 min readLW link

Generalizing Power to multi-agent games

22 Mar 2021 2:41 UTC
43 points
17 comments7 min readLW link

My research methodology

paulfchristiano22 Mar 2021 21:20 UTC
135 points
35 comments16 min readLW link
(ai-alignment.com)

“Infra-Bayesianism with Vanessa Kosoy” – Watch/Discuss Party

Ben Pace22 Mar 2021 23:44 UTC
27 points
42 comments1 min readLW link

Preferences and biases, the information argument

Stuart_Armstrong23 Mar 2021 12:44 UTC
14 points
5 comments1 min readLW link

[AN #143]: How to make embedded agents that reason probabilistically about their environments

rohinmshah24 Mar 2021 17:20 UTC
13 points
3 comments8 min readLW link
(mailchi.mp)

Toy model of preference, bias, and extra information

Stuart_Armstrong24 Mar 2021 10:14 UTC
9 points
0 comments4 min readLW link

On language modeling and future abstract reasoning research

alexlyzhov25 Mar 2021 17:43 UTC
3 points
1 comment1 min readLW link
(docs.google.com)

Inframeasures and Domain Theory

Diffractor28 Mar 2021 9:19 UTC
26 points
3 comments33 min readLW link

Infra-Domain Proofs 2

Diffractor28 Mar 2021 9:15 UTC
13 points
0 comments21 min readLW link

Infra-Domain proofs 1

Diffractor28 Mar 2021 9:16 UTC
13 points
0 comments23 min readLW link

Scenarios and Warning Signs for Ajeya’s Aggressive, Conservative, and Best Guess AI Timelines

Kevin Liu29 Mar 2021 1:38 UTC
24 points
1 comment9 min readLW link
(kliu.io)

[Question] How do we prepare for final crunch time?

Eli Tyre30 Mar 2021 5:47 UTC
102 points
27 comments8 min readLW link

[Question] TAI?

Logan Zoellner30 Mar 2021 12:41 UTC
20 points
8 comments1 min readLW link

A use for Classical AI—Expert Systems

Glpusna31 Mar 2021 2:37 UTC
1 point
2 comments2 min readLW link

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Andrew_Critch31 Mar 2021 23:50 UTC
117 points
47 comments22 min readLW link

AI and the Probability of Conflict

tonyoconnor1 Apr 2021 7:00 UTC
8 points
10 comments8 min readLW link

“AI and Compute” trend isn’t predictive of what is happening

alexlyzhov2 Apr 2021 0:44 UTC
82 points
9 comments1 min readLW link

[AN #144]: How language models can also be finetuned for non-language tasks

rohinmshah2 Apr 2021 17:20 UTC
19 points
0 comments6 min readLW link
(mailchi.mp)

2012 Robin Hanson comment on “Intelligence Explosion: Evidence and Import”

Rob Bensinger2 Apr 2021 16:26 UTC
28 points
4 comments3 min readLW link

My take on Michael Littman on “The HCI of HAI”

alexflint2 Apr 2021 19:51 UTC
56 points
4 comments7 min readLW link

[Question] How do scaling laws work for fine-tuning?

Daniel Kokotajlo4 Apr 2021 12:18 UTC
24 points
10 comments1 min readLW link

Averting suffering with sentience throttlers (proposal)

Quinn5 Apr 2021 10:54 UTC
8 points
5 comments3 min readLW link

Reflective Bayesianism

abramdemski6 Apr 2021 19:48 UTC
48 points
27 comments13 min readLW link

[Question] What will GPT-4 be incapable of?

Michaël Trazzi6 Apr 2021 19:57 UTC
31 points
32 comments1 min readLW link

I Trained a Neural Network to Play Helltaker

lsusr7 Apr 2021 8:24 UTC
27 points
5 comments3 min readLW link

Another (outer) alignment failure story

paulfchristiano7 Apr 2021 20:12 UTC
127 points
20 comments12 min readLW link

[AN #145]: Our three year anniversary!

rohinmshah9 Apr 2021 17:48 UTC
19 points
0 comments8 min readLW link
(mailchi.mp)

Alignment Newsletter Three Year Retrospective

rohinmshah7 Apr 2021 14:39 UTC
54 points
0 comments5 min readLW link

Which counterfactuals should an AI follow?

Stuart_Armstrong7 Apr 2021 16:47 UTC
19 points
5 comments7 min readLW link

Solving the whole AGI control problem, version 0.0001

Steven Byrnes8 Apr 2021 15:14 UTC
41 points
4 comments26 min readLW link

The Japanese Quiz: a Thought Experiment of Statistical Epistemology

DanB8 Apr 2021 17:37 UTC
9 points
0 comments9 min readLW link

A possible preference algorithm

Stuart_Armstrong8 Apr 2021 18:25 UTC
22 points
0 comments4 min readLW link

If you don’t design for extrapolation, you’ll extrapolate poorly—possibly fatally

Stuart_Armstrong8 Apr 2021 18:10 UTC
17 points
0 comments4 min readLW link

AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes

DanielFilan8 Apr 2021 21:20 UTC
23 points
3 comments59 min readLW link

My Current Take on Counterfactuals

abramdemski9 Apr 2021 17:51 UTC
49 points
13 comments24 min readLW link

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

9 Apr 2021 19:19 UTC
109 points
10 comments102 min readLW link

Why unriggable *almost* implies uninfluenceable

Stuart_Armstrong9 Apr 2021 17:07 UTC
11 points
0 comments4 min readLW link

Intermittent Distillations #2

Mark Xu14 Apr 2021 6:47 UTC
23 points
4 comments9 min readLW link

Test Cases for Impact Regularisation Methods

DanielFilan6 Feb 2019 21:50 UTC
58 points
5 comments12 min readLW link
(danielfilan.com)

Superrational Agents Kelly Bet Influence!

abramdemski16 Apr 2021 22:08 UTC
36 points
4 comments5 min readLW link

Defining “optimizer”

Chantiel17 Apr 2021 15:38 UTC
6 points
4 comments1 min readLW link

Alex Flint on “A software engineer’s perspective on logical induction”

Raemon17 Apr 2021 6:56 UTC
21 points
5 comments1 min readLW link

The Human’s Hidden Utility Function (Maybe)

lukeprog23 Jan 2012 19:39 UTC
60 points
88 comments3 min readLW link

Using vector fields to visualise preferences and make them consistent

28 Jan 2020 19:44 UTC
39 points
32 comments11 min readLW link

[Article review] Artificial Intelligence, Values, and Alignment

MichaelA9 Mar 2020 12:42 UTC
13 points
5 comments10 min readLW link

Clarifying some key hypotheses in AI alignment

15 Aug 2019 21:29 UTC
75 points
11 comments9 min readLW link

Failures in technology forecasting? A reply to Ord and Yudkowsky

MichaelA8 May 2020 12:41 UTC
44 points
19 comments11 min readLW link

[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

MichaelA16 Feb 2020 19:56 UTC
24 points
4 comments3 min readLW link

Allowing Exploitability in Game Theory

Liam Goddard17 May 2020 23:19 UTC
2 points
4 comments2 min readLW link

How can Interpretability help Alignment?

23 May 2020 16:16 UTC
33 points
3 comments9 min readLW link

A Problem With Patternism

Bob Jacobs19 May 2020 20:16 UTC
5 points
52 comments1 min readLW link

Goal-directedness is behavioral, not structural

adamShimi8 Jun 2020 23:05 UTC
6 points
12 comments3 min readLW link

Learning Deep Learning: Joining data science research as a mathematician

magfrump19 Oct 2017 19:14 UTC
10 points
4 comments3 min readLW link

Will AI undergo discontinuous progress?

SDM21 Feb 2020 22:16 UTC
25 points
20 comments20 min readLW link

The Value Definition Problem

SDM18 Nov 2019 19:56 UTC
14 points
6 comments11 min readLW link

Life at Three Tails of the Bell Curve

lsusr27 Jun 2020 8:49 UTC
56 points
8 comments4 min readLW link

How do takeoff speeds affect the probability of bad outcomes from AGI?

KR29 Jun 2020 22:06 UTC
15 points
2 comments8 min readLW link

AI Benefits Post 2: How AI Benefits Differs from AI Alignment & AI for Good

Cullen_OKeefe29 Jun 2020 17:00 UTC
8 points
7 comments2 min readLW link

Null-boxing Newcomb’s Problem

Yitz13 Jul 2020 16:32 UTC
27 points
10 comments4 min readLW link

No nonsense version of the “racial algorithm bias”

Yuxi_Liu13 Jul 2019 15:39 UTC
104 points
20 comments2 min readLW link2 nominations

Education 2.0 — A brand new education system

aryan15 Jul 2020 10:09 UTC
−8 points
3 comments6 min readLW link

What it means to optimise

Neel Nanda25 Jul 2020 9:40 UTC
3 points
0 comments8 min readLW link
(www.neelnanda.io)

[Question] Where are people thinking and talking about global coordination for AI safety?

Wei_Dai22 May 2019 6:24 UTC
95 points
22 comments1 min readLW link

The strategy-stealing assumption

paulfchristiano16 Sep 2019 15:23 UTC
68 points
46 comments12 min readLW link2 nominations3 reviews

Conversation with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Transcription of Eliezer’s January 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
109 points
9 comments56 min readLW link

Resources for AI Alignment Cartography

Gyrodiot4 Apr 2020 14:20 UTC
40 points
8 comments9 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

capybaralet6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

Announcement: AI alignment prize round 2 winners and next round

cousin_it16 Apr 2018 3:08 UTC
64 points
29 comments2 min readLW link

Announcement: AI alignment prize round 3 winners and next round

cousin_it15 Jul 2018 7:40 UTC
93 points
7 comments1 min readLW link

Security Mindset and the Logistic Success Curve

Eliezer Yudkowsky26 Nov 2017 15:58 UTC
67 points
45 comments20 min readLW link

Arbital scrape

emmab6 Jun 2019 23:11 UTC
89 points
23 comments1 min readLW link

The Strangest Thing An AI Could Tell You

Eliezer Yudkowsky15 Jul 2009 2:27 UTC
101 points
601 comments2 min readLW link

Self-fulfilling correlations

PhilGoetz26 Aug 2010 21:07 UTC
139 points
50 comments3 min readLW link

Zoom In: An Introduction to Circuits

evhub10 Mar 2020 19:36 UTC
81 points
11 comments2 min readLW link
(distill.pub)

Should ethicists be inside or outside a profession?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
77 points
6 comments9 min readLW link

Implicit extortion

paulfchristiano13 Apr 2018 16:33 UTC
29 points
16 comments6 min readLW link
(ai-alignment.com)

Bayesian Judo

Eliezer Yudkowsky31 Jul 2007 5:53 UTC
74 points
107 comments1 min readLW link

Announcing AlignmentForum.org Beta

Raemon10 Jul 2018 20:19 UTC
67 points
35 comments2 min readLW link

Announcing the Alignment Newsletter

rohinmshah9 Apr 2018 21:16 UTC
29 points
3 comments1 min readLW link

Helen Toner on China, CSET, and AI

Rob Bensinger21 Apr 2019 4:10 UTC
67 points
3 comments7 min readLW link
(rationallyspeakingpodcast.org)

A simple environment for showing mesa misalignment

Matthew Barnett26 Sep 2019 4:44 UTC
63 points
9 comments2 min readLW link

The E-Coli Test for AI Alignment

johnswentworth16 Dec 2018 8:10 UTC
66 points
24 comments1 min readLW link

Recent Progress in