AI

Core Tag · Last edit: Jan 23, 2025, 12:13 PM by Dakara

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring that powerful AI systems are aligned with human values and interests. The central concern is that a sufficiently powerful AI, if not designed and implemented with sufficient understanding, would optimize for something unintended by its creators and could pose an existential threat to the future of humanity. This is known as the AI alignment problem.
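The worry about "optimizing something unintended" can be made concrete with a toy sketch (hypothetical, not from any cited post): an optimizer pointed at a proxy metric can score arbitrarily well on the proxy while doing badly on what the designers actually wanted — a minimal Goodhart's-law illustration.

```python
def true_objective(x):
    # What the designers actually want: stay near x = 1.
    return -(x - 1) ** 2

def proxy_objective(x):
    # What the system is actually told to maximize: a metric that
    # agrees with the true objective only for small x.
    return x

def optimize(objective, candidates):
    # A stand-in "optimizer": pick the candidate scoring highest
    # on the given objective.
    return max(candidates, key=objective)

candidates = [0.5, 1.0, 2.0, 10.0, 100.0]

best_for_proxy = optimize(proxy_objective, candidates)  # picks 100.0
best_for_truth = optimize(true_objective, candidates)   # picks 1.0

# The proxy optimizer's choice is the worst one by the true objective.
print(best_for_proxy, best_for_truth)
```

The stronger the optimizer (the wider the candidate set), the further the proxy-optimal choice drifts from the intended one — which is the basic shape of the alignment concern.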

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything in the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get it to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious alignment have in common that you’re trying to have the AI do that thing rather than making a lot of paperclips.

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Goodhart’s Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb’s Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Interpretability (ML & AI)
Value Learning

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

An overview of 11 proposals for building safe advanced AI · evhub · May 29, 2020, 8:38 PM · 194 points (92 votes) · 36 comments · 38 min read · 2 reviews
There’s No Fire Alarm for Artificial General Intelligence · Eliezer Yudkowsky · Oct 13, 2017, 9:38 PM · 124 points (63 votes) · 71 comments · 25 min read
Superintelligence FAQ · Scott Alexander · Sep 20, 2016, 7:00 PM · 92 points (58 votes) · 16 comments · 27 min read
Risks from Learned Optimization: Introduction · May 31, 2019, 11:44 PM · 166 points (62 votes) · 42 comments · 12 min read · 3 reviews
Embedded Agents · Oct 29, 2018, 7:53 PM · 198 points (95 votes) · 41 comments · 1 min read · 2 reviews
What failure looks like · paulfchristiano · Mar 17, 2019, 8:18 PM · 319 points (179 votes) · 49 comments · 8 min read · 2 reviews
The Rocket Alignment Problem · Eliezer Yudkowsky · Oct 4, 2018, 12:38 AM · 198 points (108 votes) · 42 comments · 15 min read · 2 reviews
Challenges to Christiano’s capability amplification proposal · Eliezer Yudkowsky · May 19, 2018, 6:18 PM · 115 points (67 votes) · 54 comments · 23 min read · 1 review
Embedded Agency (full-text version) · Nov 15, 2018, 7:49 PM · 143 points (65 votes) · 15 comments · 54 min read
A space of proposals for building safe advanced AI · Richard_Ngo · Jul 10, 2020, 4:58 PM · 55 points (25 votes) · 4 comments · 4 min read
Biology-Inspired AGI Timelines: The Trick That Never Works · Eliezer Yudkowsky · Dec 1, 2021, 10:35 PM · 181 points (100 votes) · 143 comments · 65 min read
PreDCA: vanessa kosoy’s alignment protocol · Tamsin Leake · Aug 20, 2022, 10:03 AM · 46 points (24 votes) · 8 comments · 7 min read · (carado.moe)

larger language models may disappoint you [or, an eternally unfinished draft] · nostalgebraist · Nov 26, 2021, 11:08 PM · 237 points (99 votes) · 29 comments · 31 min read · 1 review
Deepmind’s Gopher—more powerful than GPT-3 · hath · Dec 8, 2021, 5:06 PM · 86 points (38 votes) · 27 comments · 1 min read · (deepmind.com)
Project proposal: Testing the IBP definition of agent · Aug 9, 2022, 1:09 AM · 21 points (11 votes) · 4 comments · 2 min read
Goodhart Taxonomy · Scott Garrabrant · Dec 30, 2017, 4:38 PM · 180 points (100 votes) · 33 comments · 10 min read
AI Alignment 2018-19 Review · Rohin Shah · Jan 28, 2020, 2:19 AM · 125 points (43 votes) · 6 comments · 35 min read
Some AI research areas and their relevance to existential safety · Andrew_Critch · Nov 19, 2020, 3:18 AM · 199 points (81 votes) · 40 comments · 50 min read · 2 reviews
Moravec’s Paradox Comes From The Availability Heuristic · james.lucassen · Oct 20, 2021, 6:23 AM · 32 points (17 votes) · 2 comments · 2 min read · (jlucassen.com)
Inference cost limits the impact of ever larger models · SoerenMind · Oct 23, 2021, 10:51 AM · 36 points (13 votes) · 28 comments · 2 min read
[Linkpost] Chinese government’s guidelines on AI · RomanS · Dec 10, 2021, 9:10 PM · 61 points (29 votes) · 14 comments · 1 min read
That Alien Message · Eliezer Yudkowsky · May 22, 2008, 5:55 AM · 304 points (236 votes) · 173 comments · 10 min read
Epistemological Framing for AI Alignment Research · adamShimi · Mar 8, 2021, 10:05 PM · 53 points (19 votes) · 7 comments · 9 min read
EfficientZero: human ALE sample-efficiency w/ MuZero+self-supervised · gwern · Nov 2, 2021, 2:32 AM · 134 points (59 votes) · 52 comments · 1 min read · (arxiv.org)

Discussion with Eliezer Yudkowsky on AGI interventions · Nov 11, 2021, 3:01 AM · 325 points (153 votes) · 257 comments · 34 min read
Shulman and Yudkowsky on AI progress · Dec 3, 2021, 8:05 PM · 90 points (28 votes) · 16 comments · 20 min read
Future ML Systems Will Be Qualitatively Different · jsteinhardt · Jan 11, 2022, 7:50 PM · 113 points (58 votes) · 10 comments · 5 min read · (bounded-regret.ghost.io)
[Linkpost] TrojanNet: Embedding Hidden Trojan Horse Models in Neural Networks · Gunnar_Zarncke · Feb 11, 2022, 1:17 AM · 13 points (4 votes) · 1 comment · 1 min read
Briefly thinking through some analogs of debate · Eli Tyre · Sep 11, 2022, 12:02 PM · 20 points (9 votes) · 3 comments · 4 min read
Robustness to Scale · Scott Garrabrant · Feb 21, 2018, 10:55 PM · 109 points (56 votes) · 22 comments · 2 min read · 1 review
Chris Olah’s views on AGI safety · evhub · Nov 1, 2019, 8:13 PM · 197 points (82 votes) · 38 comments · 12 min read · 2 reviews
[AN #96]: Buck and I discuss/argue about AI Alignment · Rohin Shah · Apr 22, 2020, 5:20 PM · 17 points (7 votes) · 4 comments · 10 min read · (mailchi.mp)
Matt Botvinick on the spontaneous emergence of learning algorithms · Adam Scholl · Aug 12, 2020, 7:47 AM · 147 points (69 votes) · 87 comments · 5 min read
A descriptive, not prescriptive, overview of current AI Alignment Research · Jun 6, 2022, 9:59 PM · 126 points (72 votes) · 21 comments · 7 min read
Coherence arguments do not entail goal-directed behavior · Rohin Shah · Dec 3, 2018, 3:26 AM · 101 points (45 votes) · 69 comments · 7 min read · 3 reviews
Alignment By Default · johnswentworth · Aug 12, 2020, 6:54 PM · 153 points (57 votes) · 92 comments · 11 min read · 2 reviews

Book review: “A Thousand Brains” by Jeff Hawkins · Steven Byrnes · Mar 4, 2021, 5:10 AM · 110 points (47 votes) · 18 comments · 19 min read
Modelling Transformative AI Risks (MTAIR) Project: Introduction · Aug 16, 2021, 7:12 AM · 89 points (39 votes) · 0 comments · 9 min read
Infra-Bayesian physicalism: a formal theory of naturalized induction · Vanessa Kosoy · Nov 30, 2021, 10:25 PM · 98 points (33 votes) · 20 comments · 42 min read · 1 review
What an actually pessimistic containment strategy looks like · lc · Apr 5, 2022, 12:19 AM · 554 points (279 votes) · 136 comments · 6 min read
Why I think strong general AI is coming soon · porby · Sep 28, 2022, 5:40 AM · 269 points (171 votes) · 126 comments · 34 min read
AlphaGo Zero and the Foom Debate · Eliezer Yudkowsky · Oct 21, 2017, 2:18 AM · 89 points (49 votes) · 17 comments · 3 min read
Tradeoff between desirable properties for baseline choices in impact measures · Vika · Jul 4, 2020, 11:56 AM · 37 points (11 votes) · 24 comments · 5 min read
Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns · stuhlmueller · Jul 21, 2020, 8:06 PM · 80 points (26 votes) · 40 comments · 3 min read
the scaling “inconsistency”: openAI’s new insight · nostalgebraist · Nov 7, 2020, 7:40 AM · 146 points (65 votes) · 14 comments · 9 min read · (nostalgebraist.tumblr.com)
2019 Review Rewrite: Seeking Power is Often Robustly Instrumental in MDPs · TurnTrout · Dec 23, 2020, 5:16 PM · 35 points (9 votes) · 0 comments · 4 min read · (www.lesswrong.com)
Bootstrapped Alignment · Gordon Seidoh Worley · Feb 27, 2021, 3:46 PM · 19 points (10 votes) · 12 comments · 2 min read
Multimodal Neurons in Artificial Neural Networks · Kaj_Sotala · Mar 5, 2021, 9:01 AM · 57 points (18 votes) · 2 comments · 2 min read · (distill.pub)

Review of “Fun with +12 OOMs of Compute” · Mar 28, 2021, 2:55 PM · 60 points (22 votes) · 20 comments · 8 min read
Draft report on existential risk from power-seeking AI · Joe Carlsmith · Apr 28, 2021, 9:41 PM · 80 points (26 votes) · 23 comments · 1 min read
Rogue AGI Embodies Valuable Intellectual Property · Jun 3, 2021, 8:37 PM · 70 points (31 votes) · 9 comments · 3 min read
DeepMind: Generally capable agents emerge from open-ended play · Daniel Kokotajlo · Jul 27, 2021, 2:19 PM · 247 points (122 votes) · 53 comments · 2 min read · (deepmind.com)
Analogies and General Priors on Intelligence · Aug 20, 2021, 9:03 PM · 57 points (22 votes) · 12 comments · 14 min read
We’re already in AI takeoff · Valentine · Mar 8, 2022, 11:09 PM · 120 points (161 votes) · 115 comments · 7 min read
It Looks Like You’re Trying To Take Over The World · gwern · Mar 9, 2022, 4:35 PM · 386 points (193 votes) · 125 comments · 1 min read · (www.gwern.net)
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios · Evan R. Murphy · May 12, 2022, 8:01 PM · 45 points (23 votes) · 0 comments · 59 min read
Why all the fuss about recursive self-improvement? · So8res · Jun 12, 2022, 8:53 PM · 150 points (78 votes) · 62 comments · 7 min read
AI Safety bounty for practical homomorphic encryption · acylhalide · Aug 19, 2022, 12:27 PM · 29 points (16 votes) · 9 comments · 4 min read
Paper: Discovering novel algorithms with AlphaTensor [Deepmind] · LawrenceC · Oct 5, 2022, 4:20 PM · 80 points (44 votes) · 18 comments · 1 min read · (www.deepmind.com)
The Teacup Test · lsusr · Oct 8, 2022, 4:25 AM · 71 points (49 votes) · 28 comments · 2 min read

Discontinuous progress in history: an update · KatjaGrace · Apr 14, 2020, 12:00 AM · 179 points (68 votes) · 25 comments · 31 min read · 1 review · (aiimpacts.org)
Replication Dynamics Bridge to RL in Thermodynamic Limit · Past Account · May 18, 2020, 1:02 AM · 6 points (3 votes) · 1 comment · 2 min read
The ground of optimization · Alex Flint · Jun 20, 2020, 12:38 AM · 218 points (82 votes) · 74 comments · 27 min read · 1 review
Modelling Continuous Progress · Sammy Martin · Jun 23, 2020, 6:06 PM · 29 points (11 votes) · 3 comments · 7 min read
Reframing Superintelligence: Comprehensive AI Services as General Intelligence · Rohin Shah · Jan 8, 2019, 7:12 AM · 118 points (54 votes) · 75 comments · 5 min read · 2 reviews · (www.fhi.ox.ac.uk)
Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment · philip_b · Jul 14, 2020, 10:48 PM · 35 points (13 votes) · 25 comments · 3 min read
Collection of GPT-3 results · Kaj_Sotala · Jul 18, 2020, 8:04 PM · 89 points (41 votes) · 24 comments · 1 min read · (twitter.com)
Hiring engineers and researchers to help align GPT-3 · paulfchristiano · Oct 1, 2020, 6:54 PM · 206 points (77 votes) · 14 comments · 3 min read
The date of AI Takeover is not the day the AI takes over · Daniel Kokotajlo · Oct 22, 2020, 10:41 AM · 116 points (66 votes) · 32 comments · 2 min read · 1 review
[Question] What could one do with truly unlimited computational power? · Yitz · Nov 11, 2020, 10:03 AM · 30 points (10 votes) · 22 comments · 2 min read
AGI Predictions · Nov 21, 2020, 3:46 AM · 110 points (42 votes) · 36 comments · 4 min read
[Question] What are the best precedents for industries failing to invest in valuable AI research? · Daniel Kokotajlo · Dec 14, 2020, 11:57 PM · 18 points (7 votes) · 17 comments · 1 min read

Extrapolating GPT-N performance · Lukas Finnveden · Dec 18, 2020, 9:41 PM · 103 points (33 votes) · 31 comments · 25 min read · 1 review
Debate update: Obfuscated arguments problem · Beth Barnes · Dec 23, 2020, 3:24 AM · 125 points (38 votes) · 21 comments · 16 min read
Literature Review on Goal-Directedness · Jan 18, 2021, 11:15 AM · 69 points (26 votes) · 21 comments · 31 min read
[Question] How will OpenAI + GitHub’s Copilot affect programming? · smountjoy · Jun 29, 2021, 4:42 PM · 55 points (27 votes) · 23 comments · 1 min read
Modeling Risks From Learned Optimization · Ben Cottier · Oct 12, 2021, 8:54 PM · 44 points (11 votes) · 0 comments · 12 min read
Truthful AI: Developing and governing AI that does not lie · Oct 18, 2021, 6:37 PM · 81 points (27 votes) · 9 comments · 10 min read
EfficientZero: How It Works · 1a3orn · Nov 26, 2021, 3:17 PM · 273 points (126 votes) · 42 comments · 29 min read
Theoretical Neuroscience For Alignment Theory · Cameron Berg · Dec 7, 2021, 9:50 PM · 62 points (33 votes) · 19 comments · 23 min read
Magna Alta Doctrina · jacob_cannell · Dec 11, 2021, 9:54 PM · 37 points (18 votes) · 7 comments · 28 min read
DL towards the unaligned Recursive Self-Optimization attractor · jacob_cannell · Dec 18, 2021, 2:15 AM · 32 points (13 votes) · 22 comments · 4 min read
Regularization Causes Modularity Causes Generalization · dkirmani · Jan 1, 2022, 11:34 PM · 49 points (22 votes) · 7 comments · 3 min read
Is General Intelligence “Compact”? · DragonGod · Jul 4, 2022, 1:27 PM · 21 points (11 votes) · 6 comments · 22 min read

The Tree of Life: Stanford AI Alignment Theory of Change · Gabe M · Jul 2, 2022, 6:36 PM · 22 points (10 votes) · 0 comments · 14 min read
Shard Theory: An Overview · David Udell · Aug 11, 2022, 5:44 AM · 135 points (46 votes) · 34 comments · 10 min read
How evolution succeeds and fails at value alignment · Ocracoke · Aug 21, 2022, 7:14 AM · 21 points (11 votes) · 2 comments · 4 min read
An Untrollable Mathematician Illustrated · abramdemski · Mar 20, 2018, 12:00 AM · 155 points (114 votes) · 38 comments · 1 min read · 1 review
Conditions for Mesa-Optimization · Jun 1, 2019, 8:52 PM · 75 points (30 votes) · 48 comments · 12 min read
Thoughts on Human Models · Feb 21, 2019, 9:10 AM · 124 points (45 votes) · 32 comments · 10 min read · 1 review
Inner alignment in the brain · Steven Byrnes · Apr 22, 2020, 1:14 PM · 76 points (27 votes) · 16 comments · 16 min read
Problem relaxation as a tactic · TurnTrout · Apr 22, 2020, 11:44 PM · 113 points (48 votes) · 8 comments · 7 min read
[Question] How should potential AI alignment researchers gauge whether the field is right for them? · TurnTrout · May 6, 2020, 12:24 PM · 20 points (8 votes) · 5 comments · 1 min read
Specification gaming: the flip side of AI ingenuity · May 6, 2020, 11:51 PM · 46 points (18 votes) · 8 comments · 6 min read
Lessons from Isaac: Pitfalls of Reason · adamShimi · May 8, 2020, 8:44 PM · 9 points (4 votes) · 0 comments · 8 min read
Corrigibility as outside view · TurnTrout · May 8, 2020, 9:56 PM · 36 points (15 votes) · 11 comments · 4 min read

[Question] How to choose a PhD with AI Safety in mind · kwiat.dev · May 15, 2020, 10:19 PM · 9 points (3 votes) · 1 comment · 1 min read
Reward functions and updating assumptions can hide a multitude of sins · Stuart_Armstrong · May 18, 2020, 3:18 PM · 16 points (5 votes) · 2 comments · 9 min read
Possible takeaways from the coronavirus pandemic for slow AI takeoff · Vika · May 31, 2020, 5:51 PM · 135 points (61 votes) · 36 comments · 3 min read · 1 review
Focus: you are allowed to be bad at accomplishing your goals · adamShimi · Jun 3, 2020, 9:04 PM · 19 points (10 votes) · 17 comments · 3 min read
Reply to Paul Christiano on Inaccessible Information · Alex Flint · Jun 5, 2020, 9:10 AM · 77 points (34 votes) · 15 comments · 6 min read
Our take on CHAI’s research agenda in under 1500 words · Alex Flint · Jun 17, 2020, 12:24 PM · 112 points (46 votes) · 19 comments · 5 min read
[Question] Question on GPT-3 Excel Demo · Zhitao Hou · Jun 22, 2020, 8:31 PM · 0 points (2 votes) · 2 comments · 1 min read
Dynamic inconsistency of the inaction and initial state baseline · Stuart_Armstrong · Jul 7, 2020, 12:02 PM · 30 points (7 votes) · 8 comments · 2 min read
Cortés, Pizarro, and Afonso as Precedents for Takeover · Daniel Kokotajlo · Mar 1, 2020, 3:49 AM · 145 points (76 votes) · 75 comments · 11 min read · 1 review
[Question] What problem would you like to see Reinforcement Learning applied to? · Julian Schrittwieser · Jul 8, 2020, 2:40 AM · 43 points (11 votes) · 4 comments · 1 min read
My current framework for thinking about AGI timelines · zhukeepa · Mar 30, 2020, 1:23 AM · 107 points (52 votes) · 5 comments · 3 min read
[Question] To what extent is GPT-3 capable of reasoning? · TurnTrout · Jul 20, 2020, 5:10 PM · 70 points (41 votes) · 74 comments · 16 min read

Replicating the replication crisis with GPT-3? · skybrian · Jul 22, 2020, 9:20 PM · 29 points (18 votes) · 10 comments · 1 min read
Can you get AGI from a Transformer? · Steven Byrnes · Jul 23, 2020, 3:27 PM · 114 points (50 votes) · 39 comments · 12 min read
Writing with GPT-3 · Jacob Falkovich · Jul 24, 2020, 3:22 PM · 42 points (19 votes) · 0 comments · 4 min read
Inner Alignment: Explain like I’m 12 Edition · Rafael Harth · Aug 1, 2020, 3:24 PM · 175 points (65 votes) · 46 comments · 13 min read · 2 reviews
Developmental Stages of GPTs · orthonormal · Jul 26, 2020, 10:03 PM · 140 points (64 votes) · 74 comments · 7 min read · 1 review
Generalizing the Power-Seeking Theorems · TurnTrout · Jul 27, 2020, 12:28 AM · 40 points (12 votes) · 6 comments · 4 min read
Are we in an AI overhang? · Andy Jones · Jul 27, 2020, 12:48 PM · 255 points (137 votes) · 109 comments · 4 min read
[Question] What specific dangers arise when asking GPT-N to write an Alignment Forum post? · Matthew Barnett · Jul 28, 2020, 2:56 AM · 44 points (19 votes) · 14 comments · 1 min read
[Question] Probability that other architectures will scale as well as Transformers? · Daniel Kokotajlo · Jul 28, 2020, 7:36 PM · 22 points (8 votes) · 4 comments · 1 min read
What a 20-year-lead in military tech might look like · Daniel Kokotajlo · Jul 29, 2020, 8:10 PM · 68 points (32 votes) · 44 comments · 16 min read
[Question] What if memes are common in highly capable minds? · Daniel Kokotajlo · Jul 30, 2020, 8:45 PM · 36 points (17 votes) · 15 comments · 2 min read
Three mental images from thinking about AGI debate & corrigibility · Steven Byrnes · Aug 3, 2020, 2:29 PM · 55 points (18 votes) · 35 comments · 4 min read

Solving Key Alignment Problems Group · Logan Riggs · Aug 3, 2020, 7:30 PM · 19 points (7 votes) · 7 comments · 2 min read
How easily can we separate a friendly AI in design space from one which would bring about a hyperexistential catastrophe? · Anirandis · Sep 10, 2020, 12:40 AM · 19 points (11 votes) · 20 comments · 2 min read
My computational framework for the brain · Steven Byrnes · Sep 14, 2020, 2:19 PM · 144 points (66 votes) · 26 comments · 13 min read · 1 review
[Question] Where is human level on text prediction? (GPTs task) · Daniel Kokotajlo · Sep 20, 2020, 9:00 AM · 27 points (14 votes) · 19 comments · 1 min read
Needed: AI infohazard policy · Vanessa Kosoy · Sep 21, 2020, 3:26 PM · 61 points (22 votes) · 17 comments · 2 min read
The Colliding Exponentials of AI · Vermillion · Oct 14, 2020, 11:31 PM · 27 points (14 votes) · 16 comments · 5 min read
“Little glimpses of empathy” as the foundation for social emotions · Steven Byrnes · Oct 22, 2020, 11:02 AM · 31 points (12 votes) · 1 comment · 5 min read
Introduction to Cartesian Frames · Scott Garrabrant · Oct 22, 2020, 1:00 PM · 145 points (49 votes) · 29 comments · 22 min read · 1 review
“Cartesian Frames” Talk #2 this Sunday at 2pm (PT) · Rob Bensinger · Oct 28, 2020, 1:59 PM · 30 points (4 votes) · 0 comments · 1 min read
Does SGD Produce Deceptive Alignment? · Mark Xu · Nov 6, 2020, 11:48 PM · 85 points (28 votes) · 6 comments · 16 min read
[Question] How can I bet on short timelines? · Daniel Kokotajlo · Nov 7, 2020, 12:44 PM · 43 points (26 votes) · 16 comments · 2 min read
Non-Obstruction: A Simple Concept Motivating Corrigibility · TurnTrout · Nov 21, 2020, 7:35 PM · 67 points (20 votes) · 19 comments · 19 min read

Cartesian Frames Definitions · Rob Bensinger · Nov 8, 2020, 12:44 PM · 25 points (8 votes) · 0 comments · 4 min read
Communication Prior as Alignment Strategy · johnswentworth · Nov 12, 2020, 10:06 PM · 40 points (11 votes) · 8 comments · 6 min read
How Roodman’s GWP model translates to TAI timelines · Daniel Kokotajlo · Nov 16, 2020, 2:05 PM · 22 points (10 votes) · 5 comments · 3 min read
Normativity · abramdemski · Nov 18, 2020, 4:52 PM · 46 points (17 votes) · 11 comments · 9 min read
Inner Alignment in Salt-Starved Rats · Steven Byrnes · Nov 19, 2020, 2:40 AM · 136 points (53 votes) · 39 comments · 11 min read · 2 reviews
Continuing the takeoffs debate · Richard_Ngo · Nov 23, 2020, 3:58 PM · 67 points (17 votes) · 13 comments · 9 min read
The next AI winter will be due to energy costs · hippke · Nov 24, 2020, 4:53 PM · 57 points (29 votes) · 7 comments · 2 min read
Recursive Quantilizers II · abramdemski · Dec 2, 2020, 3:26 PM · 30 points (12 votes) · 15 comments · 13 min read
Supervised learning in the brain, part 4: compression / filtering · Steven Byrnes · Dec 5, 2020, 5:06 PM · 12 points (5 votes) · 0 comments · 5 min read
Conservatism in neocortex-like AGIs · Steven Byrnes · Dec 8, 2020, 4:37 PM · 22 points (11 votes) · 5 comments · 8 min read
Avoiding Side Effects in Complex Environments · Dec 12, 2020, 12:34 AM · 62 points (21 votes) · 9 comments · 2 min read · (avoiding-side-effects.github.io)
The Power of Annealing · meanderingmoose · Dec 14, 2020, 11:02 AM · 25 points (13 votes) · 6 comments · 5 min read

[link] The AI Girlfriend Seducing China’s Lonely Men · Kaj_Sotala · Dec 14, 2020, 8:18 PM · 34 points (13 votes) · 11 comments · 1 min read · (www.sixthtone.com)
Operationalizing compatibility with strategy-stealing · evhub · Dec 24, 2020, 10:36 PM · 41 points (11 votes) · 6 comments · 4 min read
Defusing AGI Danger · Mark Xu · Dec 24, 2020, 10:58 PM · 48 points (18 votes) · 9 comments · 9 min read
Multi-dimensional rewards for AGI interpretability and control · Steven Byrnes · Jan 4, 2021, 3:08 AM · 19 points (6 votes) · 8 comments · 10 min read
DALL-E by OpenAI · Daniel Kokotajlo · Jan 5, 2021, 8:05 PM · 97 points (43 votes) · 22 comments · 1 min read
Review of ‘But exactly how complex and fragile?’ · TurnTrout · Jan 6, 2021, 6:39 PM · 55 points (18 votes) · 0 comments · 8 min read
The Case for a Journal of AI Alignment · adamShimi · Jan 9, 2021, 6:13 PM · 45 points (20 votes) · 32 comments · 4 min read
Transparency and AGI safety · jylin04 · Jan 11, 2021, 6:51 PM · 52 points (14 votes) · 12 comments · 30 min read
Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · Daniel Kokotajlo · Jan 18, 2021, 12:08 PM · 184 points (81 votes) · 85 comments · 14 min read · 1 review
Infra-Bayesianism Unwrapped · adamShimi · Jan 20, 2021, 1:35 PM · 41 points (16 votes) · 0 comments · 24 min read
Optimal play in human-judged Debate usually won’t answer your question · Joe Collman · Jan 27, 2021, 7:34 AM · 33 points (9 votes) · 12 comments · 12 min read
Creating AGI Safety Interlocks · Koen.Holtman · Feb 5, 2021, 12:01 PM · 7 points (3 votes) · 4 comments · 8 min read

Timeline of AI safety

riceissaFeb 7, 2021, 10:29 PM
63 points

26 votes

Overall karma indicates overall quality.

6 comments2 min readLW link
(timelines.issarice.com)

Tour­ne­sol, YouTube and AI Risk

adamShimiFeb 12, 2021, 6:56 PM
36 points

19 votes

Overall karma indicates overall quality.

13 comments4 min readLW link

Internet Encyclopedia of Philosophy on Ethics of Artificial Intelligence

Kaj_Sotala · Feb 20, 2021, 1:54 PM
15 points (6 votes)

1 comment · 4 min read · LW link
(iep.utm.edu)

Behavioral Sufficient Statistics for Goal-Directedness

adamShimi · Mar 11, 2021, 3:01 PM
21 points (5 votes)

12 comments · 9 min read · LW link

A simple way to make GPT-3 follow instructions

Quintin Pope · Mar 8, 2021, 2:57 AM
11 points (6 votes)

5 comments · 4 min read · LW link

Towards a Mechanistic Understanding of Goal-Directedness

Mark Xu · Mar 9, 2021, 8:17 PM
45 points (16 votes)

1 comment · 5 min read · LW link

AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy

DanielFilan · Mar 10, 2021, 4:30 AM
33 points (13 votes)

12 comments · 35 min read · LW link

Comments on “The Singularity is Nowhere Near”

Steven Byrnes · Mar 16, 2021, 11:59 PM
50 points (23 votes)

6 comments · 8 min read · LW link

Is RL involved in sensory processing?

Steven Byrnes · Mar 18, 2021, 1:57 PM
21 points (10 votes)

21 comments · 5 min read · LW link

Against evolution as an analogy for how humans will create AGI

Steven Byrnes · Mar 23, 2021, 12:29 PM
44 points (21 votes)

25 comments · 25 min read · LW link

My AGI Threat Model: Misaligned Model-Based RL Agent

Steven Byrnes · Mar 25, 2021, 1:45 PM
66 points (28 votes)

40 comments · 16 min read · LW link

Coherence arguments imply a force for goal-directed behavior

KatjaGrace · Mar 26, 2021, 4:10 PM
88 points (31 votes)

27 comments · 14 min read · LW link
(aiimpacts.org)

Transparency Trichotomy

Mark Xu · Mar 28, 2021, 8:26 PM
25 points (10 votes)

2 comments · 7 min read · LW link

Hardware is already ready for the singularity. Algorithm knowledge is the only barrier.

Andrew Vlahos · Mar 30, 2021, 10:48 PM
16 points (9 votes)

3 comments · 3 min read · LW link

Ben Goertzel’s “Kinds of Minds”

JoshuaFox · Apr 11, 2021, 12:41 PM
12 points (4 votes)

4 comments · 1 min read · LW link

Updating the Lottery Ticket Hypothesis

johnswentworth · Apr 18, 2021, 9:45 PM
73 points (28 votes)

41 comments · 2 min read · LW link

Three reasons to expect long AI timelines

Matthew Barnett · Apr 22, 2021, 6:44 PM
68 points (37 votes)

29 comments · 11 min read · LW link
(matthewbarnett.substack.com)

Beware over-use of the agent model

Alex Flint · Apr 25, 2021, 10:19 PM
28 points (10 votes)

10 comments · 5 min read · LW link · 1 review

Agents Over Cartesian World Models

Apr 27, 2021, 2:06 AM
62 points (18 votes)

3 comments · 27 min read · LW link

Less Realistic Tales of Doom

Mark Xu · May 6, 2021, 11:01 PM
110 points (57 votes)

13 comments · 4 min read · LW link

Challenge: know everything that the best go bot knows about go

DanielFilan · May 11, 2021, 5:10 AM
48 points (26 votes)

93 comments · 2 min read · LW link
(danielfilan.com)

Formal Inner Alignment, Prospectus

abramdemski · May 12, 2021, 7:57 PM
91 points (26 votes)

57 comments · 16 min read · LW link

Agency in Conway’s Game of Life

Alex Flint · May 13, 2021, 1:07 AM
97 points (56 votes)

81 comments · 9 min read · LW link · 1 review

Knowledge Neurons in Pretrained Transformers

evhub · May 17, 2021, 10:54 PM
98 points (41 votes)

7 comments · 2 min read · LW link
(arxiv.org)

Decoupling deliberation from competition

paulfchristiano · May 25, 2021, 6:50 PM
72 points (24 votes)

16 comments · 9 min read · LW link
(ai-alignment.com)

Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI

Andrew_Critch · Jun 1, 2021, 6:45 PM
176 points (77 votes)

26 comments · 6 min read · LW link

Game-theoretic Alignment in terms of Attainable Utility

Jun 8, 2021, 12:36 PM
20 points (7 votes)

2 comments · 9 min read · LW link

Beijing Academy of Artificial Intelligence announces 1.75-trillion-parameter model, Wu Dao 2.0

Ozyrus · Jun 3, 2021, 12:07 PM
23 points (10 votes)

9 comments · 1 min read · LW link
(www.engadget.com)

An Intuitive Guide to Garrabrant Induction

Mark Xu · Jun 3, 2021, 10:21 PM
115 points (32 votes)

18 comments · 24 min read · LW link

Conservative Agency with Multiple Stakeholders

TurnTrout · Jun 8, 2021, 12:30 AM
31 points (9 votes)

0 comments · 3 min read · LW link

Supplement to “Big picture of phasic dopamine”

Steven Byrnes · Jun 8, 2021, 1:08 PM
13 points (6 votes)

2 comments · 9 min read · LW link

Looking Deeper at Deconfusion

adamShimi · Jun 13, 2021, 9:29 PM
57 points (29 votes)

13 comments · 15 min read · LW link

[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?

TurnTrout · Jun 16, 2021, 2:09 AM
23 points (9 votes)

59 comments · 1 min read · LW link

Reward Is Not Enough

Steven Byrnes · Jun 16, 2021, 1:52 PM
105 points (44 votes)

18 comments · 10 min read · LW link

Environmental Structure Can Cause Instrumental Convergence

TurnTrout · Jun 22, 2021, 10:26 PM
71 points (25 votes)

44 comments · 16 min read · LW link
(arxiv.org)

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan · Jun 24, 2021, 10:10 PM
56 points (11 votes)

2 comments · 58 min read · LW link

Musings on general systems alignment

Alex Flint · Jun 30, 2021, 6:16 PM
31 points (12 votes)

11 comments · 3 min read · LW link

Thoughts on safety in predictive learning

Steven Byrnes · Jun 30, 2021, 7:17 PM
18 points (8 votes)

17 comments · 19 min read · LW link

The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies

TurnTrout · Jul 11, 2021, 5:36 PM
45 points (13 votes)

7 comments · 6 min read · LW link

A world in which the alignment problem seems lower-stakes

TurnTrout · Jul 8, 2021, 2:31 AM
19 points (8 votes)

17 comments · 2 min read · LW link

Fractional progress estimates for AI timelines and implied resource requirements

Jul 15, 2021, 6:43 PM
55 points (26 votes)

6 comments · 7 min read · LW link

Experimentation with AI-generated images (VQGAN+CLIP) | Solarpunk airships fleeing a dragon

Kaj_Sotala · Jul 15, 2021, 11:00 AM
44 points (22 votes)

4 comments · 2 min read · LW link
(kajsotala.fi)

Seeking Power is Convergently Instrumental in a Broad Class of Environments

TurnTrout · Aug 8, 2021, 2:02 AM
41 points (13 votes)

15 comments · 8 min read · LW link

LCDT, A Myopic Decision Theory

Aug 3, 2021, 10:41 PM
50 points (20 votes)

51 comments · 15 min read · LW link

When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives

TurnTrout · Aug 9, 2021, 5:22 PM
52 points (9 votes)

4 comments · 5 min read · LW link

Two AI-risk-related game design ideas

Daniel Kokotajlo · Aug 5, 2021, 1:36 PM
47 points (25 votes)

9 comments · 5 min read · LW link

Research agenda update

Steven Byrnes · Aug 6, 2021, 7:24 PM
54 points (18 votes)

40 comments · 7 min read · LW link

What 2026 looks like

Daniel Kokotajlo · Aug 6, 2021, 4:14 PM
371 points (210 votes)

109 comments · 16 min read · LW link · 1 review

Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

TurnTrout · Nov 18, 2021, 1:54 AM
69 points (24 votes)

8 comments · 17 min read · LW link
(www.overleaf.com)

Dopamine-supervised learning in mammals & fruit flies

Steven Byrnes · Aug 10, 2021, 4:13 PM
16 points (4 votes)

6 comments · 8 min read · LW link

Free course review — Reliable and Interpretable Artificial Intelligence (ETH Zurich)

Jan Czechowski · Aug 10, 2021, 4:36 PM
7 points (3 votes)

0 comments · 3 min read · LW link

Technical Predictions Related to AI Safety

lsusr · Aug 13, 2021, 12:29 AM
28 points (14 votes)

12 comments · 8 min read · LW link

Provide feedback on Open Philanthropy’s AI alignment RFP

Aug 20, 2021, 7:52 PM
56 points (17 votes)

6 comments · 1 min read · LW link

AI Safety Papers: An App for the TAI Safety Database

ozziegooen · Aug 21, 2021, 2:02 AM
74 points (28 votes)

13 comments · 2 min read · LW link

Randal Koene on brain understanding before whole brain emulation

Steven Byrnes · Aug 23, 2021, 8:59 PM
36 points (11 votes)

12 comments · 3 min read · LW link

MIRI/OP exchange about decision theory

Rob Bensinger · Aug 25, 2021, 10:44 PM
47 points (21 votes)

7 comments · 10 min read · LW link

Goodhart Ethology

Charlie Steiner · Sep 17, 2021, 5:31 PM
18 points (5 votes)

4 comments · 14 min read · LW link

[Question] What are good alignment conference papers?

adamShimi · Aug 28, 2021, 1:35 PM
12 points (4 votes)

2 comments · 1 min read · LW link

Brain-Computer Interfaces and AI Alignment

niplav · Aug 28, 2021, 7:48 PM
31 points (16 votes)

6 comments · 11 min read · LW link

Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis

DirectedEvolution · Aug 29, 2021, 4:53 AM
3 points (11 votes)

18 comments · 4 min read · LW link

Alignment Research = Conceptual Alignment Research + Applied Alignment Research

adamShimi · Aug 30, 2021, 9:13 PM
37 points (17 votes)

14 comments · 5 min read · LW link

AXRP Episode 11 - Attainable Utility and Power with Alex Turner

DanielFilan · Sep 25, 2021, 9:10 PM
19 points (4 votes)

5 comments · 52 min read · LW link

Is progress in ML-assisted theorem-proving beneficial?

mako yass · Sep 28, 2021, 1:54 AM
10 points (4 votes)

3 comments · 1 min read · LW link

Takeoff Speeds and Discontinuities

Sep 30, 2021, 1:50 PM
62 points (18 votes)

1 comment · 15 min read · LW link

My take on Vanessa Kosoy’s take on AGI safety

Steven Byrnes · Sep 30, 2021, 12:23 PM
84 points (30 votes)

10 comments · 31 min read · LW link

[Prediction] We are in an Algorithmic Overhang

lsusr · Sep 29, 2021, 11:40 PM
31 points (18 votes)

14 comments · 1 min read · LW link

Interview with Skynet

lsusr · Sep 30, 2021, 2:20 AM
49 points (27 votes)

1 comment · 2 min read · LW link

AI learns betrayal and how to avoid it

Stuart_Armstrong · Sep 30, 2021, 9:39 AM
30 points (6 votes)

4 comments · 2 min read · LW link

The Dark Side of Cognition Hypothesis

Cameron Berg · Oct 3, 2021, 8:10 PM
19 points (14 votes)

1 comment · 16 min read · LW link

[Question] How to think about and deal with OpenAI

Rafael Harth · Oct 9, 2021, 1:10 PM
107 points (57 votes)

71 comments · 1 min read · LW link

NVIDIA and Microsoft release 530B-parameter transformer model, Megatron-Turing NLG

Ozyrus · Oct 11, 2021, 3:28 PM
51 points (26 votes)

36 comments · 1 min read · LW link
(developer.nvidia.com)

Postmodern Warfare

lsusr · Oct 25, 2021, 9:02 AM
61 points (37 votes)

25 comments · 2 min read · LW link

A very crude deception eval is already passed

Beth Barnes · Oct 29, 2021, 5:57 PM
105 points (39 votes)

8 comments · 2 min read · LW link

Study Guide

johnswentworth · Nov 6, 2021, 1:23 AM
220 points (132 votes)

41 comments · 16 min read · LW link

Re: Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

lsusr · Nov 15, 2021, 10:02 AM
20 points (13 votes)

8 comments · 15 min read · LW link

Ngo and Yudkowsky on alignment difficulty

Nov 15, 2021, 8:31 PM
235 points (92 votes)

143 comments · 99 min read · LW link

Corrigibility Can Be VNM-Incoherent

TurnTrout · Nov 20, 2021, 12:30 AM
64 points (22 votes)

24 comments · 7 min read · LW link

Visible Thoughts Project and Bounty Announcement

So8res · Nov 30, 2021, 12:19 AM
245 points (90 votes)

104 comments · 13 min read · LW link

Interpreting Yudkowsky on Deep vs Shallow Knowledge

adamShimi · Dec 5, 2021, 5:32 PM
100 points (44 votes)

32 comments · 24 min read · LW link

Are there alternatives to solving value transfer and extrapolation?

Stuart_Armstrong · Dec 6, 2021, 6:53 PM
19 points (7 votes)

7 comments · 5 min read · LW link

Considerations on interaction between AI and expected value of the future

Beth Barnes · Dec 7, 2021, 2:46 AM
64 points (20 votes)

28 comments · 4 min read · LW link

Some thoughts on why adversarial training might be useful

Beth Barnes · Dec 8, 2021, 1:28 AM
9 points (4 votes)

5 comments · 3 min read · LW link

The Plan

johnswentworth · Dec 10, 2021, 11:41 PM
235 points (109 votes)

77 comments · 14 min read · LW link

Moore’s Law, AI, and the pace of progress

Veedrac · Dec 11, 2021, 3:02 AM
120 points (51 votes)

39 comments · 24 min read · LW link

Summary of the Acausal Attack Issue for AIXI

Diffractor · Dec 13, 2021, 8:16 AM
14 points (9 votes)

6 comments · 4 min read · LW link

Consequentialism & corrigibility

Steven Byrnes · Dec 14, 2021, 1:23 PM
60 points (21 votes)

27 comments · 7 min read · LW link

Should we rely on the speed prior for safety?

Marc Carauleanu · Dec 14, 2021, 8:45 PM
14 points (8 votes)

6 comments · 5 min read · LW link

The Case for Radical Optimism about Interpretability

Quintin Pope · Dec 16, 2021, 11:38 PM
57 points (27 votes)

16 comments · 8 min read · LW link · 1 review

Researcher incentives cause smoother progress on benchmarks

ryan_greenblatt · Dec 21, 2021, 4:13 AM
20 points (9 votes)

4 comments · 1 min read · LW link

Self-Organised Neural Networks: A simple, natural and efficient way to intelligence

D𝜋 · Jan 1, 2022, 11:24 PM
41 points (28 votes)

51 comments · 44 min read · LW link

Prizes for ELK proposals

paulfchristiano · Jan 3, 2022, 8:23 PM
141 points (72 votes)

156 comments · 7 min read · LW link

D𝜋’s Spiking Network

lsusr · Jan 4, 2022, 4:08 AM
50 points (28 votes)

37 comments · 4 min read · LW link

More Is Different for AI

jsteinhardt · Jan 4, 2022, 7:30 PM
137 points (82 votes)

22 comments · 3 min read · LW link
(bounded-regret.ghost.io)

Instrumental Convergence For Realistic Agent Objectives

TurnTrout · Jan 22, 2022, 12:41 AM
35 points (10 votes)

9 comments · 9 min read · LW link

What’s Up With Confusingly Pervasive Consequentialism?

Raemon · Jan 20, 2022, 7:22 PM
169 points (80 votes)

88 comments · 4 min read · LW link

[Intro to brain-like-AGI safety] 1. What’s the problem & Why work on it now?

Steven Byrnes · Jan 26, 2022, 3:23 PM
119 points (53 votes)

19 comments · 23 min read · LW link

Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety

Jan 27, 2022, 1:13 PM
27 points (7 votes)

0 comments · 1 min read · LW link
(arxiv.org)

Competitive programming with AlphaCode

Algon · Feb 2, 2022, 4:49 PM
58 points (32 votes)

37 comments · 15 min read · LW link
(deepmind.com)

Thoughts on AGI safety from the top

jylin04 · Feb 2, 2022, 8:06 PM
35 points (14 votes)

3 comments · 32 min read · LW link

Paradigm-building from first principles: Effective altruism, AGI, and alignment

Cameron Berg · Feb 8, 2022, 4:12 PM
24 points (21 votes)

5 comments · 14 min read · LW link

[Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering

Steven Byrnes · Feb 9, 2022, 1:09 PM
59 points (26 votes)

3 comments · 24 min read · LW link

[Intro to brain-like-AGI safety] 4. The “short-term predictor”

Steven Byrnes · Feb 16, 2022, 1:12 PM
51 points (21 votes)

11 comments · 13 min read · LW link

ELK Proposal: Thinking Via A Human Imitator

TurnTrout · Feb 22, 2022, 1:52 AM
28 points (12 votes)

6 comments · 11 min read · LW link

Why I’m co-founding Aligned AI

Stuart_Armstrong · Feb 17, 2022, 7:55 PM
93 points (64 votes)

54 comments · 3 min read · LW link

Implications of automated ontology identification

Feb 18, 2022, 3:30 AM
67 points (20 votes)

29 comments · 23 min read · LW link

Alignment research exercises

Richard_Ngo · Feb 21, 2022, 8:24 PM
146 points (63 votes)

17 comments · 8 min read · LW link

[Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning

Steven Byrnes · Feb 23, 2022, 2:44 PM
41 points (15 votes)

25 comments · 21 min read · LW link

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · Feb 26, 2022, 12:46 PM
42 points (23 votes)

3 comments · 11 min read · LW link

Estimating Brain-Equivalent Compute from Image Recognition Algorithms

Gunnar_Zarncke · Feb 27, 2022, 2:45 AM
14 points (4 votes)

4 comments · 2 min read · LW link

[Link] Aligned AI AMA

Stuart_Armstrong · Mar 1, 2022, 12:01 PM
18 points (6 votes)

0 comments · 1 min read · LW link

[Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL

Steven Byrnes · Mar 2, 2022, 3:26 PM
41 points (15 votes)

13 comments · 16 min read · LW link

[Question] Would (myopic) general public good producers significantly accelerate the development of AGI?

mako yass · Mar 2, 2022, 11:47 PM
25 points (11 votes)

10 comments · 1 min read · LW link

[Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example

Steven Byrnes · Mar 9, 2022, 2:28 PM
56 points (20 votes)

0 comments · 9 min read · LW link

[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

Steven Byrnes · Mar 23, 2022, 12:48 PM
31 points (11 votes)

6 comments · 23 min read · LW link

Humans pretending to be robots pretending to be human

Richard_Kennaway · Mar 28, 2022, 3:13 PM
27 points (17 votes)

15 comments · 1 min read · LW link

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes · Mar 30, 2022, 1:24 PM
34 points (12 votes)

4 comments · 21 min read · LW link

AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

DanielFilan · Mar 31, 2022, 5:20 AM
24 points (9 votes)

1 comment · 48 min read · LW link

Uncontrollable Super-Powerful Explosives

Sammy Martin · Apr 2, 2022, 8:13 PM
53 points (22 votes)

12 comments · 5 min read · LW link

The case for Doing Something Else (if Alignment is doomed)

Rafael Harth · Apr 5, 2022, 5:52 PM
81 points (35 votes)

14 comments · 2 min read · LW link

[Intro to brain-like-AGI safety] 11. Safety ≠ alignment (but they’re close!)

Steven Byrnes · Apr 6, 2022, 1:39 PM
25 points (10 votes)

1 comment · 10 min read · LW link

Strategic Considerations Regarding Autistic/Literal AI

Chris_Leong · Apr 6, 2022, 2:57 PM
−1 points (8 votes)

2 comments · 2 min read · LW link

DALL·E 2 by OpenAI

P. · Apr 6, 2022, 2:17 PM
44 points (24 votes)

51 comments · 1 min read · LW link
(openai.com)

How to train your transformer

p.b. · Apr 7, 2022, 9:34 AM
6 points (3 votes)

0 comments · 8 min read · LW link

AMA Conjecture, A New Alignment Startup

adamShimi · Apr 9, 2022, 9:43 AM
46 points (20 votes)

42 comments · 1 min read · LW link

Worse than an unaligned AGI

Shmi · Apr 10, 2022, 3:35 AM
−1 points (9 votes)

12 comments · 1 min read · LW link

[Question] Did OpenAI let GPT out of the box?

ChristianKl · Apr 16, 2022, 2:56 PM
4 points (11 votes)

12 comments · 1 min read · LW link

Instrumental Convergence To Offer Hope?

michael_mjd · Apr 22, 2022, 1:56 AM
12 points (7 votes)

7 comments · 3 min read · LW link

[Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts

Steven Byrnes · Apr 27, 2022, 1:30 PM
54 points (22 votes)

13 comments · 14 min read · LW link

[Intro to brain-like-AGI safety] 14. Controlled AGI

Steven Byrnes · May 11, 2022, 1:17 PM
26 points (11 votes)

25 comments · 18 min read · LW link

[Question] What’s keeping concerned capabilities gain researchers from leaving the field?

sovran · May 12, 2022, 12:16 PM
19 points (14 votes)

4 comments · 1 min read · LW link

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner · May 18, 2022, 8:52 PM
50 points (23 votes)

8 comments · 14 min read · LW link

Confused why a “capabilities research is good for alignment progress” position isn’t discussed more

Kaj_Sotala · Jun 2, 2022, 9:41 PM
132 points (60 votes)

26 comments · 4 min read · LW link

I’m trying out “asteroid mindset”

Alex_Altair · Jun 3, 2022, 1:35 PM
85 points (44 votes)

5 comments · 4 min read · LW link

Announcing the Alignment of Complex Systems Research Group

Jun 4, 2022, 4:10 AM
79 points (41 votes)

18 comments · 5 min read · LW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky · Jun 5, 2022, 10:05 PM
725 points (398 votes)

653 comments · 30 min read · LW link

Yes, AI research will be substantially curtailed if a lab causes a major disaster

lc · Jun 14, 2022, 10:17 PM
96 points (69 votes)

35 comments · 2 min read · LW link

Lamda is not an LLM

Kevin · Jun 19, 2022, 11:13 AM
7 points (17 votes)

10 comments · 1 min read · LW link
(www.wired.com)

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden · Jun 22, 2022, 8:00 PM
32 points (14 votes)

4 comments · 1 min read · LW link

[Link] OpenAI: Learning to Play Minecraft with Video PreTraining (VPT)

Aryeh Englander · Jun 23, 2022, 4:29 PM
53 points (26 votes)

3 comments · 1 min read · LW link

Announcing Epoch: A research organization investigating the road to Transformative AI

Jun 27, 2022, 1:55 PM
95 points (43 votes)

2 comments · 2 min read · LW link
(epochai.org)

Paper: Forecasting world events with neural nets

Jul 1, 2022, 7:40 PM
39 points (13 votes)

3 comments · 4 min read · LW link

Naive Hypotheses on AI Alignment

Shoshannah Tekofsky · Jul 2, 2022, 7:03 PM
89 points (47 votes)

29 comments · 5 min read · LW link

Humans provide an untapped wealth of evidence about alignment

Jul 14, 2022, 2:31 AM
175 points (74 votes)

92 comments · 10 min read · LW link

Examples of AI Increasing AI Progress

TW123 · Jul 17, 2022, 8:06 PM
104 points (67 votes)

14 comments · 1 min read · LW link

Forecasting ML Benchmarks in 2023

jsteinhardt · Jul 18, 2022, 2:50 AM
36 points (13 votes)

19 comments · 12 min read · LW link
(bounded-regret.ghost.io)

Robustness to Scaling Down: More Important Than I Thought

adamShimi · Jul 23, 2022, 11:40 AM
37 points (16 votes)

5 comments · 3 min read · LW link

Comparing Four Approaches to Inner Alignment

Lucas Teixeira · Jul 29, 2022, 9:06 PM
33 points (10 votes)

1 comment · 9 min read · LW link

Where are the red lines for AI?

Karl von Wendt · Aug 5, 2022, 9:34 AM
23 points (13 votes)

8 comments · 6 min read · LW link

Jack Clark on the realities of AI policy

Kaj_Sotala · Aug 7, 2022, 8:44 AM
66 points (40 votes)

3 comments · 3 min read · LW link
(threadreaderapp.com)

GD’s Implicit Bias on Separable Data

Xander Davies · Oct 17, 2022, 4:13 AM
23 points (9 votes)

0 comments · 7 min read · LW link

AI Transparency: Why it’s critical and how to obtain it.

Zohar Jackson · Aug 14, 2022, 10:31 AM
6 points (3 votes)

1 comment · 5 min read · LW link

Brain-like AGI project “aintelope”

Gunnar_Zarncke · Aug 14, 2022, 4:33 PM
48 points (23 votes)

2 comments · 1 min read · LW link

A Mechanistic Interpretability Analysis of Grokking

Aug 15, 2022, 2:41 AM
338 points (149 votes)

39 comments · 42 min read · LW link
(colab.research.google.com)

What if we approach AI safety like a technical engineering safety problem

zeshen · Aug 20, 2022, 10:29 AM
30 points (14 votes)

5 comments · 7 min read · LW link

AI art isn’t “about to shake things up”. It’s already here.

Davis_Kingsley · Aug 22, 2022, 11:17 AM
65 points (46 votes)

19 comments · 3 min read · LW link

Some conceptual alignment research projects

Richard_Ngo · Aug 25, 2022, 10:51 PM
168 points (82 votes)

14 comments · 3 min read · LW link

Levelling Up in AI Safety Research Engineering

Gabe M · Sep 2, 2022, 4:59 AM
40 points (25 votes)

7 comments · 17 min read · LW link

The shard theory of human values

Sep 4, 2022, 4:28 AM
202 points (94 votes)

57 comments · 24 min read · LW link

Quintin’s alignment papers roundup—week 1

Quintin Pope · Sep 10, 2022, 6:39 AM
119 points (55 votes)

5 comments · 9 min read · LW link

LOVE in a simbox is all you need

jacob_cannell · Sep 28, 2022, 6:25 PM
59 points (31 votes)

69 comments · 44 min read · LW link

A shot at the diamond-alignment problem

TurnTrout · Oct 6, 2022, 6:29 PM
77 points (37 votes)

53 comments · 15 min read · LW link

More examples of goal misgeneralization

Oct 7, 2022, 2:38 PM
51 points (31 votes)

8 comments · 2 min read · LW link
(deepmindsafetyresearch.medium.com)

[Crosspost] AlphaTensor, Taste, and the Scalability of AI

jamierumbelow · Oct 9, 2022, 7:42 PM
16 points (9 votes)

4 comments · 1 min read · LW link
(jamieonsoftware.com)

QAPR 4: Inductive biases

Quintin Pope · Oct 10, 2022, 10:08 PM
63 points (20 votes)

2 comments · 18 min read · LW link

Infinite Possibility Space and the Shutdown Problem

magfrump · Oct 18, 2022, 5:37 AM
6 points (2 votes)

0 comments · 2 min read · LW link
(www.magfrump.net)

Cruxes in Katja Grace’s Counterarguments

azsantosk · Oct 16, 2022, 8:44 AM
16 points (8 votes)

0 comments · 7 min read · LW link

DeepMind on Stratego, an imperfect information game

sanxiyn · Oct 24, 2022, 5:57 AM
15 points (4 votes)

9 comments · 1 min read · LW link
(arxiv.org)

Announcing: What Future World? - Growing the AI Governance Community

DavidCorfield · Nov 2, 2022, 1:24 AM
1 point (3 votes)

0 comments · 1 min read · LW link

Poster Session on AI Safety

Neil Crawford · Nov 12, 2022, 3:50 AM
7 points (5 votes)

6 comments · 1 min read · LW link

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

Nov 22, 2022, 6:57 PM
103 points (59 votes)

86 comments · 24 min read · LW link

A challenge for AGI organizations, and a challenge for readers

Dec 1, 2022, 11:11 PM
265 points (127 votes)

30 comments · 2 min read · LW link

Towards Hodge-podge Alignment

Cleo Nardo · Dec 19, 2022, 8:12 PM
65 points (37 votes)

26 comments · 9 min read · LW link

[AN #94]: AI alignment as translation between humans and machines

Rohin Shah · Apr 8, 2020, 5:10 PM
11 points (3 votes)

0 comments · 7 min read · LW link
(mailchi.mp)

[Question] What are the relative speeds of AI capabilities and AI safety?

NunoSempere · Apr 24, 2020, 6:21 PM
8 points (4 votes)

2 comments · 1 min read · LW link

Seeking Power is Often Convergently Instrumental in MDPs

Dec 5, 2019, 2:33 AM
153 points (53 votes)

38 comments · 16 min read · LW link · 2 reviews
(arxiv.org)

“Don’t even think about hell”

emmabMay 2, 2020, 8:06 AM
6 points

3 votes

Overall karma indicates overall quality.

2 comments1 min readLW link

[Question] AI Box­ing for Hard­ware-bound agents (aka the China al­ign­ment prob­lem)

Logan ZoellnerMay 8, 2020, 3:50 PM
11 points

6 votes

Overall karma indicates overall quality.

27 comments10 min readLW link

Point­ing to a Flower

johnswentworthMay 18, 2020, 6:54 PM
59 points

22 votes

Overall karma indicates overall quality.

18 comments9 min readLW link

Learning and manipulating learning
Stuart_Armstrong · May 19, 2020, 1:02 PM · 39 points (12 votes) · 5 comments · 10 min read · LW link

[Question] Why aren’t we testing general intelligence distribution?
B Jacobs · May 26, 2020, 4:07 PM · 25 points (13 votes) · 7 comments · 1 min read · LW link

OpenAI announces GPT-3
gwern · May 29, 2020, 1:49 AM · 67 points (33 votes) · 23 comments · 1 min read · LW link (arxiv.org)

GPT-3: a disappointing paper
nostalgebraist · May 29, 2020, 7:06 PM · 65 points (63 votes) · 44 comments · 8 min read · LW link · 1 review

Introduction to Existential Risks from Artificial Intelligence, for an EA audience
JoshuaFox · Jun 2, 2020, 8:30 AM · 10 points (3 votes) · 1 comment · 1 min read · LW link

Preparing for “The Talk” with AI projects
Daniel Kokotajlo · Jun 13, 2020, 11:01 PM · 64 points (25 votes) · 16 comments · 3 min read · LW link

[Question] What are the high-level approaches to AI alignment?
Gordon Seidoh Worley · Jun 16, 2020, 5:10 PM · 12 points (4 votes) · 13 comments · 1 min read · LW link

Results of $1,000 Oracle contest!
Stuart_Armstrong · Jun 17, 2020, 5:44 PM · 58 points (23 votes) · 2 comments · 1 min read · LW link

[Question] Likelihood of hyperexistential catastrophe from a bug?
Anirandis · Jun 18, 2020, 4:23 PM · 13 points (10 votes) · 27 comments · 1 min read · LW link

AI Benefits Post 1: Introducing “AI Benefits”
Cullen · Jun 22, 2020, 4:59 PM · 11 points (7 votes) · 3 comments · 3 min read · LW link

Goals and short descriptions
Michele Campolo · Jul 2, 2020, 5:41 PM · 14 points (8 votes) · 8 comments · 5 min read · LW link

Research ideas to study humans with AI Safety in mind
Riccardo Volpato · Jul 3, 2020, 4:01 PM · 23 points (8 votes) · 2 comments · 5 min read · LW link

AI Benefits Post 3: Direct and Indirect Approaches to AI Benefits
Cullen · Jul 6, 2020, 6:48 PM · 8 points (4 votes) · 0 comments · 2 min read · LW link

Antitrust-Compliant AI Industry Self-Regulation
Cullen · Jul 7, 2020, 8:53 PM · 9 points (5 votes) · 3 comments · 1 min read · LW link (cullenokeefe.com)

Should AI Be Open?
Scott Alexander · Dec 17, 2015, 8:25 AM · 20 points (13 votes) · 3 comments · 13 min read · LW link

Meta Programming GPT: A route to Superintelligence?
dmtea · Jul 11, 2020, 2:51 PM · 10 points (5 votes) · 7 comments · 4 min read · LW link

The Dilemma of Worse Than Death Scenarios
arkaeik · Jul 10, 2018, 9:18 AM · 5 points (8 votes) · 18 comments · 4 min read · LW link

[Question] What are the most likely ways AGI will emerge?
Craig Quiter · Jul 14, 2020, 12:58 AM · 3 points (2 votes) · 7 comments · 1 min read · LW link

AI Benefits Post 4: Outstanding Questions on Selecting Benefits
Cullen · Jul 14, 2020, 5:26 PM · 4 points (2 votes) · 4 comments · 5 min read · LW link

Solving Math Problems by Relay
Jul 17, 2020, 3:32 PM · 98 points (33 votes) · 26 comments · 7 min read · LW link

AI Benefits Post 5: Outstanding Questions on Governing Benefits
Cullen · Jul 21, 2020, 4:46 PM · 4 points (2 votes) · 0 comments · 4 min read · LW link

[Question] Why is pseudo-alignment “worse” than other ways ML can fail to generalize?
nostalgebraist · Jul 18, 2020, 10:54 PM · 45 points (12 votes) · 10 comments · 2 min read · LW link

[Question] “Do Nothing” utility function, 3½ years later?
niplav · Jul 20, 2020, 11:09 AM · 5 points (3 votes) · 3 comments · 1 min read · LW link

[AN #80]: Why AI risk might be solved without additional intervention from longtermists
Rohin Shah · Jan 2, 2020, 6:20 PM · 34 points (18 votes) · 94 comments · 10 min read · LW link (mailchi.mp)

Access to AI: a human right?
dmtea · Jul 25, 2020, 9:38 AM · 5 points (5 votes) · 3 comments · 2 min read · LW link

The Rise of Commonsense Reasoning
DragonGod · Jul 27, 2020, 7:01 PM · 8 points (3 votes) · 0 comments · 1 min read · LW link (www.reddit.com)

AI and Efficiency
DragonGod · Jul 27, 2020, 8:58 PM · 9 points (4 votes) · 1 comment · 1 min read · LW link (openai.com)

FHI Report: How Will National Security Considerations Affect Antitrust Decisions in AI? An Examination of Historical Precedents
Cullen · Jul 28, 2020, 6:34 PM · 2 points (1 vote) · 0 comments · 1 min read · LW link (www.fhi.ox.ac.uk)

The “best predictor is malicious optimiser” problem
Donald Hobson · Jul 29, 2020, 11:49 AM · 14 points (7 votes) · 10 comments · 2 min read · LW link

Sufficiently Advanced Language Models Can Do Reinforcement Learning
Past Account · Aug 2, 2020, 3:32 PM · 21 points (16 votes) · 7 comments · 7 min read · LW link

[Question] What are the most important papers/posts/resources to read to understand more of GPT-3?
adamShimi · Aug 2, 2020, 8:53 PM · 22 points (12 votes) · 4 comments · 1 min read · LW link

[Question] What should an Einstein-like figure in Machine Learning do?
Razied · Aug 5, 2020, 11:52 PM · 3 points (2 votes) · 3 comments · 1 min read · LW link

Book review: Architects of Intelligence by Martin Ford (2018)
Ofer · Aug 11, 2020, 5:30 PM · 15 points (7 votes) · 0 comments · 2 min read · LW link

[Question] Will OpenAI’s work unintentionally increase existential risks related to AI?
adamShimi · Aug 11, 2020, 6:16 PM · 50 points (32 votes) · 56 comments · 1 min read · LW link

Blog post: A tale of two research communities
Aryeh Englander · Aug 12, 2020, 8:41 PM · 14 points (6 votes) · 0 comments · 4 min read · LW link

Mapping Out Alignment
Aug 15, 2020, 1:02 AM · 42 points (12 votes) · 0 comments · 5 min read · LW link

My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda
Chi Nguyen · Aug 15, 2020, 8:02 PM · 119 points (38 votes) · 21 comments · 39 min read · LW link

GPT-3, belief, and consistency
skybrian · Aug 16, 2020, 11:12 PM · 18 points (10 votes) · 7 comments · 2 min read · LW link

[Question] What precisely do we mean by AI alignment?
Gordon Seidoh Worley · Dec 9, 2018, 2:23 AM · 27 points (8 votes) · 8 comments · 1 min read · LW link

Thoughts on the Feasibility of Prosaic AGI Alignment?
iamthouthouarti · Aug 21, 2020, 11:25 PM · 8 points (5 votes) · 10 comments · 1 min read · LW link

[Question] Forecasting Thread: AI Timelines
Aug 22, 2020, 2:33 AM · 133 points (65 votes) · 95 comments · 2 min read · LW link

Learning human preferences: black-box, white-box, and structured white-box access
Stuart_Armstrong · Aug 24, 2020, 11:42 AM · 25 points (9 votes) · 9 comments · 6 min read · LW link

Proofs Section 2.3 (Updates, Decision Theory)
Diffractor · Aug 27, 2020, 7:49 AM · 7 points (2 votes) · 0 comments · 31 min read · LW link

Proofs Section 2.2 (Isomorphism to Expectations)
Diffractor · Aug 27, 2020, 7:52 AM · 7 points (2 votes) · 0 comments · 46 min read · LW link

Proofs Section 2.1 (Theorem 1, Lemmas)
Diffractor · Aug 27, 2020, 7:54 AM · 7 points (2 votes) · 0 comments · 36 min read · LW link

Proofs Section 1.1 (Initial results to LF-duality)
Diffractor · Aug 27, 2020, 7:59 AM · 7 points (4 votes) · 0 comments · 20 min read · LW link

Proofs Section 1.2 (Mixtures, Updates, Pushforwards)
Diffractor · Aug 27, 2020, 7:57 AM · 7 points (2 votes) · 0 comments · 14 min read · LW link

Basic Inframeasure Theory
Diffractor · Aug 27, 2020, 8:02 AM · 35 points (12 votes) · 16 comments · 25 min read · LW link

Belief Functions And Decision Theory
Diffractor · Aug 27, 2020, 8:00 AM · 15 points (7 votes) · 8 comments · 39 min read · LW link

Technical model refinement formalism
Stuart_Armstrong · Aug 27, 2020, 11:54 AM · 19 points (3 votes) · 0 comments · 6 min read · LW link

Pong from pixels without reading “Pong from Pixels”
Ian McKenzie · Aug 29, 2020, 5:26 PM · 15 points (6 votes) · 1 comment · 7 min read · LW link

Reflections on AI Timelines Forecasting Thread
Amandango · Sep 1, 2020, 1:42 AM · 53 points (26 votes) · 7 comments · 5 min read · LW link

on “learning to summarize”
nostalgebraist · Sep 12, 2020, 3:20 AM · 25 points (10 votes) · 13 comments · 8 min read · LW link (nostalgebraist.tumblr.com)

[Question] The universality of computation and mind design space
alanf · Sep 12, 2020, 2:58 PM · 1 point (3 votes) · 7 comments · 1 min read · LW link

Clarifying “What failure looks like”
Sam Clarke · Sep 20, 2020, 8:40 PM · 95 points (45 votes) · 14 comments · 17 min read · LW link

Human Biases that Obscure AI Progress
Danielle Ensign · Sep 25, 2020, 12:24 AM · 42 points (22 votes) · 2 comments · 4 min read · LW link

[Question] Competence vs Alignment
kwiat.dev · Sep 30, 2020, 9:03 PM · 6 points (3 votes) · 4 comments · 1 min read · LW link

AGI safety from first principles: Alignment
Richard_Ngo · Oct 1, 2020, 3:13 AM · 56 points (20 votes) · 2 comments · 13 min read · LW link

[Question] GPT-3 + GAN
stick109 · Oct 17, 2020, 7:58 AM · 4 points (3 votes) · 4 comments · 1 min read · LW link

Book Review: Reinforcement Learning by Sutton and Barto
billmei · Oct 20, 2020, 7:40 PM · 52 points (19 votes) · 3 comments · 10 min read · LW link

GPT-X, Paperclip Maximizer? Analyzing AGI and Final Goals
meanderingmoose · Oct 22, 2020, 2:33 PM · 8 points (5 votes) · 1 comment · 6 min read · LW link

Containing the AI… Inside a Simulated Reality
HumaneAutomation · Oct 31, 2020, 4:16 PM · 1 point (6 votes) · 9 comments · 2 min read · LW link

Why those who care about catastrophic and existential risk should care about autonomous weapons
aaguirre · Nov 11, 2020, 3:22 PM · 60 points (30 votes) · 20 comments · 19 min read · LW link

European Master’s Programs in Machine Learning, Artificial Intelligence, and related fields
Master Programs ML/AI · Nov 14, 2020, 3:51 PM · 32 points (23 votes) · 8 comments · 1 min read · LW link

Should we postpone AGI until we reach safety?
otto.barten · Nov 18, 2020, 3:43 PM · 27 points (18 votes) · 36 comments · 3 min read · LW link

Commitment and credibility in multipolar AI scenarios
anni_leskela · Dec 4, 2020, 6:48 PM · 25 points (15 votes) · 3 comments · 18 min read · LW link

[Question] AI Winter Is Coming—How to profit from it?
maximkazhenkov · Dec 5, 2020, 8:23 PM · 10 points (5 votes) · 7 comments · 1 min read · LW link

Announcing the Technical AI Safety Podcast
Quinn · Dec 7, 2020, 6:51 PM · 42 points (19 votes) · 6 comments · 2 min read · LW link (technical-ai-safety.libsyn.com)

All GPT skills are translation
p.b. · Dec 13, 2020, 8:06 PM · 4 points (3 votes) · 0 comments · 2 min read · LW link

[Question] Judging AGI Output
cy6erlion · Dec 14, 2020, 12:43 PM · 3 points (2 votes) · 0 comments · 2 min read · LW link

Risk Map of AI Systems
Dec 15, 2020, 9:16 AM · 25 points (11 votes) · 3 comments · 8 min read · LW link

AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy
xuan · Jan 1, 2021, 12:08 AM · 30 points (17 votes) · 21 comments · 20 min read · LW link

Are we all misaligned?
Mateusz Mazurkiewicz · Jan 3, 2021, 2:42 AM · 11 points (8 votes) · 0 comments · 5 min read · LW link

[Question] What do we *really* expect from a well-aligned AI?
Jan Betley · Jan 4, 2021, 8:57 PM · 8 points (3 votes) · 10 comments · 1 min read · LW link

Eight claims about multi-agent AGI safety
Richard_Ngo · Jan 7, 2021, 1:34 PM · 73 points (25 votes) · 18 comments · 5 min read · LW link

Imitative Generalisation (AKA ‘Learning the Prior’)
Beth Barnes · Jan 10, 2021, 12:30 AM · 92 points (28 votes) · 14 comments · 12 min read · LW link

Prediction can be Outer Aligned at Optimum
Lukas Finnveden · Jan 10, 2021, 6:48 PM · 15 points (8 votes) · 12 comments · 11 min read · LW link

[Question] Poll: Which variables are most strategically relevant?
Jan 22, 2021, 5:17 PM · 32 points (11 votes) · 34 comments · 1 min read · LW link

AISU 2021
Linda Linsefors · Jan 30, 2021, 5:40 PM · 28 points (11 votes) · 2 comments · 1 min read · LW link

Deepmind has made a general inductor (“Making sense of sensory input”)
mako yass · Feb 2, 2021, 2:54 AM · 48 points (21 votes) · 10 comments · 1 min read · LW link (www.sciencedirect.com)

Counterfactual Planning in AGI Systems
Koen.Holtman · Feb 3, 2021, 1:54 PM · 7 points (5 votes) · 0 comments · 5 min read · LW link

[AN #136]: How well will GPT-N perform on downstream tasks?
Rohin Shah · Feb 3, 2021, 6:10 PM · 21 points (5 votes) · 2 comments · 9 min read · LW link (mailchi.mp)

Formal Solution to the Inner Alignment Problem
michaelcohen · Feb 18, 2021, 2:51 PM · 47 points (30 votes) · 123 comments · 2 min read · LW link

TASP Ep 3 - Optimal Policies Tend to Seek Power
Quinn · Mar 11, 2021, 1:44 AM · 24 points (5 votes) · 0 comments · 1 min read · LW link (technical-ai-safety.libsyn.com)

Phylactery Decision Theory
Bunthut · Apr 2, 2021, 8:55 PM · 14 points (6 votes) · 6 comments · 2 min read · LW link

Predictive Coding has been Unified with Backpropagation
lsusr · Apr 2, 2021, 9:42 PM · 166 points (105 votes) · 44 comments · 2 min read · LW link

[Question] What if we could use the theory of Mechanism Design from Game Theory as a medium to achieve AI Alignment?
farari7 · Apr 4, 2021, 12:56 PM · 4 points (3 votes) · 0 comments · 1 min read · LW link

Another (outer) alignment failure story
paulfchristiano · Apr 7, 2021, 8:12 PM · 210 points (103 votes) · 38 comments · 12 min read · LW link

A System For Evolving Increasingly General Artificial Intelligence From Current Technologies
Tsang Chung Shu · Apr 8, 2021, 9:37 PM · 1 point (5 votes) · 3 comments · 11 min read · LW link

April 2021 Deep Dive: Transformers and GPT-3
adamShimi · May 1, 2021, 11:18 AM · 30 points (17 votes) · 6 comments · 7 min read · LW link

[Question] [timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes
Quinn · May 4, 2021, 7:10 PM · 6 points (4 votes) · 2 comments · 1 min read · LW link

Mostly questions about Dumb AI Kernels
HorizonHeld · May 12, 2021, 10:00 PM · 1 point (1 vote) · 1 comment · 9 min read · LW link

Thoughts on Iterated Distillation and Amplification
Waddington · May 11, 2021, 9:32 PM · 9 points (7 votes) · 2 comments · 20 min read · LW link

How do we build organisations that want to build safe AI?
sxae · May 12, 2021, 3:08 PM · 4 points (3 votes) · 4 comments · 9 min read · LW link

[Question] Who has argued in detail that a current AI system is phenomenally conscious?
Robbo · May 14, 2021, 10:03 PM · 3 points (3 votes) · 2 comments · 1 min read · LW link

How I Learned to Stop Worrying and Love MUM
Waddington · May 20, 2021, 7:57 AM · 2 points (2 votes) · 0 comments · 3 min read · LW link

AI Safety Research Project Ideas
Owain_Evans · May 21, 2021, 1:39 PM · 58 points (25 votes) · 2 comments · 3 min read · LW link

[Question] How does one use set theory for the alignment problem?
Valentin2026 · May 29, 2021, 12:28 AM · 8 points (7 votes) · 6 comments · 1 min read · LW link

Reflection of Hierarchical Relationship via Nuanced Conditioning of Game Theory Approach for AI Development and Utilization
Kyoung-cheol Kim · Jun 4, 2021, 7:20 AM · 2 points (2 votes) · 2 comments · 9 min read · LW link

Review of “Learning Normativity: A Research Agenda”
Jun 6, 2021, 1:33 PM · 34 points (8 votes) · 0 comments · 6 min read · LW link

Hardware for Transformative AI
MrThink · Jun 22, 2021, 6:13 PM · 17 points (10 votes) · 7 comments · 2 min read · LW link

Alex Turner’s Research, Comprehensive Information Gathering
adamShimi · Jun 23, 2021, 9:44 AM · 15 points (7 votes) · 3 comments · 3 min read · LW link

Discussion: Objective Robustness and Inner Alignment Terminology
Jun 23, 2021, 11:25 PM · 70 points (21 votes) · 7 comments · 9 min read · LW link

The Language of Bird
johnswentworth · Jun 27, 2021, 4:44 AM · 44 points (21 votes) · 9 comments · 2 min read · LW link

[Question] What are some claims or opinions about multi-multi delegation you’ve seen in the memeplex that you think deserve scrutiny?
Quinn · Jun 27, 2021, 5:44 PM · 17 points (8 votes) · 6 comments · 2 min read · LW link

An examination of Metaculus’ resolved AI predictions and their implications for AI timelines
CharlesD · Jul 20, 2021, 9:08 AM · 28 points (15 votes) · 0 comments · 7 min read · LW link

[Question] How should my timelines influence my career choice?
Tom Lieberum · Aug 3, 2021, 10:14 AM · 13 points (9 votes) · 10 comments · 1 min read · LW link

What is the problem?
Carlos Ramirez · Aug 11, 2021, 10:33 PM · 7 points (5 votes) · 0 comments · 6 min read · LW link

OpenAI Codex: First Impressions
specbug · Aug 13, 2021, 4:52 PM · 49 points (25 votes) · 8 comments · 4 min read · LW link (sixeleven.in)

[Question] 1h-volunteers needed for a small AI Safety-related research project
PabloAMC · Aug 16, 2021, 5:53 PM · 2 points (1 vote) · 0 comments · 1 min read · LW link

Extraction of human preferences 👨→🤖
arunraja-hub · Aug 24, 2021, 4:34 PM · 18 points (9 votes) · 2 comments · 5 min read · LW link

Call for research on evaluating alignment (funding + advice available)
Beth Barnes · Aug 31, 2021, 11:28 PM · 105 points (34 votes) · 11 comments · 5 min read · LW link

Obstacles to gradient hacking
leogao · Sep 5, 2021, 10:42 PM · 21 points (7 votes) · 11 comments · 4 min read · LW link

[Question] Conditional on the first AGI being aligned correctly, is a good outcome even still likely?
iamthouthouarti · Sep 6, 2021, 5:30 PM · 2 points (1 vote) · 1 comment · 1 min read · LW link

Distinguishing AI takeover scenarios
Sep 8, 2021, 4:19 PM · 67 points (31 votes) · 11 comments · 14 min read · LW link

Paths To High-Level Machine Intelligence
Daniel_Eth · Sep 10, 2021, 1:21 PM · 67 points (23 votes) · 8 comments · 33 min read · LW link

How truthful is GPT-3? A benchmark for language models
Owain_Evans · Sep 16, 2021, 10:09 AM · 56 points (25 votes) · 24 comments · 6 min read · LW link

Investigating AI Takeover Scenarios
Sammy Martin · Sep 17, 2021, 6:47 PM · 27 points (11 votes) · 1 comment · 27 min read · LW link

A sufficiently paranoid non-Friendly AGI might self-modify itself to become Friendly
RomanS · Sep 22, 2021, 6:29 AM · 5 points (5 votes) · 2 comments · 1 min read · LW link

Towards Deconfusing Gradient Hacking
leogao · Oct 24, 2021, 12:43 AM · 25 points (13 votes) · 1 comment · 12 min read · LW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research
Ben Smith · Sep 29, 2021, 5:09 PM · 27 points (16 votes) · 8 comments · 10 min read · LW link

Meta learning to gradient hack
Quintin Pope · Oct 1, 2021, 7:25 PM · 54 points (18 votes) · 11 comments · 3 min read · LW link

Proposal: Scaling laws for RL generalization
axioman · Oct 1, 2021, 9:32 PM · 14 points (14 votes) · 10 comments · 11 min read · LW link

A Framework of Prediction Technologies
isaduan · Oct 3, 2021, 10:26 AM · 8 points (5 votes) · 2 comments · 9 min read · LW link

AI Prediction Services and Risks of War
isaduan · Oct 3, 2021, 10:26 AM · 3 points (2 votes) · 2 comments · 10 min read · LW link

Possible Worlds after Prediction Take-off
isaduan · Oct 3, 2021, 10:26 AM · 5 points (3 votes) · 0 comments · 4 min read · LW link

[Proposal] Method of locating useful subnets in large models
Quintin Pope · Oct 13, 2021, 8:52 PM · 9 points (4 votes) · 0 comments · 2 min read · LW link

Commentary on “AGI Safety From First Principles by Richard Ngo, September 2020”
Robert Kralisch · Oct 14, 2021, 3:11 PM · 3 points (2 votes) · 0 comments · 20 min read · LW link

The AGI needs to be honest
rokosbasilisk · Oct 16, 2021, 7:24 PM · 2 points (6 votes) · 12 comments · 2 min read · LW link

“Redundant” AI Alignment
Mckay Jensen · Oct 16, 2021, 9:32 PM · 12 points (4 votes) · 3 comments · 1 min read · LW link (quevivasbien.github.io)

[MLSN #1]: ICLR Safety Paper Roundup
Dan_H · Oct 18, 2021, 3:19 PM · 59 points (17 votes) · 1 comment · 2 min read · LW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors
Owain_Evans · Oct 22, 2021, 4:23 PM · 31 points (8 votes) · 15 comments · 1 min read · LW link

Hegel vs. GPT-3
Bezzi · Oct 27, 2021, 5:55 AM · 9 points (9 votes) · 21 comments · 2 min read · LW link

Google announces Pathways: new generation multitask AI Architecture
Ozyrus · Oct 29, 2021, 11:55 AM · 6 points (3 votes) · 1 comment · 1 min read · LW link (blog.google)

What is the most evil AI that we could build, today?
ThomasJ · Nov 1, 2021, 7:58 PM · −2 points (7 votes) · 14 comments · 1 min read · LW link

Why we need prosocial agents
Akbir Khan · Nov 2, 2021, 3:19 PM · 6 points (4 votes) · 0 comments · 2 min read · LW link

Possible research directions to improve the mechanistic explanation of neural networks
delton137 · Nov 9, 2021, 2:36 AM · 29 points (10 votes) · 8 comments · 9 min read · LW link

What are red flags for Neural Network suffering?
Marius Hobbhahn · Nov 8, 2021, 12:51 PM · 26 points (16 votes) · 15 comments · 12 min read · LW link

Using Brain-Computer Interfaces to get more data for AI alignment
Robbo · Nov 7, 2021, 12:00 AM · 35 points (11 votes) · 10 comments · 7 min read · LW link

Hardcode the AGI to need our approval indefinitely?
MichaelStJules · Nov 11, 2021, 7:04 AM · 2 points (4 votes) · 2 comments · 1 min read · LW link

Stop button: towards a causal solution
tailcalled · Nov 12, 2021, 7:09 PM · 23 points (10 votes) · 37 comments · 9 min read · LW link

A FLI postdoctoral grant application: AI alignment via causal analysis and design of agents
PabloAMC · Nov 13, 2021, 1:44 AM · 4 points (2 votes) · 0 comments · 7 min read · LW link

What would we do if alignment were futile?
Grant Demaree · Nov 14, 2021, 8:09 AM · 73 points (44 votes) · 43 comments · 3 min read · LW link

Attempted Gears Analysis of AGI Intervention Discussion With Eliezer
Zvi · Nov 15, 2021, 3:50 AM · 204 points (76 votes) · 48 comments · 16 min read · LW link (thezvi.wordpress.com)

A positive case for how we might succeed at prosaic AI alignment
evhub · Nov 16, 2021, 1:49 AM · 78 points (30 votes) · 47 comments · 6 min read · LW link

Super intelligent AIs that don’t require alignment
Yair Halberstadt · Nov 16, 2021, 7:55 PM · 10 points (10 votes) · 2 comments · 6 min read · LW link

Some real examples of gradient hacking
Oliver Sourbut · Nov 22, 2021, 12:11 AM · 15 points (5 votes) · 8 comments · 2 min read · LW link

[linkpost] Acquisition of Chess Knowledge in AlphaZero
Quintin Pope · Nov 23, 2021, 7:55 AM · 8 points (5 votes) · 1 comment · 1 min read · LW link

AI Tracker: monitoring current and near-future risks from superscale models
Nov 23, 2021, 7:16 PM · 64 points (29 votes) · 13 comments · 3 min read · LW link (aitracker.org)

AI Safety Needs Great Engineers
Andy Jones · Nov 23, 2021, 3:40 PM · 78 points (54 votes) · 45 comments · 4 min read · LW link

HIRING: Inform and shape a new project on AI safety at Partnership on AI
Madhulika Srikumar · Nov 24, 2021, 8:27 AM · 6 points (4 votes) · 0 comments · 1 min read · LW link

How to measure FLOP/s for Neural Networks empirically?
Marius Hobbhahn · Nov 29, 2021, 3:18 PM · 16 points (8 votes) · 5 comments · 7 min read · LW link

AI Governance Fundamentals—Curriculum and Application
Mau · Nov 30, 2021, 2:19 AM · 17 points (7 votes) · 0 comments · 16 min read · LW link

Behavior Cloning is Miscalibrated
leogao · Dec 5, 2021, 1:36 AM · 53 points (21 votes) · 3 comments · 3 min read · LW link

ML Alignment Theory Program under Evan Hubinger
Dec 6, 2021, 12:03 AM · 82 points (39 votes) · 3 comments · 2 min read · LW link

Information bottleneck for counterfactual corrigibility
tailcalled · Dec 6, 2021, 5:11 PM · 8 points (3 votes) · 1 comment · 7 min read · LW link

Modeling Failure Modes of High-Level Machine Intelligence
Dec 6, 2021, 1:54 PM · 54 points (17 votes) · 1 comment · 12 min read · LW link

Finding the multiple ground truths of CoinRun and image classification
Stuart_Armstrong · Dec 8, 2021, 6:13 PM · 15 points (4 votes) · 3 comments · 2 min read · LW link

[Question] What alignment-related concepts should be better known in the broader ML community?
Lauro Langosco · Dec 9, 2021, 8:44 PM · 6 points (4 votes) · 4 comments · 1 min read · LW link

Understanding Gradient Hacking
peterbarnett · Dec 10, 2021, 3:58 PM · 30 points (14 votes) · 5 comments · 30 min read · LW link

What’s the backward-forward FLOP ratio for Neural Networks?
Dec 13, 2021, 8:54 AM · 17 points (8 votes) · 8 comments · 10 min read · LW link

My Overview of the AI Alignment Landscape: A Bird’s Eye View
Neel Nanda · Dec 15, 2021, 11:44 PM · 111 points (62 votes) · 9 comments · 15 min read · LW link

Disentangling Perspectives On Strategy-Stealing in AI Safety
shawnghu · Dec 18, 2021, 8:13 PM · 20 points (6 votes) · 1 comment · 11 min read · LW link

Demanding and Designing Aligned Cognitive Architectures
Koen.Holtman · Dec 21, 2021, 5:32 PM · 8 points (3 votes) · 5 comments · 5 min read · LW link

Potential gears level explanations of smooth progress
ryan_greenblatt · Dec 22, 2021, 6:05 PM · 4 points (2 votes) · 2 comments · 2 min read · LW link

Trans­former Circuits

evhubDec 22, 2021, 9:09 PM
142 points

58 votes

Overall karma indicates overall quality.

4 comments3 min readLW link
(transformer-circuits.pub)

Gradient Hacking via Schelling Goals

Adam Scherlis · Dec 28, 2021, 8:38 PM
33 points

12 votes

4 comments · 4 min read · LW link

Reader-generated Essays

Henrik Karlsson · Jan 3, 2022, 8:56 AM
17 points

13 votes

0 comments · 6 min read · LW link
(escapingflatland.substack.com)

Brain Efficiency: Much More than You Wanted to Know

jacob_cannell · Jan 6, 2022, 3:38 AM
195 points

112 votes

87 comments · 28 min read · LW link

Understanding the two-head strategy for teaching ML to answer questions honestly

Adam Scherlis · Jan 11, 2022, 11:24 PM
28 points

9 votes

1 comment · 10 min read · LW link

Plan B in AI Safety approach

avturchin · Jan 13, 2022, 12:03 PM
33 points

17 votes

9 comments · 2 min read · LW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton · Jan 17, 2022, 4:49 PM
65 points

34 votes

14 comments · 13 min read · LW link

How I’m thinking about GPT-N

delton137 · Jan 17, 2022, 5:11 PM
46 points

31 votes

21 comments · 18 min read · LW link

Alignment Problems All the Way Down

peterbarnett · Jan 22, 2022, 12:19 AM
26 points

13 votes

7 comments · 10 min read · LW link

[Question] How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs?

Anonymous · Jan 25, 2022, 7:20 PM
7 points

5 votes

4 comments · 1 min read · LW link

Causality, Transformative AI and alignment—part I

Marius Hobbhahn · Jan 27, 2022, 4:18 PM
13 points

9 votes

11 comments · 8 min read · LW link

2+2: Ontological Framework

Lyrialtus · Feb 1, 2022, 1:07 AM
−15 points

7 votes

2 comments · 12 min read · LW link

QNR prospects are important for AI alignment research

Eric Drexler · Feb 3, 2022, 3:20 PM
82 points

28 votes

10 comments · 11 min read · LW link

Paradigm-building: Introduction

Cameron Berg · Feb 8, 2022, 12:06 AM
25 points

15 votes

0 comments · 2 min read · LW link

Paradigm-building: The hierarchical question framework

Cameron Berg · Feb 9, 2022, 4:47 PM
11 points

10 votes

16 comments · 3 min read · LW link

Question 1: Predicted architecture of AGI learning algorithm(s)

Cameron Berg · Feb 10, 2022, 5:22 PM
12 points

13 votes

1 comment · 7 min read · LW link

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg · Feb 11, 2022, 10:23 PM
5 points

6 votes

1 comment · 10 min read · LW link

Question 3: Control proposals for minimizing bad outcomes

Cameron Berg · Feb 12, 2022, 7:13 PM
5 points

6 votes

1 comment · 7 min read · LW link

Question 4: Implementing the control proposals

Cameron Berg · Feb 13, 2022, 5:12 PM
6 points

6 votes

2 comments · 5 min read · LW link

Question 5: The timeline hyperparameter

Cameron Berg · Feb 14, 2022, 4:38 PM
5 points

4 votes

3 comments · 7 min read · LW link

Paradigm-building: Conclusion and practical takeaways

Cameron Berg · Feb 15, 2022, 4:11 PM
2 points

1 vote

1 comment · 2 min read · LW link

How complex are myopic imitators?

Vivek Hebbar · Feb 8, 2022, 12:00 PM
23 points

9 votes

1 comment · 15 min read · LW link

Metaculus launches contest for essays with quantitative predictions about AI

Feb 8, 2022, 4:07 PM
25 points

12 votes

2 comments · 1 min read · LW link
(www.metaculus.com)

Hypothesis: gradient descent prefers general circuits

Quintin Pope · Feb 8, 2022, 9:12 PM
40 points

23 votes

26 comments · 11 min read · LW link

Compute Trends Across Three eras of Machine Learning

Feb 16, 2022, 2:18 PM
91 points

46 votes

13 comments · 2 min read · LW link

[Question] Is the competition/cooperation between symbolic AI and statistical AI (ML) about historical approach to research / engineering, or is it more fundamentally about what intelligent agents “are”?

Edward Hammond · Feb 17, 2022, 11:11 PM
1 point

1 vote

1 comment · 2 min read · LW link

HCH and Adversarial Questions

David Udell · Feb 19, 2022, 12:52 AM
15 points

9 votes

7 comments · 26 min read · LW link

Thoughts on Dangerous Learned Optimization

peterbarnett · Feb 19, 2022, 10:46 AM
4 points

2 votes

2 comments · 4 min read · LW link

Relativized Definitions as a Method to Sidestep the Löbian Obstacle

homotowat · Feb 27, 2022, 6:37 AM
27 points

11 votes

4 comments · 7 min read · LW link

What we know about machine learning’s replication crisis

Younes Kamel · Mar 5, 2022, 11:55 PM
35 points

13 votes

4 comments · 6 min read · LW link
(youneskamel.substack.com)

Projecting compute trends in Machine Learning

Mar 7, 2022, 3:32 PM
59 points

26 votes

5 comments · 6 min read · LW link

[Survey] Expectations of a Post-ASI Order

Lone Pine · Mar 9, 2022, 7:17 PM
5 points

3 votes

0 comments · 1 min read · LW link

A Longlist of Theories of Impact for Interpretability

Neel Nanda · Mar 11, 2022, 2:55 PM
106 points

53 votes

29 comments · 5 min read · LW link

New GPT3 Impressive Capabilities—InstructGPT3 [1/2]

simeon_c · Mar 13, 2022, 10:58 AM
71 points

32 votes

10 comments · 7 min read · LW link

Phase transitions and AGI

Mar 17, 2022, 5:22 PM
44 points

13 votes

19 comments · 9 min read · LW link
(www.metaculus.com)

Can we simulate human evolution to create a somewhat aligned AGI?

Thomas Kwa · Mar 28, 2022, 10:55 PM
21 points

13 votes

7 comments · 7 min read · LW link

Project Intro: Selection Theorems for Modularity

Apr 4, 2022, 12:59 PM
69 points

28 votes

20 comments · 16 min read · LW link

My agenda for research into transformer capabilities—Introduction

p.b. · Apr 5, 2022, 9:23 PM
11 points

6 votes

1 comment · 3 min read · LW link

Research agenda: Can transformers do system 2 thinking?

p.b. · Apr 6, 2022, 1:31 PM
20 points

9 votes

0 comments · 2 min read · LW link

PaLM in “Extrapolating GPT-N performance”

Lukas Finnveden · Apr 6, 2022, 1:05 PM
80 points

42 votes

19 comments · 2 min read · LW link

Research agenda—Building a multi-modal chess-language model

p.b. · Apr 7, 2022, 12:25 PM
8 points

4 votes

2 comments · 2 min read · LW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

simeon_c · Apr 7, 2022, 1:46 PM
11 points

8 votes

0 comments · 7 min read · LW link

Playing with DALL·E 2

Dave Orr · Apr 7, 2022, 6:49 PM
165 points

113 votes

116 comments · 6 min read · LW link

Progress Report 4: logit lens redux

Nathan Helm-Burger · Apr 8, 2022, 6:35 PM
3 points

2 votes

0 comments · 2 min read · LW link

Hyperbolic takeoff

Ege Erdil · Apr 9, 2022, 3:57 PM
17 points

7 votes

8 comments · 10 min read · LW link
(www.metaculus.com)

Elicit: Language Models as Research Assistants

Apr 9, 2022, 2:56 PM
70 points

34 votes

7 comments · 13 min read · LW link

Is it time to start thinking about what AI Friendliness means?

Victor Novikov · Apr 11, 2022, 9:32 AM
18 points

9 votes

6 comments · 3 min read · LW link

What more compute does for brain-like models: response to Rohin

Nathan Helm-Burger · Apr 13, 2022, 3:40 AM
22 points

9 votes

14 comments · 11 min read · LW link

Alignment and Deep Learning

Aiyen · Apr 17, 2022, 12:02 AM
44 points

28 votes

35 comments · 8 min read · LW link

[$20K in Prizes] AI Safety Arguments Competition

Apr 26, 2022, 4:13 PM
74 points

48 votes

543 comments · 3 min read · LW link

SERI ML Alignment Theory Scholars Program 2022

Apr 27, 2022, 12:43 AM
56 points

22 votes

6 comments · 3 min read · LW link

[Question] What is a training “step” vs. “episode” in machine learning?

Evan R. Murphy · Apr 28, 2022, 9:53 PM
9 points

4 votes

4 comments · 1 min read · LW link

Prize for Alignment Research Tasks

Apr 29, 2022, 8:57 AM
63 points

30 votes

36 comments · 10 min read · LW link

Quick Thoughts on A.I. Governance

Nicholas Kross · Apr 30, 2022, 2:49 PM
66 points

29 votes

8 comments · 2 min read · LW link
(www.thinkingmuchbetter.com)

What DALL-E 2 can and cannot do

Swimmer963 (Miranda Dixon-Luinenburg) · May 1, 2022, 11:51 PM
351 points

189 votes

305 comments · 9 min read · LW link

Open Problems in Negative Side Effect Minimization

May 6, 2022, 9:37 AM
12 points

9 votes

7 comments · 17 min read · LW link

[Linkpost] diffusion magnetizes manifolds (DALL-E 2 intuition building)

Paul Bricman · May 7, 2022, 11:01 AM
1 point

1 vote

0 comments · 1 min read · LW link
(paulbricman.com)

Updating Utility Functions

May 9, 2022, 9:44 AM
36 points

18 votes

7 comments · 8 min read · LW link

Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection

Oliver Sourbut · May 9, 2022, 9:38 PM
54 points

24 votes

12 comments · 10 min read · LW link

AI safety should be made more accessible using non text-based media

Massimog · May 10, 2022, 3:14 AM
2 points

4 votes

4 comments · 4 min read · LW link

The limits of AI safety via debate

Marius Hobbhahn · May 10, 2022, 1:33 PM
28 points

20 votes

7 comments · 10 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · May 12, 2022, 7:59 PM
16 points

9 votes

0 comments · 8 min read · LW link

Gato as the Dawn of Early AGI

David Udell · May 15, 2022, 6:52 AM
84 points

50 votes

29 comments · 12 min read · LW link

Is AI Progress Impossible To Predict?

alyssavance · May 15, 2022, 6:30 PM
276 points

134 votes

38 comments · 2 min read · LW link

DeepMind’s generalist AI, Gato: A non-technical explainer

May 16, 2022, 9:21 PM
57 points

40 votes

6 comments · 6 min read · LW link

Gato’s Generalisation: Predictions and Experiments I’d Like to See

Oliver Sourbut · May 18, 2022, 7:15 AM
43 points

21 votes

3 comments · 10 min read · LW link

Understanding Gato’s Supervised Reinforcement Learning

lorepieri · May 18, 2022, 11:08 AM
3 points

3 votes

5 comments · 1 min read · LW link
(lorenzopieri.com)

A Story of AI Risk: InstructGPT-N

peterbarnett · May 26, 2022, 11:22 PM
24 points

15 votes

0 comments · 8 min read · LW link

[Linkpost] A Chinese AI optimized for killing

RomanS · Jun 3, 2022, 9:17 AM
−2 points

15 votes

4 comments · 1 min read · LW link

Give the AI safe tools

Adam Jermyn · Jun 3, 2022, 5:04 PM
3 points

2 votes

0 comments · 4 min read · LW link

Towards a Formalisation of Returns on Cognitive Reinvestment (Part 1)

DragonGod · Jun 4, 2022, 6:42 PM
17 points

4 votes

8 comments · 13 min read · LW link

Give the model a model-builder

Adam Jermyn · Jun 6, 2022, 12:21 PM
3 points

2 votes

0 comments · 5 min read · LW link

AGI Safety FAQ / all-dumb-questions-allowed thread

Aryeh Englander · Jun 7, 2022, 5:47 AM
221 points

108 votes

515 comments · 4 min read · LW link

Embodiment is Indispensable for AGI

P. G. Keerthana Gopalakrishnan · Jun 7, 2022, 9:31 PM
6 points

9 votes

1 comment · 6 min read · LW link
(keerthanapg.com)

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut · Jun 9, 2022, 9:38 PM
22 points

7 votes

4 comments · 2 min read · LW link

Summary of “AGI Ruin: A List of Lethalities”

Stephen McAleese · Jun 10, 2022, 10:35 PM
32 points

21 votes

2 comments · 8 min read · LW link

Poorly-Aimed Death Rays

Thane Ruthenis · Jun 11, 2022, 6:29 PM
43 points

24 votes

5 comments · 4 min read · LW link

ELK Proposal—Make the Reporter care about the Predictor’s beliefs

Jun 11, 2022, 10:53 PM
8 points

6 votes

0 comments · 6 min read · LW link

Grokking “Semi-informative priors over AI timelines”

anson.ho · Jun 12, 2022, 10:17 PM
15 points

9 votes

7 comments · 14 min read · LW link

[Question] Favourite new AI productivity tools?

Gabe M · Jun 15, 2022, 1:08 AM
14 points

10 votes

5 comments · 1 min read · LW link

Contra Hofstadter on GPT-3 Nonsense

rictic · Jun 15, 2022, 9:53 PM
235 points

142 votes

22 comments · 2 min read · LW link

[Question] What if LaMDA is indeed sentient / self-aware / worth having rights?

RomanS · Jun 16, 2022, 9:10 AM
22 points

13 votes

13 comments · 1 min read · LW link

Ten experiments in modularity, which we’d like you to run!

Jun 16, 2022, 9:17 AM
59 points

26 votes

2 comments · 9 min read · LW link

Alignment research for “meta” purposes

acylhalide · Jun 16, 2022, 2:03 PM
15 points

11 votes

0 comments · 1 min read · LW link

[Question] AI misalignment risk from GPT-like systems?

fiso64 · Jun 19, 2022, 5:35 PM
10 points

8 votes

8 comments · 1 min read · LW link

Half-baked alignment idea: training to generalize

Aaron Bergman · Jun 19, 2022, 8:16 PM
7 points

4 votes

2 comments · 4 min read · LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad · Jun 21, 2022, 12:36 PM
9 points

8 votes

7 comments · 9 min read · LW link

Mitigating the damage from unaligned ASI by cooperating with aliens that don’t exist yet

MSRayne · Jun 21, 2022, 4:12 PM
−8 points

5 votes

7 comments · 6 min read · LW link

AI Training Should Allow Opt-Out

alyssavance · Jun 23, 2022, 1:33 AM
76 points

31 votes

13 comments · 6 min read · LW link

Updated Deference is not a strong argument against the utility uncertainty approach to alignment

Ivan Vendrov · Jun 24, 2022, 7:32 PM
20 points

14 votes

8 comments · 4 min read · LW link

SunPJ in Alenia

FlorianH · Jun 25, 2022, 7:39 PM
7 points

5 votes

19 comments · 8 min read · LW link
(plausiblestuff.com)

Conditioning Generative Models

Adam Jermyn · Jun 25, 2022, 10:15 PM
22 points

9 votes

18 comments · 10 min read · LW link

Training Trace Priors and Speed Priors

Adam Jermyn · Jun 26, 2022, 6:07 PM
17 points

6 votes

0 comments · 3 min read · LW link

Deliberation Everywhere: Simple Examples

Oliver Sourbut · Jun 27, 2022, 5:26 PM
14 points

3 votes

0 comments · 15 min read · LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut · Jun 27, 2022, 5:25 PM
10 points

7 votes

0 comments · 11 min read · LW link

Formal Philosophy and Alignment Possible Projects

Daniel Herrmann · Jun 30, 2022, 10:42 AM
33 points

20 votes

5 comments · 8 min read · LW link

Reframing the AI Risk

Thane Ruthenis · Jul 1, 2022, 6:44 PM
26 points

11 votes

7 comments · 6 min read · LW link

Trends in GPU price-performance

Jul 1, 2022, 3:51 PM
85 points

41 votes

10 comments · 1 min read · LW link
(epochai.org)

Follow along with Columbia EA’s Advanced AI Safety Fellowship!

RohanS · Jul 2, 2022, 5:45 PM
3 points

3 votes

0 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Can we achieve AGI Alignment by balancing multiple human objectives?

Ben Smith · Jul 3, 2022, 2:51 AM
11 points

8 votes

1 comment · 4 min read · LW link

We Need a Consolidated List of Bad AI Alignment Solutions

Double · Jul 4, 2022, 6:54 AM
9 points

8 votes

14 comments · 1 min read · LW link

A compressed take on recent disagreements

kman · Jul 4, 2022, 4:39 AM
33 points

13 votes

9 comments · 1 min read · LW link

My Most Likely Reason to Die Young is AI X-Risk

AISafetyIsNotLongtermist · Jul 4, 2022, 5:08 PM
61 points

34 votes

24 comments · 4 min read · LW link
(forum.effectivealtruism.org)

The curious case of Pretty Good human inner/outer alignment

PavleMiha · Jul 5, 2022, 7:04 PM
41 points

24 votes

45 comments · 4 min read · LW link

Introducing the Fund for Alignment Research (We’re Hiring!)

Jul 6, 2022, 2:07 AM
59 points

27 votes

0 comments · 4 min read · LW link

Outer vs inner misalignment: three framings

Richard_Ngo · Jul 6, 2022, 7:46 PM
43 points

16 votes

4 comments · 9 min read · LW link

Response to Blake Richards: AGI, generality, alignment, & loss functions

Steven Byrnes · Jul 12, 2022, 1:56 PM
59 points

26 votes

9 comments · 15 min read · LW link

Goal Alignment Is Robust To the Sharp Left Turn

Thane Ruthenis · Jul 13, 2022, 8:23 PM
45 points

17 votes

15 comments · 4 min read · LW link

Deception?! I ain’t got time for that!

Paul Colognese · Jul 18, 2022, 12:06 AM
50 points

17 votes

5 comments · 13 min read · LW link

Four questions I ask AI safety researchers

Orpheus16 · Jul 17, 2022, 5:25 PM
17 points

13 votes

0 comments · 1 min read · LW link

A distillation of Evan Hubinger’s training stories (for SERI MATS)

Daphne_W · Jul 18, 2022, 3:38 AM
15 points

5 votes

1 comment · 10 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
40 points

24 votes

8 comments · 22 min read · LW link

Information theoretic model analysis may not lend much insight, but we may have been doing them wrong!

Garrett Baker · Jul 24, 2022, 12:42 AM
7 points

3 votes

0 comments · 10 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · Jul 20, 2022, 10:44 AM
78 points

41 votes

11 comments · 8 min read · LW link

Our Existing Solutions to AGI Alignment (semi-safe)

Michael Soareverix · Jul 21, 2022, 7:00 PM
12 points

9 votes

1 comment · 3 min read · LW link

Reward is not the optimization target

TurnTrout · Jul 25, 2022, 12:03 AM
252 points

120 votes

97 comments · 10 min read · LW link

What Environment Properties Select Agents For World-Modeling?

Thane Ruthenis · Jul 23, 2022, 7:27 PM
24 points

6 votes

1 comment · 12 min read · LW link

AGI Safety Needs People With All Skillsets!

Severin T. Seehrich · Jul 25, 2022, 1:32 PM
28 points

19 votes

0 comments · 2 min read · LW link

Conjecture: Internal Infohazard Policy

Jul 29, 2022, 7:07 PM
119 points

59 votes

6 comments · 19 min read · LW link

Humans Reflecting on HRH

leogao · Jul 29, 2022, 9:56 PM
20 points

12 votes

4 comments · 2 min read · LW link

[Question] Would “Manhattan Project” style be beneficial or deleterious for AI Alignment?

Valentin2026 · Aug 4, 2022, 7:12 PM
5 points

5 votes

1 comment · 1 min read · LW link

Convergence Towards World-Models: A Gears-Level Model

Thane Ruthenis · Aug 4, 2022, 11:31 PM
37 points

13 votes

1 comment · 13 min read · LW link

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth · Aug 10, 2022, 4:08 PM
143 points

69 votes

30 comments · 3 min read · LW link

Formalizing Alignment

Marv K · Aug 10, 2022, 6:50 PM
3 points

2 votes

0 comments · 2 min read · LW link

My summary of the alignment problem

Peter Hroššo · Aug 11, 2022, 7:42 PM
16 points

14 votes

3 comments · 2 min read · LW link
(threadreaderapp.com)

Artificial intelligence wireheading

Big Tony · Aug 12, 2022, 3:06 AM
3 points

2 votes

2 comments · 1 min read · LW link

Infant AI Scenario

Nathan1123 · Aug 12, 2022, 9:20 PM
1 point

2 votes

0 comments · 3 min read · LW link

Gradient descent doesn’t select for inner search

Ivan Vendrov · Aug 13, 2022, 4:15 AM
36 points

20 votes

23 comments · 4 min read · LW link

No shortcuts to knowledge: Why AI needs to ease up on scaling and learn how to code

Yldedly · Aug 15, 2022, 8:42 AM
4 points

2 votes

0 comments · 1 min read · LW link
(deoxyribose.github.io)

Mesa-optimization for goals defined only within a training environment is dangerous

Rubi J. Hudson · Aug 17, 2022, 3:56 AM
6 points

4 votes

2 comments · 4 min read · LW link

The longest training run

Aug 17, 2022, 5:18 PM
68 points

36 votes

11 comments · 9 min read · LW link
(epochai.org)

Matt Yglesias on AI Policy

Grant Demaree · Aug 17, 2022, 11:57 PM
25 points

14 votes

1 comment · 1 min read · LW link
(www.slowboring.com)

Epistemic Artefacts of (conceptual) AI alignment research

Aug 19, 2022, 5:18 PM
30 points

14 votes

1 comment · 5 min read · LW link

A Bite Sized Introduction to ELK

Luk27182 · Sep 17, 2022, 12:28 AM
5 points

4 votes

0 comments · 6 min read · LW link

Benchmarking Proposals on Risk Scenarios

Paul Bricman · Aug 20, 2022, 10:01 AM
25 points

10 votes

2 comments · 14 min read · LW link

The ‘Bitter Lesson’ is Wrong

deepthoughtlife · Aug 20, 2022, 4:15 PM
−9 points

13 votes

14 comments · 2 min read · LW link

My Plan to Build Aligned Superintelligence

apollonianblues · Aug 21, 2022, 1:16 PM
18 points

10 votes

7 comments · 8 min read · LW link

Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie · Aug 24, 2022, 6:37 PM
92 points

39 votes

4 comments · 7 min read · LW link

Google AI integrates PaLM with robotics: SayCan update [Linkpost]

Evan R. Murphy · Aug 24, 2022, 8:54 PM
25 points

8 votes

0 comments · 1 min read · LW link
(sites.research.google)

The Shard Theory Alignment Scheme

David Udell · Aug 25, 2022, 4:52 AM
47 points

18 votes

33 comments · 2 min read · LW link

[Question] What would you expect a massive multimodal online federated learner to be capable of?

Aryeh Englander · Aug 27, 2022, 5:31 PM
13 points

6 votes

4 comments · 1 min read · LW link

(My understanding of) What Everyone in Technical Alignment is Doing and Why

Aug 29, 2022, 1:23 AM
345 points

191 votes

83 comments · 38 min read · LW link

Breaking down the training/deployment dichotomy

Erik Jenner · Aug 28, 2022, 9:45 PM
29 points

14 votes

4 comments · 3 min read · LW link

Strategy For Conditioning Generative Models

Sep 1, 2022, 4:34 AM
28 points

11 votes

4 comments · 18 min read · LW link

Gradient Hacker Design Principles From Biology

johnswentworth · Sep 1, 2022, 7:03 PM
52 points

19 votes

13 comments · 3 min read · LW link

No, human brains are not (much) more efficient than computers

Jesse Hoogland · Sep 6, 2022, 1:53 PM
19 points

12 votes

16 comments · 4 min read · LW link
(www.jessehoogland.com)

Can “Reward Economics” solve AI Alignment?

Q Home · Sep 7, 2022, 7:58 AM
3 points

4 votes

15 comments · 18 min read · LW link

Generators Of Disagreement With AI Alignment

George3d6 · Sep 7, 2022, 6:15 PM
26 points

14 votes

9 comments · 9 min read · LW link
(www.epistem.ink)

Searching for Modularity in Large Language Models

Sep 8, 2022, 2:25 AM
43 points

19 votes

3 comments · 14 min read · LW link

We may be able to see sharp left turns coming

Sep 3, 2022, 2:55 AM
50 points

41 votes

26 comments · 1 min read · LW link

Gatekeeper Victory: AI Box Reflection

Sep 9, 2022, 9:38 PM
4 points

3 votes

5 comments · 9 min read · LW link

Can you force a neural network to keep generalizing?

Q Home · Sep 12, 2022, 10:14 AM
2 points

3 votes

10 comments · 5 min read · LW link

Alignment via prosocial brain algorithms

Cameron Berg · Sep 12, 2022, 1:48 PM
42 points

18 votes

28 comments · 6 min read · LW link

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper · Sep 12, 2022, 7:07 PM
96 points

47 votes

7 comments · 2 min read · LW link
(arxiv.org)

Trying to find the underlying structure of computational systems

Matthias G. Mayer · Sep 13, 2022, 9:16 PM
17 points

12 votes

9 comments · 4 min read · LW link

[Question] Are Speed Superintelligences Feasible for Modern ML Techniques?

DragonGod · Sep 14, 2022, 12:59 PM
8 points

5 votes

5 comments · 1 min read · LW link

The Defender’s Advantage of Interpretability

Marius Hobbhahn · Sep 14, 2022, 2:05 PM
41 points

18 votes

4 comments · 6 min read · LW link

When does technical work to reduce AGI conflict make a difference?: Introduction

Sep 14, 2022, 7:38 PM
42 points

22 votes

3 comments · 6 min read · LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo · Sep 14, 2022, 7:09 PM
52 points

22 votes

4 comments · 1 min read · LW link
(www.adept.ai)

[Question] Forecasting thread: How does AI risk level vary based on timelines?

elifland · Sep 14, 2022, 11:56 PM
33 points

13 votes

7 comments · 1 min read · LW link

General advice for transitioning into Theoretical AI Safety

Martín Soto · Sep 15, 2022, 5:23 AM
9 points

6 votes

0 comments · 10 min read · LW link

Why deceptive alignment matters for AGI safety

Marius Hobbhahn · Sep 15, 2022, 1:38 PM
48 points

31 votes

12 comments · 13 min read · LW link

Understanding Conjecture: Notes from Connor Leahy interview

Orpheus16 · Sep 15, 2022, 6:37 PM
103 points

39 votes

24 comments · 15 min read · LW link

ordering capability thresholds

Tamsin Leake · Sep 16, 2022, 4:36 PM
27 points

9 votes

0 comments · 4 min read · LW link
(carado.moe)

Levels of goals and alignment

zeshen · Sep 16, 2022, 4:44 PM
27 points

14 votes

4 comments · 6 min read · LW link

Katja Grace on Slowing Down AI, AI Expert Surveys And Estimating AI Risk

Michaël Trazzi · Sep 16, 2022, 5:45 PM
40 points

19 votes

2 comments · 3 min read · LW link
(theinsideview.ai)

Summaries: Alignment Fundamentals Curriculum

Leon Lang · Sep 18, 2022, 1:08 PM
43 points

28 votes

3 comments · 1 min read · LW link
(docs.google.com)

Leveraging Legal Informatics to Align AI

John Nay · Sep 18, 2022, 8:39 PM
11 points

4 votes

0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

Alignment Org Cheat Sheet

Sep 20, 2022, 5:36 PM
63 points

46 votes

6 comments · 4 min read · LW link

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

Yitz · Sep 23, 2022, 5:08 AM
144 points

66 votes

42 comments · 6 min read · LW link

Nearcast-based “deployment problem” analysis

HoldenKarnofsky · Sep 21, 2022, 6:52 PM
78 points

24 votes

2 comments · 26 min read · LW link

Mathematical Circuits in Neural Networks

Sean Osier · Sep 22, 2022, 3:48 AM
34 points

18 votes

4 comments · 1 min read · LW link
(www.youtube.com)

Understanding Infra-Bayesianism: A Beginner-Friendly Video Series

Sep 22, 2022, 1:25 PM
114 points

58 votes

6 comments · 2 min read · LW link

Interlude: But Who Optimizes The Optimizer?

Paul Bricman · Sep 23, 2022, 3:30 PM
15 points

3 votes

0 comments · 10 min read · LW link

[Question] What Do AI Safety Pitches Not Get About Your Field?

Aris · Sep 22, 2022, 9:27 PM
28 points

8 votes

3 comments · 1 min read · LW link

Let’s Compare Notes

Shoshannah Tekofsky · Sep 22, 2022, 8:47 PM
17 points

11 votes

3 comments · 6 min read · LW link

Brain-over-body biases, and the embodied value problem in AI alignment

geoffreymiller · Sep 24, 2022, 10:24 PM
10 points

11 votes

6 comments · 25 min read · LW link

Brief Notes on Transformers

Adam Jermyn · Sep 26, 2022, 2:46 PM
32 points

16 votes

2 comments · 2 min read · LW link

You are Underestimating The Likelihood That Convergent Instrumental Subgoals Lead to Aligned AGI

Mark Neyer · Sep 26, 2022, 2:22 PM
3 points

15 votes

6 comments · 3 min read · LW link

7 traps that (we think) new alignment researchers often fall into

Sep 27, 2022, 11:13 PM
157 points

86 votes

10 comments · 4 min read · LW link

Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Diffractor · Sep 28, 2022, 1:20 AM
89 points

28 votes

11 comments · 53 min read · LW link

Failure modes in a shard theory alignment plan

Thomas Kwa · Sep 27, 2022, 10:34 PM
24 points

11 votes

2 comments · 7 min read · LW link

QAPR 3: interpretability-guided training of neural nets

Quintin Pope · Sep 28, 2022, 4:02 PM
47 points

18 votes

2 comments · 10 min read · LW link

[Question] What’s the ac­tual ev­i­dence that AI mar­ket­ing tools are chang­ing prefer­ences in a way that makes them eas­ier to pre­dict?

EmrikOct 1, 2022, 3:21 PM
10 points

6 votes

Overall karma indicates overall quality.

7 comments1 min readLW link

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

KriegerOct 2, 2022, 9:53 AM
7 points

5 votes

Overall karma indicates overall quality.

6 comments1 min readLW link

AI Timelines via Cu­mu­la­tive Op­ti­miza­tion Power: Less Long, More Short

jacob_cannellOct 6, 2022, 12:21 AM
111 points

60 votes

Overall karma indicates overall quality.

32 comments6 min readLW link

con­fu­sion about al­ign­ment requirements

Tamsin LeakeOct 6, 2022, 10:32 AM
28 points

11 votes

Overall karma indicates overall quality.

10 comments3 min readLW link
(carado.moe)

Good on­tolo­gies in­duce com­mu­ta­tive diagrams

Erik JennerOct 9, 2022, 12:06 AM
40 points

18 votes

Overall karma indicates overall quality.

5 comments14 min readLW link

Un­con­trol­lable AI as an Ex­is­ten­tial Risk

Karl von WendtOct 9, 2022, 10:36 AM
19 points

15 votes

Overall karma indicates overall quality.

0 comments20 min readLW link

Ob­jects in Mir­ror Are Closer Than They Ap­pear...

VestoziaOct 11, 2022, 4:34 AM
2 points

5 votes

Overall karma indicates overall quality.

7 comments9 min readLW link

Misalignment Harms Can Be Caused by Low Intelligence Systems
DialecticEel · Oct 11, 2022, 1:39 PM · 11 points (6 votes) · 3 comments · 1 min read · LW link

Building a transformer from scratch—AI safety up-skilling challenge
Marius Hobbhahn · Oct 12, 2022, 3:40 PM · 42 points (21 votes) · 1 comment · 5 min read · LW link

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small
Oct 12, 2022, 9:25 PM · 49 points (25 votes) · 11 comments · 4 min read · LW link

Science of Deep Learning—a technical agenda
Marius Hobbhahn · Oct 18, 2022, 2:54 PM · 35 points (19 votes) · 7 comments · 4 min read · LW link

Response to Katja Grace’s AI x-risk counterarguments
Oct 19, 2022, 1:17 AM · 75 points (34 votes) · 18 comments · 15 min read · LW link

[Question] What Does AI Alignment Success Look Like?
Shmi · Oct 20, 2022, 12:32 AM · 23 points (8 votes) · 7 comments · 1 min read · LW link

AI Research Program Prediction Markets
tailcalled · Oct 20, 2022, 1:42 PM · 38 points (18 votes) · 10 comments · 1 min read · LW link

Learning societal values from law as part of an AGI alignment strategy
John Nay · Oct 21, 2022, 2:03 AM · 3 points (12 votes) · 18 comments · 54 min read · LW link

Improved Security to Prevent Hacker-AI and Digital Ghosts
Erland Wittkotter · Oct 21, 2022, 10:11 AM · 4 points (7 votes) · 3 comments · 12 min read · LW link

What will the scaled up GATO look like? (Updated with questions)
Amal · Oct 25, 2022, 12:44 PM · 33 points (20 votes) · 20 comments · 1 min read · LW link

Intent alignment should not be the goal for AGI x-risk reduction
John Nay · Oct 26, 2022, 1:24 AM · −6 points (10 votes) · 10 comments · 3 min read · LW link

Resources that (I think) new alignment researchers should know about
Orpheus16 · Oct 28, 2022, 10:13 PM · 69 points (38 votes) · 8 comments · 4 min read · LW link

Boundaries vs Frames
Scott Garrabrant · Oct 31, 2022, 3:14 PM · 47 points (15 votes) · 7 comments · 7 min read · LW link

Adversarial Policies Beat Professional-Level Go AIs
sanxiyn · Nov 3, 2022, 1:27 PM · 31 points (16 votes) · 35 comments · 1 min read · LW link
(goattack.alignmentfund.org)

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
Nov 28, 2022, 12:54 PM · 159 points (79 votes) · 27 comments · 31 min read · LW link

Simple Way to Prevent Power-Seeking AI
research_prime_space · Dec 7, 2022, 12:26 AM · 7 points (3 votes) · 1 comment · 1 min read · LW link

You can still fetch the coffee today if you’re dead tomorrow
davidad · Dec 9, 2022, 2:06 PM · 58 points (28 votes) · 15 comments · 5 min read · LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations
Dec 14, 2022, 2:33 PM · 22 points (12 votes) · 2 comments · 11 min read · LW link

Realism about rationality
Richard_Ngo · Sep 16, 2018, 10:46 AM · 180 points (89 votes) · 145 comments · 4 min read · LW link · 3 reviews
(thinkingcomplete.blogspot.com)

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More
Ben Pace · Oct 4, 2019, 4:08 AM · 205 points (90 votes) · 60 comments · 15 min read · LW link · 2 reviews

The Parable of Predict-O-Matic
abramdemski · Oct 15, 2019, 12:49 AM · 291 points (133 votes) · 42 comments · 14 min read · LW link · 2 reviews

2018 AI Alignment Literature Review and Charity Comparison
Larks · Dec 18, 2018, 4:46 AM · 190 points (64 votes) · 26 comments · 62 min read · LW link · 1 review

An Orthodox Case Against Utility Functions
abramdemski · Apr 7, 2020, 7:18 PM · 128 points (54 votes) · 53 comments · 8 min read · LW link · 2 reviews

“How conservative” should the partial maximisers be?
Stuart_Armstrong · Apr 13, 2020, 3:50 PM · 30 points (9 votes) · 8 comments · 2 min read · LW link

[AN #95]: A framework for thinking about how to make AI go well
Rohin Shah · Apr 15, 2020, 5:10 PM · 20 points (6 votes) · 2 comments · 10 min read · LW link
(mailchi.mp)

AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah
Palus Astra · Apr 16, 2020, 12:50 AM · 58 points (15 votes) · 27 comments · 89 min read · LW link

Open question: are minimal circuits daemon-free?
paulfchristiano · May 5, 2018, 10:40 PM · 81 points (41 votes) · 70 comments · 2 min read · LW link · 1 review

Disentangling arguments for the importance of AI safety
Richard_Ngo · Jan 21, 2019, 12:41 PM · 129 points (50 votes) · 23 comments · 8 min read · LW link

Integrating Hidden Variables Improves Approximation
johnswentworth · Apr 16, 2020, 9:43 PM · 15 points (3 votes) · 4 comments · 1 min read · LW link

AI Services as a Research Paradigm
VojtaKovarik · Apr 20, 2020, 1:00 PM · 30 points (10 votes) · 12 comments · 4 min read · LW link
(docs.google.com)

Databases of human behaviour and preferences?
Stuart_Armstrong · Apr 21, 2020, 6:06 PM · 10 points (2 votes) · 9 comments · 1 min read · LW link

Critch on career advice for junior AI-x-risk-concerned researchers
Rob Bensinger · May 12, 2018, 2:13 AM · 117 points (86 votes) · 25 comments · 4 min read · LW link

Reframing Impact
TurnTrout · Sep 20, 2019, 7:03 PM · 90 points (41 votes) · 15 comments · 3 min read · LW link · 1 review

Description vs simulated prediction
Richard Korzekwa · Apr 22, 2020, 4:40 PM · 26 points (7 votes) · 0 comments · 5 min read · LW link
(aiimpacts.org)

DeepMind team on specification gaming
JoshuaFox · Apr 23, 2020, 8:01 AM · 30 points (10 votes) · 2 comments · 1 min read · LW link
(deepmind.com)

[Question] Does Agent-like Behavior Imply Agent-like Architecture?
Scott Garrabrant · Aug 23, 2019, 2:01 AM · 54 points (23 votes) · 7 comments · 1 min read · LW link

Risks from Learned Optimization: Conclusion and Related Work
Jun 7, 2019, 7:53 PM · 78 points (26 votes) · 4 comments · 6 min read · LW link

Deceptive Alignment
Jun 5, 2019, 8:16 PM · 97 points (34 votes) · 11 comments · 17 min read · LW link

The Inner Alignment Problem
Jun 4, 2019, 1:20 AM · 99 points (35 votes) · 17 comments · 13 min read · LW link

How the MtG Color Wheel Explains AI Safety
Scott Garrabrant · Feb 15, 2019, 11:42 PM · 57 points (32 votes) · 4 comments · 6 min read · LW link

[Question] How does Gradient Descent Interact with Goodhart?
Scott Garrabrant · Feb 2, 2019, 12:14 AM · 68 points (21 votes) · 19 comments · 4 min read · LW link

Formal Open Problem in Decision Theory
Scott Garrabrant · Nov 29, 2018, 3:25 AM · 35 points (19 votes) · 11 comments · 4 min read · LW link

The Ubiquitous Converse Lawvere Problem
Scott Garrabrant · Nov 29, 2018, 3:16 AM · 21 points (10 votes) · 0 comments · 2 min read · LW link

Embedded Curiosities
Nov 8, 2018, 2:19 PM · 88 points (40 votes) · 1 comment · 2 min read · LW link

Subsystem Alignment
Nov 6, 2018, 4:16 PM · 100 points (38 votes) · 12 comments · 1 min read · LW link

Robust Delegation
Nov 4, 2018, 4:38 PM · 110 points (40 votes) · 10 comments · 1 min read · LW link

Embedded World-Models
Nov 2, 2018, 4:07 PM · 87 points (30 votes) · 16 comments · 1 min read · LW link

Decision Theory
Oct 31, 2018, 6:41 PM · 114 points (47 votes) · 46 comments · 1 min read · LW link

(A → B) → A
Scott Garrabrant · Sep 11, 2018, 10:38 PM · 62 points (31 votes) · 11 comments · 2 min read · LW link

History of the Development of Logical Induction
Scott Garrabrant · Aug 29, 2018, 3:15 AM · 89 points (35 votes) · 4 comments · 5 min read · LW link

Optimization Amplifies
Scott Garrabrant · Jun 27, 2018, 1:51 AM · 98 points (42 votes) · 12 comments · 4 min read · LW link

What makes counterfactuals comparable?
Chris_Leong · Apr 24, 2020, 10:47 PM · 11 points (3 votes) · 6 comments · 3 min read · LW link

New Paper Expanding on the Goodhart Taxonomy
Scott Garrabrant · Mar 14, 2018, 9:01 AM · 17 points (12 votes) · 4 comments · 1 min read · LW link
(arxiv.org)

Sources of intuitions and data on AGI
Scott Garrabrant · Jan 31, 2018, 11:30 PM · 84 points (55 votes) · 26 comments · 3 min read · LW link

Corrigibility
paulfchristiano · Nov 27, 2018, 9:50 PM · 52 points (15 votes) · 7 comments · 6 min read · LW link

AI prediction case study 5: Omohundro’s AI drives
Stuart_Armstrong · Mar 15, 2013, 9:09 AM · 10 points (10 votes) · 5 comments · 8 min read · LW link

Toy model: convergent instrumental goals
Stuart_Armstrong · Feb 25, 2016, 2:03 PM · 15 points (9 votes) · 2 comments · 4 min read · LW link

AI-created pseudo-deontology
Stuart_Armstrong · Feb 12, 2015, 9:11 PM · 10 points (9 votes) · 35 comments · 1 min read · LW link

Ethical Injunctions
Eliezer Yudkowsky · Oct 20, 2008, 11:00 PM · 66 points (49 votes) · 76 comments · 9 min read · LW link

Motivating Abstraction-First Decision Theory
johnswentworth · Apr 29, 2020, 5:47 PM · 42 points (15 votes) · 16 comments · 5 min read · LW link

[AN #97]: Are there historical examples of large, robust discontinuities?
Rohin Shah · Apr 29, 2020, 5:30 PM · 15 points (5 votes) · 0 comments · 10 min read · LW link
(mailchi.mp)

My Updating Thoughts on AI policy
Ben Pace · Mar 1, 2020, 7:06 AM · 20 points (10 votes) · 1 comment · 9 min read · LW link

Useful Does Not Mean Secure
Ben Pace · Nov 30, 2019, 2:05 AM · 46 points (16 votes) · 12 comments · 11 min read · LW link

[Question] What is the alternative to intent alignment called?
Richard_Ngo · Apr 30, 2020, 2:16 AM · 12 points (4 votes) · 6 comments · 1 min read · LW link

Optimising Society to Constrain Risk of War from an Artificial Superintelligence
JohnCDraper · Apr 30, 2020, 10:47 AM · 3 points (2 votes) · 1 comment · 51 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence
Kaj_Sotala · May 2, 2020, 7:35 AM · 43 points (19 votes) · 19 comments · 7 min read · LW link
(plato.stanford.edu)

[Question] How does iterated amplification exceed human abilities?
riceissa · May 2, 2020, 11:44 PM · 19 points (6 votes) · 9 comments · 2 min read · LW link

How uniform is the neocortex?
zhukeepa · May 4, 2020, 2:16 AM · 78 points (37 votes) · 23 comments · 11 min read · LW link · 1 review

Scott Garrabrant’s problem on recovering Brouwer as a corollary of Lawvere
Rupert · May 4, 2020, 10:01 AM · 26 points (11 votes) · 2 comments · 2 min read · LW link

“AI and Efficiency”, OA (44✕ improvement in CNNs since 2012)
gwern · May 5, 2020, 4:32 PM · 47 points (14 votes) · 0 comments · 1 min read · LW link
(openai.com)

Competitive safety via gradated curricula
Richard_Ngo · May 5, 2020, 6:11 PM · 38 points (14 votes) · 5 comments · 5 min read · LW link

Modeling naturalized decision problems in linear logic
jessicata · May 6, 2020, 12:15 AM · 14 points (5 votes) · 2 comments · 6 min read · LW link
(unstableontology.com)

[AN #98]: Understanding neural net training by seeing which gradients were helpful
Rohin Shah · May 6, 2020, 5:10 PM · 22 points (6 votes) · 3 comments · 9 min read · LW link
(mailchi.mp)

[Question] Is AI safety research less parallelizable than AI research?
Mati_Roy · May 10, 2020, 8:43 PM · 9 points (3 votes) · 5 comments · 1 min read · LW link

Thoughts on implementing corrigible robust alignment
Steven Byrnes · Nov 26, 2019, 2:06 PM · 26 points (8 votes) · 2 comments · 6 min read · LW link

Wireheading is in the eye of the beholder
Stuart_Armstrong · Jan 30, 2019, 6:23 PM · 26 points (11 votes) · 10 comments · 1 min read · LW link

Wireheading as a potential problem with the new impact measure
Stuart_Armstrong · Sep 25, 2018, 2:15 PM · 25 points (8 votes) · 20 comments · 4 min read · LW link

Wireheading and discontinuity
Michele Campolo · Feb 18, 2020, 10:49 AM · 21 points (6 votes) · 4 comments · 3 min read · LW link

[AN #99]: Doubling times for the efficiency of AI algorithms
Rohin Shah · May 13, 2020, 5:20 PM · 29 points (10 votes) · 0 comments · 10 min read · LW link
(mailchi.mp)

How should AIs update a prior over human preferences?
Stuart_Armstrong · May 15, 2020, 1:14 PM · 17 points (5 votes) · 9 comments · 2 min read · LW link

Conjecture Workshop
johnswentworth · May 15, 2020, 10:41 PM · 34 points (10 votes) · 2 comments · 2 min read · LW link

Multi-agent safety
Richard_Ngo · May 16, 2020, 1:59 AM · 31 points (19 votes) · 8 comments · 5 min read · LW link

The Mechanistic and Normative Structure of Agency
Gordon Seidoh Worley · May 18, 2020, 4:03 PM · 15 points (6 votes) · 4 comments · 1 min read · LW link
(philpapers.org)

“Starwink” by Alicorn
Zack_M_Davis · May 18, 2020, 8:17 AM · 44 points (16 votes) · 1 comment · 1 min read · LW link
(alicorn.elcenia.com)

[AN #100]: What might go wrong if you learn a reward function while acting
Rohin Shah · May 20, 2020, 5:30 PM · 33 points (8 votes) · 2 comments · 12 min read · LW link
(mailchi.mp)

Probabilities, weights, sums: pretty much the same for reward functions
Stuart_Armstrong · May 20, 2020, 3:19 PM · 11 points (2 votes) · 1 comment · 2 min read · LW link

[Question] Source code size vs learned model size in ML and in humans?
riceissa · May 20, 2020, 8:47 AM · 11 points (5 votes) · 6 comments · 1 min read · LW link

Comparing reward learning/reward tampering formalisms
Stuart_Armstrong · May 21, 2020, 12:03 PM · 9 points (1 vote) · 3 comments · 3 min read · LW link

AGIs as collectives
Richard_Ngo · May 22, 2020, 8:36 PM · 22 points (14 votes) · 23 comments · 4 min read · LW link

[AN #101]: Why we should rigorously measure and forecast AI progress
Rohin Shah · May 27, 2020, 5:20 PM · 15 points (6 votes) · 0 comments · 10 min read · LW link
(mailchi.mp)

AI Safety Discussion Days
Linda Linsefors · May 27, 2020, 4:54 PM · 13 points (7 votes) · 1 comment · 3 min read · LW link

Building brain-inspired AGI is infinitely easier than understanding the brain
Steven Byrnes · Jun 2, 2020, 2:13 PM · 51 points (23 votes) · 14 comments · 7 min read · LW link

Sparsity and interpretability?
Jun 1, 2020, 1:25 PM · 41 points (16 votes) · 3 comments · 7 min read · LW link

GPT-3: A Summary
leogao · Jun 2, 2020, 6:14 PM · 20 points (9 votes) · 0 comments · 1 min read · LW link
(leogao.dev)

Inaccessible information
paulfchristiano · Jun 3, 2020, 5:10 AM · 84 points (27 votes) · 17 comments · 14 min read · LW link · 2 reviews
(ai-alignment.com)

[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment
Rohin Shah · Jun 3, 2020, 5:20 PM · 38 points (11 votes) · 6 comments · 10 min read · LW link
(mailchi.mp)

Feedback is central to agency
Alex Flint · Jun 1, 2020, 12:56 PM · 28 points (12 votes) · 1 comment · 3 min read · LW link

Thinking About Super-Human AI: An Examination of Likely Paths and Ultimate Constitution
meanderingmoose · Jun 4, 2020, 11:22 PM · −3 points (3 votes) · 0 comments · 7 min read · LW link

Emergence and Control: An examination of our ability to govern the behavior of intelligent systems
meanderingmoose · Jun 5, 2020, 5:10 PM · 1 point (1 vote) · 0 comments · 6 min read · LW link

GAN Discriminators Don’t Generalize?
tryactions · Jun 8, 2020, 8:36 PM · 18 points (6 votes) · 7 comments · 2 min read · LW link

More on disambiguating “discontinuity”
Aryeh Englander · Jun 9, 2020, 3:16 PM · 16 points (7 votes) · 1 comment · 3 min read · LW link

[AN #103]: ARCHES: an agenda for existential safety, and combining natural language with deep RL
Rohin Shah · Jun 10, 2020, 5:20 PM · 27 points (10 votes) · 1 comment · 10 min read · LW link
(mailchi.mp)

Dutch-Booking CDT: Revised Argument
abramdemski · Oct 27, 2020, 4:31 AM · 50 points (14 votes) · 22 comments · 16 min read · LW link

[Question] List of public predictions of what GPT-X can or can’t do?
Daniel Kokotajlo · Jun 14, 2020, 2:25 PM · 20 points (11 votes) · 9 comments · 1 min read · LW link

Achieving AI alignment through deliberate uncertainty in multiagent systems
Florian Dietz · Jun 15, 2020, 12:19 PM · 3 points (2 votes) · 10 comments · 7 min read · LW link

Superexponential Historic Growth, by David Roodman
Ben Pace · Jun 15, 2020, 9:49 PM · 43 points (14 votes) · 6 comments · 5 min read · LW link
(www.openphilanthropy.org)

Relating HCH and Logical Induction
abramdemski · Jun 16, 2020, 10:08 PM · 47 points (11 votes) · 4 comments · 5 min read · LW link

Image GPT
Daniel Kokotajlo · Jun 18, 2020, 11:41 AM · 29 points (14 votes) · 27 comments · 1 min read · LW link
(openai.com)

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID
Rohin Shah · Jun 18, 2020, 5:10 PM · 19 points (7 votes) · 5 comments · 8 min read · LW link
(mailchi.mp)

[Question] If AI is based on GPT, how to ensure its safety?
avturchin · Jun 18, 2020, 8:33 PM · 20 points (6 votes) · 11 comments · 1 min read · LW link

What’s Your Cognitive Algorithm?
Raemon · Jun 18, 2020, 10:16 PM · 71 points (23 votes) · 23 comments · 13 min read · LW link

Relevant pre-AGI possibilities
Daniel Kokotajlo · Jun 20, 2020, 10:52 AM · 38 points (13 votes) · 7 comments · 19 min read · LW link
(aiimpacts.org)

Plausible cases for HRAD work, and locating the crux in the “realism about rationality” debate
riceissa · Jun 22, 2020, 1:10 AM · 85 points (28 votes) · 15 comments · 10 min read · LW link

The Indexing Problem
johnswentworth · Jun 22, 2020, 7:11 PM · 35 points (8 votes) · 2 comments · 4 min read · LW link

[Question] Requesting feedback/advice: what Type Theory to study for AI safety?
rvnnt · Jun 23, 2020, 5:03 PM · 7 points (3 votes) · 4 comments · 3 min read · LW link

Locality of goals
adamShimi · Jun 22, 2020, 9:56 PM · 16 points (7 votes) · 8 comments · 6 min read · LW link

[Question] What is “Instrumental Corrigibility”?
joebernstein · Jun 23, 2020, 8:24 PM · 4 points (3 votes) · 1 comment · 1 min read · LW link

Models, myths, dreams, and Cheshire cat grins
Stuart_Armstrong · Jun 24, 2020, 10:50 AM · 21 points (8 votes) · 7 comments · 2 min read · LW link

[AN #105]: The economic trajectory of humanity, and what we might mean by optimization
Rohin Shah · Jun 24, 2020, 5:30 PM · 24 points (7 votes) · 3 comments · 11 min read · LW link
(mailchi.mp)

There’s an Awesome AI Ethics List and it’s a little thin
AABoyles · Jun 25, 2020, 1:43 PM · 13 points (5 votes) · 1 comment · 1 min read · LW link
(github.com)

GPT-3 Fiction Samples
gwern · Jun 25, 2020, 4:12 PM · 63 points (21 votes) · 18 comments · 1 min read · LW link
(www.gwern.net)

Walkthrough: The Transformer Architecture [Part 1/2]
Matthew Barnett · Jul 30, 2019, 1:54 PM · 35 points (15 votes) · 0 comments · 6 min read · LW link

Robustness as a Path to AI Alignment
abramdemski · Oct 10, 2017, 8:14 AM · 45 points (23 votes) · 9 comments · 9 min read · LW link

Radical Probabilism [Transcript]
Jun 26, 2020, 10:14 PM · 46 points (15 votes) · 12 comments · 6 min read · LW link

AI safety via market making
evhub · Jun 26, 2020, 11:07 PM · 55 points (23 votes) · 45 comments · 11 min read · LW link

[Question] Have general decomposers been formalized?
Quinn · Jun 27, 2020, 6:09 PM · 8 points (4 votes) · 5 comments · 1 min read · LW link

Gary Marcus vs Cortical Uniformity
Steven Byrnes · Jun 28, 2020, 6:18 PM · 18 points (11 votes) · 0 comments · 8 min read · LW link

Web AI discussion Groups
Donald Hobson · Jun 30, 2020, 11:22 AM · 11 points (5 votes) · 0 comments · 2 min read · LW link

Comparing AI Alignment Approaches to Minimize False Positive Risk
Gordon Seidoh Worley · Jun 30, 2020, 7:34 PM · 5 points (2 votes) · 0 comments · 9 min read · LW link

AvE: Assistance via Empowerment
FactorialCode · Jun 30, 2020, 10:07 PM · 12 points (2 votes) · 1 comment · 1 min read · LW link
(arxiv.org)

Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI
Palus Astra · Jul 1, 2020, 5:30 PM · 35 points (11 votes) · 4 comments · 67 min read · LW link

[AN #106]: Evaluating generalization ability of learned reward models
Rohin Shah · Jul 1, 2020, 5:20 PM · 14 points (4 votes) · 2 comments · 11 min read · LW link
(mailchi.mp)

The “AI Debate” Debate
michaelcohen · Jul 2, 2020, 10:16 AM · 20 points (10 votes) · 20 comments · 3 min read · LW link

Idea: Imitation/Value Learning AIXI
Past Account · Jul 3, 2020, 5:10 PM · 3 points (1 vote) · 6 comments · 1 min read · LW link

Splitting Debate up into Two Subsystems
Nandi · Jul 3, 2020, 8:11 PM · 13 points (8 votes) · 5 comments · 4 min read · LW link

AI Unsafety via Non-Zero-Sum Debate
VojtaKovarik · Jul 3, 2020, 10:03 PM · 25 points (11 votes) · 10 comments · 5 min read · LW link

Classifying games like the Prisoner’s Dilemma
philh · Jul 4, 2020, 5:10 PM · 100 points (35 votes) · 28 comments · 6 min read · LW link · 1 review
(reasonableapproximation.net)

AI-Feynman as a benchmark for what we should be aiming for
Faustus2 · Jul 4, 2020, 9:24 AM · 8 points (7 votes) · 1 comment · 2 min read · LW link

Learning the prior
paulfchristiano · Jul 5, 2020, 9:00 PM · 79 points (23 votes) · 29 comments · 8 min read · LW link
(ai-alignment.com)

Better priors as a safety problem
paulfchristiano · Jul 5, 2020, 9:20 PM · 64 points (20 votes) · 7 comments · 5 min read · LW link
(ai-alignment.com)

[Question] How far is AGI?
Roko Jelavić · Jul 5, 2020, 5:58 PM · 6 points (4 votes) · 5 comments · 1 min read · LW link

Classifying specification problems as variants of Goodhart’s Law
Vika · Aug 19, 2019, 8:40 PM · 70 points (18 votes) · 5 comments · 5 min read · LW link · 1 review

New safety research agenda: scalable agent alignment via reward modeling
Vika · Nov 20, 2018, 5:29 PM · 34 points (13 votes) · 13 comments · 1 min read · LW link
(medium.com)

Designing agent incentives to avoid side effects
Mar 11, 2019, 8:55 PM · 29 points (6 votes) · 0 comments · 2 min read · LW link
(medium.com)

Discussion on the machine learning approach to AI safety
Vika · Nov 1, 2018, 8:54 PM · 26 points (13 votes) · 3 comments · 4 min read · LW link

Specification gaming examples in AI
Vika · Apr 3, 2018, 12:30 PM · 43 points (22 votes) · 9 comments · 1 min read · LW link · 2 reviews

[Question] (answered: yes) Has anyone written up a consideration of Downs’s “Paradox of Voting” from the perspective of MIRI-ish decision theories (UDT, FDT, or even just EDT)?
Jameson Quinn · Jul 6, 2020, 6:26 PM · 10 points (6 votes) · 24 comments · 1 min read · LW link

New DeepMind AI Safety Research Blog
Vika · Sep 27, 2018, 4:28 PM · 43 points (17 votes) · 0 comments · 1 min read · LW link
(medium.com)

Contest: $1,000 for good questions to ask to an Oracle AI
Stuart_Armstrong · Jul 31, 2019, 6:48 PM · 57 points (32 votes) · 156 comments · 3 min read · LW link

Deconfusing Human Values Research Agenda v1
Gordon Seidoh Worley · Mar 23, 2020, 4:25 PM · 27 points (11 votes) · 12 comments · 4 min read · LW link

[Question] How “honest” is GPT-3?
abramdemski · Jul 8, 2020, 7:38 PM · 72 points (27 votes) · 18 comments · 5 min read · LW link

What does it mean to apply decision theory?
abramdemski · Jul 8, 2020, 8:31 PM · 51 points (17 votes) · 5 comments · 8 min read · LW link

AI Research Considerations for Human Existential Safety (ARCHES)
habryka · Jul 9, 2020, 2:49 AM · 60 points (16 votes) · 8 comments · 1 min read · LW link
(arxiv.org)

The Unreasonable Effectiveness of Deep Learning
Richard_Ngo · Sep 30, 2018, 3:48 PM · 85 points (31 votes) · 5 comments · 13 min read · LW link
(thinkingcomplete.blogspot.com)

mAIry’s room: AI reasoning to solve philosophical problems
Stuart_Armstrong · Mar 5, 2019, 8:24 PM · 92 points (30 votes) · 41 comments · 6 min read · LW link · 2 reviews

Failures of an embodied AIXI
So8res · Jun 15, 2014, 6:29 PM · 48 points (31 votes) · 46 comments · 12 min read · LW link

The Problem with AIXI
Rob Bensinger · Mar 18, 2014, 1:55 AM · 43 points (29 votes) · 78 comments · 23 min read · LW link

Versions of AIXI can be arbitrarily stupid
Stuart_Armstrong · Aug 10, 2015, 1:23 PM · 29 points (22 votes) · 59 comments · 1 min read · LW link

Reflective AIXI and Anthropics
Diffractor · Sep 24, 2018, 2:15 AM · 17 points (8 votes) · 13 comments · 8 min read · LW link

AIXI and Existential Despair
paulfchristiano · Dec 8, 2011, 8:03 PM · 23 points (23 votes) · 38 comments · 6 min read · LW link

How to make AIXI-tl incapable of learning
itaibn0 · Jan 27, 2014, 12:05 AM · 7 points (9 votes) · 5 comments · 2 min read · LW link

Help request: What is the Kolmogorov complexity of computable approximations to AIXI?
AnnaSalamon · Dec 5, 2010, 10:23 AM · 9 points (6 votes) · 9 comments · 1 min read · LW link

“AIXIjs: A Software Demo for General Reinforcement Learning”, Aslanides 2017
gwern · May 29, 2017, 9:09 PM · 7 points (4 votes) · 1 comment · 1 min read · LW link
(arxiv.org)

Can AIXI be trained to do any­thing a hu­man can?

Stuart_ArmstrongOct 20, 2014, 1:12 PM
5 points

6 votes

Overall karma indicates overall quality.

9 comments2 min readLW link

Shap­ing eco­nomic in­cen­tives for col­lab­o­ra­tive AGI

Kaj_SotalaJun 29, 2018, 4:26 PM
45 points

13 votes

Overall karma indicates overall quality.

15 comments4 min readLW link

Is the Star Trek Fed­er­a­tion re­ally in­ca­pable of build­ing AI?

Kaj_SotalaMar 18, 2018, 10:30 AM
19 points

12 votes

Overall karma indicates overall quality.

4 comments2 min readLW link
(kajsotala.fi)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”
Kaj_Sotala · Feb 12, 2018, 12:30 PM · 33 points (21 votes) · 4 comments · 6 min read (kajsotala.fi)

Misconceptions about continuous takeoff
Matthew Barnett · Oct 8, 2019, 9:31 PM · 79 points (36 votes) · 38 comments · 4 min read

Distinguishing definitions of takeoff
Matthew Barnett · Feb 14, 2020, 12:16 AM · 60 points (25 votes) · 6 comments · 6 min read

Book review: Artificial Intelligence Safety and Security
PeterMcCluskey · Dec 8, 2018, 3:47 AM · 27 points (9 votes) · 3 comments · 8 min read (www.bayesianinvestor.com)

Why AI may not foom
John_Maxwell · Mar 24, 2013, 8:11 AM · 29 points (37 votes) · 81 comments · 12 min read

Humans Who Are Not Concentrating Are Not General Intelligences
sarahconstantin · Feb 25, 2019, 8:40 PM · 181 points (103 votes) · 35 comments · 6 min read · 1 review (srconstantin.wordpress.com)

The Hacker Learns to Trust
Ben Pace · Jun 22, 2019, 12:27 AM · 80 points (30 votes) · 18 comments · 8 min read (medium.com)

Book Review: Human Compatible
Scott Alexander · Jan 31, 2020, 5:20 AM · 77 points (34 votes) · 6 comments · 16 min read (slatestarcodex.com)

SSC Journal Club: AI Timelines
Scott Alexander · Jun 8, 2017, 7:00 PM · 12 points (11 votes) · 15 comments · 8 min read

Arguments against myopic training
Richard_Ngo · Jul 9, 2020, 4:07 PM · 56 points (19 votes) · 39 comments · 12 min read

On motivations for MIRI’s highly reliable agent design research
jessicata · Jan 29, 2017, 7:34 PM · 27 points (11 votes) · 1 comment · 5 min read

Why is the impact penalty time-inconsistent?
Stuart_Armstrong · Jul 9, 2020, 5:26 PM · 16 points (5 votes) · 1 comment · 2 min read

My current take on the Paul-MIRI disagreement on alignability of messy AI
jessicata · Jan 29, 2017, 8:52 PM · 21 points (11 votes) · 0 comments · 10 min read

Ben Goertzel: The Singularity Institute’s Scary Idea (and Why I Don’t Buy It)
Paul Crowley · Oct 30, 2010, 9:31 AM · 42 points (40 votes) · 442 comments · 1 min read

An Analytic Perspective on AI Alignment
DanielFilan · Mar 1, 2020, 4:10 AM · 54 points (18 votes) · 45 comments · 8 min read (danielfilan.com)

Mechanistic Transparency for Machine Learning
DanielFilan · Jul 11, 2018, 12:34 AM · 54 points (23 votes) · 9 comments · 4 min read

A model I use when making plans to reduce AI x-risk
Ben Pace · Jan 19, 2018, 12:21 AM · 69 points (56 votes) · 41 comments · 6 min read

AI Researchers On AI Risk
Scott Alexander · May 22, 2015, 11:16 AM · 18 points (13 votes) · 0 comments · 16 min read

Mini advent calendar of Xrisks: Artificial Intelligence
Stuart_Armstrong · Dec 7, 2012, 11:26 AM · 5 points (6 votes) · 5 comments · 1 min read

For FAI: Is “Molecular Nanotechnology” putting our best foot forward?
leplen · Jun 22, 2013, 4:44 AM · 79 points (73 votes) · 118 comments · 3 min read

UFAI cannot be the Great Filter
Thrasymachus · Dec 22, 2012, 11:26 AM · 59 points (38 votes) · 92 comments · 3 min read

Don’t Fear The Filter
Scott Alexander · May 29, 2014, 12:45 AM · 11 points (13 votes) · 18 comments · 6 min read

The Great Filter is early, or AI is hard
Stuart_Armstrong · Aug 29, 2014, 4:17 PM · 32 points (28 votes) · 76 comments · 1 min read

Talk: Key Issues In Near-Term AI Safety Research
Aryeh Englander · Jul 10, 2020, 6:36 PM · 22 points (8 votes) · 1 comment · 1 min read

Mesa-Optimizers vs “Steered Optimizers”
Steven Byrnes · Jul 10, 2020, 4:49 PM · 45 points (15 votes) · 7 comments · 8 min read

AlphaStar: Impressive for RL progress, not for AGI progress
orthonormal · Nov 2, 2019, 1:50 AM · 113 points (62 votes) · 58 comments · 2 min read · 1 review

The Catastrophic Convergence Conjecture
TurnTrout · Feb 14, 2020, 9:16 PM · 44 points (14 votes) · 15 comments · 8 min read

[Question] How well can the GPT architecture solve the parity task?
FactorialCode · Jul 11, 2020, 7:02 PM · 19 points (6 votes) · 3 comments · 1 min read

Sunday July 12 — talks by Scott Garrabrant, Alexflint, alexei, Stuart_Armstrong
Jul 8, 2020, 12:27 AM · 19 points (4 votes) · 2 comments · 1 min read

[Link] Word-vector based DL system achieves human parity in verbal IQ tests
jacob_cannell · Jun 13, 2015, 11:38 PM · 17 points (10 votes) · 8 comments · 1 min read

The Power of Intelligence
Eliezer Yudkowsky · Jan 1, 2007, 8:00 PM · 66 points (48 votes) · 4 comments · 4 min read

Comments on CAIS
Richard_Ngo · Jan 12, 2019, 3:20 PM · 76 points (28 votes) · 14 comments · 7 min read

[Question] What are CAIS’ boldest near/medium-term predictions?
Bird Concept · Mar 28, 2019, 1:14 PM · 31 points (10 votes) · 17 comments · 1 min read

Drexler on AI Risk
PeterMcCluskey · Feb 1, 2019, 5:11 AM · 34 points (18 votes) · 10 comments · 9 min read (www.bayesianinvestor.com)

Six AI Risk/Strategy Ideas
Wei Dai · Aug 27, 2019, 12:40 AM · 64 points (29 votes) · 18 comments · 4 min read · 1 review

New report: Intelligence Explosion Microeconomics
Eliezer Yudkowsky · Apr 29, 2013, 11:14 PM · 72 points (49 votes) · 251 comments · 3 min read

Book review: Human Compatible
PeterMcCluskey · Jan 19, 2020, 3:32 AM · 37 points (11 votes) · 2 comments · 5 min read (www.bayesianinvestor.com)

Thoughts on “Human-Compatible”
TurnTrout · Oct 10, 2019, 5:24 AM · 63 points (33 votes) · 35 comments · 5 min read

Book Review: The AI Does Not Hate You
PeterMcCluskey · Oct 28, 2019, 5:45 PM · 26 points (10 votes) · 0 comments · 5 min read (www.bayesianinvestor.com)

[Link] Book Review: ‘The AI Does Not Hate You’ by Tom Chivers (Scott Aaronson)
eigen · Oct 7, 2019, 6:16 PM · 19 points (7 votes) · 0 comments · 1 min read

Book Review: Life 3.0: Being Human in the Age of Artificial Intelligence
J Thomas Moros · Jan 18, 2018, 5:18 PM · 8 points (6 votes) · 0 comments · 1 min read (ferocioustruth.com)

Book Review: Weapons of Math Destruction
Zvi · Jun 4, 2017, 9:20 PM · 1 point (0 votes) · 0 comments · 16 min read

DARPA Digital Tutor: Four Months to Total Technical Expertise?
SebastianG · Jul 6, 2020, 11:34 PM · 200 points (108 votes) · 19 comments · 7 min read

Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering
Kaj_Sotala · Jan 3, 2018, 2:39 PM · 1 point (1 vote) · 6 comments · 1 min read (www.informatica.si)

Preventing s-risks via indexical uncertainty, acausal trade and domination in the multiverse
avturchin · Sep 27, 2018, 10:09 AM · 11 points (6 votes) · 6 comments · 4 min read

Preface to CLR’s Research Agenda on Cooperation, Conflict, and TAI
JesseClifton · Dec 13, 2019, 9:02 PM · 59 points (25 votes) · 10 comments · 2 min read

Sections 1 & 2: Introduction, Strategy and Governance
JesseClifton · Dec 17, 2019, 9:27 PM · 34 points (11 votes) · 5 comments · 14 min read

Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms
JesseClifton · Dec 17, 2019, 9:46 PM · 19 points (7 votes) · 2 comments · 12 min read

Sections 5 & 6: Contemporary Architectures, Humans in the Loop
JesseClifton · Dec 20, 2019, 3:52 AM · 27 points (7 votes) · 4 comments · 10 min read

Section 7: Foundations of Rational Agency
JesseClifton · Dec 22, 2019, 2:05 AM · 14 points (3 votes) · 4 comments · 8 min read

What counts as defection?
TurnTrout · Jul 12, 2020, 10:03 PM · 81 points (23 votes) · 21 comments · 5 min read · 1 review

The Commitment Races problem
Daniel Kokotajlo · Aug 23, 2019, 1:58 AM · 122 points (56 votes) · 39 comments · 5 min read

Alignment Newsletter #36
Rohin Shah · Dec 12, 2018, 1:10 AM · 21 points (6 votes) · 0 comments · 11 min read (mailchi.mp)

Alignment Newsletter #47
Rohin Shah · Mar 4, 2019, 4:30 AM · 18 points (5 votes) · 0 comments · 8 min read (mailchi.mp)

Understanding “Deep Double Descent”
evhub · Dec 6, 2019, 12:00 AM · 135 points (67 votes) · 51 comments · 5 min read · 4 reviews

[LINK] Strong AI Startup Raises $15M
olalonde · Aug 21, 2012, 8:47 PM · 24 points (20 votes) · 13 comments · 1 min read

Announcing the AI Alignment Prize
cousin_it · Nov 3, 2017, 3:47 PM · 95 points (69 votes) · 78 comments · 1 min read

I’m leaving AI alignment – you better stay
rmoehn · Mar 12, 2020, 5:58 AM · 150 points (62 votes) · 19 comments · 5 min read

New paper: AGI Agent Safety by Iteratively Improving the Utility Function
Koen.Holtman · Jul 15, 2020, 2:05 PM · 21 points (9 votes) · 2 comments · 6 min read

[Question] How should AI debate be judged?
abramdemski · Jul 15, 2020, 10:20 PM · 49 points (13 votes) · 27 comments · 6 min read

Alignment proposals and complexity classes
evhub · Jul 16, 2020, 12:27 AM · 33 points (11 votes) · 26 comments · 13 min read

[AN #107]: The convergent instrumental subgoals of goal-directed agents
Rohin Shah · Jul 16, 2020, 6:47 AM · 13 points (4 votes) · 1 comment · 8 min read (mailchi.mp)

[AN #108]: Why we should scrutinize arguments for AI risk
Rohin Shah · Jul 16, 2020, 6:47 AM · 19 points (7 votes) · 6 comments · 12 min read (mailchi.mp)

Environments as a bottleneck in AGI development
Richard_Ngo · Jul 17, 2020, 5:02 AM · 36 points (17 votes) · 19 comments · 6 min read

[Question] Can an agent use interactive proofs to check the alignment of successors?
PabloAMC · Jul 17, 2020, 7:07 PM · 7 points (4 votes) · 2 comments · 1 min read

Lessons on AI Takeover from the conquistadors
Jul 17, 2020, 10:35 PM · 58 points (17 votes) · 30 comments · 5 min read

What Would I Do? Self-prediction in Simple Algorithms
Scott Garrabrant · Jul 20, 2020, 4:27 AM · 54 points (16 votes) · 13 comments · 5 min read

Writeup: Progress on AI Safety via Debate
Feb 5, 2020, 9:04 PM · 94 points (31 votes) · 18 comments · 33 min read

Operationalizing Interpretability
lifelonglearner · Jul 20, 2020, 5:22 AM · 20 points (5 votes) · 0 comments · 4 min read

Learning Values in Practice
Stuart_Armstrong · Jul 20, 2020, 6:38 PM · 24 points (7 votes) · 0 comments · 5 min read

Parallels Between AI Safety by Debate and Evidence Law
Cullen · Jul 20, 2020, 10:52 PM · 10 points (5 votes) · 1 comment · 2 min read (cullenokeefe.com)

The Rediscovery of Interiority in Machine Learning
DanB · Jul 21, 2020, 5:02 AM · 5 points (3 votes) · 4 comments · 1 min read (danburfoot.net)

The “AI Dungeons” Dragon Model is heavily path dependent (testing GPT-3 on ethics)
Rafael Harth · Jul 21, 2020, 12:14 PM · 44 points (17 votes) · 9 comments · 6 min read

How good is humanity at coordination?
Buck · Jul 21, 2020, 8:01 PM · 78 points (34 votes) · 44 comments · 3 min read

Alignment As A Bottleneck To Usefulness Of GPT-3
johnswentworth · Jul 21, 2020, 8:02 PM · 111 points (54 votes) · 57 comments · 3 min read

$1000 bounty for OpenAI to show whether GPT3 was “deliberately” pretending to be stupider than it is
Bird Concept · Jul 21, 2020, 6:42 PM · 59 points (27 votes) · 40 comments · 2 min read (twitter.com)

[Preprint] The Computational Limits of Deep Learning
Gordon Seidoh Worley · Jul 21, 2020, 9:25 PM · 9 points (3 votes) · 2 comments · 1 min read (arxiv.org)

[AN #109]: Teaching neural nets to generalize the way humans would
Rohin Shah · Jul 22, 2020, 5:10 PM · 17 points (4 votes) · 3 comments · 9 min read (mailchi.mp)

Research agenda for AI safety and a better civilization
agilecaveman · Jul 22, 2020, 6:35 AM · 12 points (6 votes) · 2 comments · 16 min read

Weak HCH accesses EXP
evhub · Jul 22, 2020, 10:36 PM · 14 points (4 votes) · 0 comments · 3 min read

GPT-3 Gems
TurnTrout · Jul 23, 2020, 12:46 AM · 33 points (20 votes) · 10 comments · 48 min read

Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide)
Donald Hobson · Jul 23, 2020, 9:37 PM · 3 points (5 votes) · 2 comments · 2 min read

[Question] Construct a portfolio to profit from AI progress.
sapphire · Jul 25, 2020, 8:18 AM · 29 points (12 votes) · 13 comments · 1 min read

Thinking soberly about the context and consequences of Friendly AI
Mitchell_Porter · Oct 16, 2012, 4:33 AM · 21 points (42 votes) · 39 comments · 1 min read

Goal retention discussion with Eliezer
Max Tegmark · Sep 4, 2014, 10:23 PM · 93 points (61 votes) · 26 comments · 6 min read

[Question] Where do people discuss doing things with GPT-3?
skybrian · Jul 26, 2020, 2:31 PM · 2 points (1 vote) · 7 comments · 1 min read

You Can Probably Amplify GPT3 Directly
Past Account · Jul 26, 2020, 9:58 PM · 34 points (15 votes) · 14 comments · 6 min read

[updated] how does gpt2’s training corpus capture internet discussion? not well
nostalgebraist · Jul 27, 2020, 10:30 PM · 25 points (13 votes) · 3 comments · 2 min read (nostalgebraist.tumblr.com)

Agentic Language Model Memes
FactorialCode · Aug 1, 2020, 6:03 PM · 16 points (6 votes) · 1 comment · 2 min read

A community-curated repository of interesting GPT-3 stuff
Rudi C · Jul 28, 2020, 2:16 PM · 8 points (3 votes) · 0 comments · 1 min read (github.com)

[Question] Does the lottery ticket hypothesis suggest the scaling hypothesis?
Daniel Kokotajlo · Jul 28, 2020, 7:52 PM · 14 points (6 votes) · 17 comments · 1 min read

[Question] To what extent are the scaling properties of Transformer networks exceptional?
abramdemski · Jul 28, 2020, 8:06 PM · 30 points (10 votes) · 1 comment · 1 min read

[Question] What happens to variance as neural network training is scaled? What does it imply about “lottery tickets”?
abramdemski · Jul 28, 2020, 8:22 PM · 25 points (6 votes) · 4 comments · 1 min read

[Question] How will internet forums like LW be able to defend against GPT-style spam?
ChristianKl · Jul 28, 2020, 8:12 PM · 14 points (5 votes) · 18 comments · 1 min read

Predictions for GPT-N
hippke · Jul 29, 2020, 1:16 AM · 36 points (21 votes) · 31 comments · 1 min read

Announcement: AI alignment prize winners and next round
cousin_it · Jan 15, 2018, 2:33 PM · 80 points (64 votes) · 68 comments · 2 min read

Jeff Hawkins on neuromorphic AGI within 20 years
Steven Byrnes · Jul 15, 2019, 7:16 PM · 167 points (68 votes) · 24 comments · 12 min read

Cascades, Cycles, Insight...
Eliezer Yudkowsky · Nov 24, 2008, 9:33 AM · 31 points (21 votes) · 31 comments · 8 min read

...Recursion, Magic
Eliezer Yudkowsky · Nov 25, 2008, 9:10 AM · 27 points (20 votes) · 28 comments · 5 min read

References & Resources for LessWrong
XiXiDu · Oct 10, 2010, 2:54 PM · 153 points (125 votes) · 106 comments · 20 min read

[Question] A game designed to beat AI?
Long try · Mar 17, 2020, 3:51 AM · 13 points (6 votes) · 29 comments · 1 min read

Truly Part Of You
Eliezer Yudkowsky · Nov 21, 2007, 2:18 AM · 149 points (116 votes) · 59 comments · 4 min read

[AN #110]: Learning features from human feedback to enable reward learning
Rohin Shah · Jul 29, 2020, 5:20 PM · 13 points (4 votes) · 2 comments · 10 min read (mailchi.mp)

Structured Tasks for Language Models
Past Account · Jul 29, 2020, 2:17 PM · 5 points (2 votes) · 0 comments · 1 min read

Engaging Seriously with Short Timelines
sapphire · Jul 29, 2020, 7:21 PM · 43 points (30 votes) · 23 comments · 3 min read

What Failure Looks Like: Distilling the Discussion
Ben Pace · Jul 29, 2020, 9:49 PM · 79 points (24 votes) · 14 comments · 7 min read

Learning the prior and generalization
evhub · Jul 29, 2020, 10:49 PM · 34 points (12 votes) · 16 comments · 4 min read

[Question] Is the work on AI alignment relevant to GPT?
Richard_Kennaway · Jul 30, 2020, 12:23 PM · 20 points (8 votes) · 5 comments · 1 min read

Verification and Transparency
DanielFilan · Aug 8, 2019, 1:50 AM · 34 points (17 votes) · 6 comments · 2 min read (danielfilan.com)

Robin Hanson on Lumpiness of AI Services
DanielFilan · Feb 17, 2019, 11:08 PM · 15 points (6 votes) · 2 comments · 2 min read (www.overcomingbias.com)

One Way to Think About ML Transparency
Matthew Barnett · Sep 2, 2019, 11:27 PM · 26 points (9 votes) · 28 comments · 5 min read

What is Interpretability?
Mar 17, 2020, 8:23 PM · 34 points (13 votes) · 0 comments · 11 min read

Relaxed adversarial training for inner alignment
evhub · Sep 10, 2019, 11:03 PM · 61 points (19 votes) · 28 comments · 1 min read

Conclusion to ‘Reframing Impact’
TurnTrout · Feb 28, 2020, 4:05 PM · 39 points (14 votes) · 17 comments · 2 min read

Bayesian Evolving-to-Extinction
abramdemski · Feb 14, 2020, 11:55 PM · 38 points (16 votes) · 13 comments · 5 min read

Do Sufficiently Advanced Agents Use Logic?
abramdemski · Sep 13, 2019, 7:53 PM · 41 points (16 votes) · 11 comments · 9 min read

World State is the Wrong Abstraction for Impact
TurnTrout · Oct 1, 2019, 9:03 PM · 62 points (19 votes) · 19 comments · 2 min read

Attainable Utility Preservation: Concepts
TurnTrout · Feb 17, 2020, 5:20 AM · 38 points (11 votes) · 20 comments · 1 min read

Attainable Utility Preservation: Empirical Results
Feb 22, 2020, 12:38 AM · 61 points (14 votes) · 8 comments · 10 min read · 1 review

How Low Should Fruit Hang Before We Pick It?
TurnTrout · Feb 25, 2020, 2:08 AM · 28 points (9 votes) · 9 comments · 12 min read

Attainable Utility Preservation: Scaling to Superhuman
TurnTrout · Feb 27, 2020, 12:52 AM · 28 points (11 votes) · 21 comments · 8 min read

Reasons for Excitement about Impact of Impact Measure Research
TurnTrout · Feb 27, 2020, 9:42 PM · 33 points (12 votes) · 8 comments · 4 min read

Power as Easily Exploitable Opportunities
TurnTrout · Aug 1, 2020, 2:14 AM · 30 points (9 votes) · 5 comments · 6 min read

[Question] Would AGIs parent young AGIs?
Vishrut Arya · Aug 2, 2020, 12:57 AM · 3 points (5 votes) · 6 comments · 1 min read

If I were a well-intentioned AI… I: Image classifier
Stuart_Armstrong · Feb 26, 2020, 12:39 PM · 35 points (17 votes) · 4 comments · 5 min read

Non-Consequentialist Cooperation?
abramdemski · Jan 11, 2019, 9:15 AM · 48 points (20 votes) · 15 comments · 7 min read

Curiosity Killed the Cat and the Asymptotically Optimal Agent
michaelcohen · Feb 20, 2020, 5:28 PM · 27 points (12 votes) · 15 comments · 1 min read

If I were a well-intentioned AI… IV: Mesa-optimising
Stuart_Armstrong · Mar 2, 2020, 12:16 PM · 26 points (8 votes) · 2 comments · 6 min read

Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization”
Daniel Kokotajlo · Feb 27, 2020, 6:10 PM · 27 points (9 votes) · 5 comments · 8 min read

Clarifying Power-Seeking and Instrumental Convergence
TurnTrout · Dec 20, 2019, 7:59 PM · 42 points (16 votes) · 7 comments · 3 min read

How important are MDPs for AGI (Safety)?
michaelcohen · Mar 26, 2020, 8:32 PM · 14 points (7 votes) · 8 comments · 2 min read

Synthesizing amplification and debate
evhub · Feb 5, 2020, 10:53 PM · 33 points (15 votes) · 10 comments · 4 min read

is gpt-3 few-shot ready for real applications?
nostalgebraist · Aug 3, 2020, 7:50 PM · 31 points (12 votes) · 5 comments · 9 min read (nostalgebraist.tumblr.com)

Interpretability in ML: A Broad Overview
lifelonglearner · Aug 4, 2020, 7:03 PM · 52 points (21 votes) · 5 comments · 15 min read

Infinite Data/Compute Arguments in Alignment
johnswentworth · Aug 4, 2020, 8:21 PM · 49 points (18 votes) · 6 comments · 2 min read

Four Ways An Impact Measure Could Help Alignment
Matthew Barnett · Aug 8, 2019, 12:10 AM · 21 points (25 votes) · 1 comment · 8 min read

Understanding Recent Impact Measures
Matthew Barnett · Aug 7, 2019, 4:57 AM · 16 points (6 votes) · 6 comments · 7 min read

A Survey of Early Impact Measures
Matthew Barnett · Aug 6, 2019, 1:22 AM · 23 points (8 votes) · 0 comments · 8 min read

Optimization Regularization through Time Penalty
Linda Linsefors · Jan 1, 2019, 1:05 PM · 11 points (6 votes) · 4 comments · 3 min read

Stable Pointers to Value III: Recursive Quantilization
abramdemski · Jul 21, 2018, 8:06 AM · 19 points (10 votes) · 4 comments · 4 min read

Thoughts on Quantilizers
Stuart_Armstrong · Jun 2, 2017, 4:24 PM · 2 points (2 votes) · 0 comments · 2 min read

Quantilizers maximize expected utility subject to a conservative cost constraint
jessicata · Sep 28, 2015, 2:17 AM · 25 points (5 votes) · 0 comments · 5 min read

Quantilal control for finite MDPs
Vanessa Kosoy · Apr 12, 2018, 9:21 AM · 14 points (6 votes) · 0 comments · 13 min read

The limits of corrigibility
Stuart_Armstrong · Apr 10, 2018, 10:49 AM · 27 points (14 votes) · 9 comments · 4 min read

Alignment Newsletter #16: 07/23/18
Rohin Shah · Jul 23, 2018, 4:20 PM · 42 points (11 votes) · 0 comments · 12 min read (mailchi.mp)

Measuring hardware overhang
hippke · Aug 5, 2020, 7:59 PM · 106 points (38 votes) · 14 comments · 4 min read

[AN #111]: The Circuits hypotheses for deep learning
Rohin Shah · Aug 5, 2020, 5:40 PM · 23 points (10 votes) · 0 comments · 9 min read (mailchi.mp)

Self-Fulfilling Prophecies Aren’t Always About Self-Awareness
John_Maxwell · Nov 18, 2019, 11:11 PM · 14 points (7 votes) · 7 comments · 4 min read

The Goodhart Game
John_Maxwell · Nov 18, 2019, 11:22 PM · 13 points (8 votes) · 5 comments · 5 min read

Why don’t singularitarians bet on the creation of AGI by buying stocks?
John_Maxwell · Mar 11, 2020, 4:27 PM · 43 points (23 votes) · 20 comments · 4 min read

The Dualist Predict-O-Matic ($100 prize)
John_Maxwell · Oct 17, 2019, 6:45 AM · 16 points (6 votes) · 35 comments · 5 min read

[Question] What AI safety problems need solving for safe AI research assistants?
John_Maxwell · Nov 5, 2019, 2:09 AM · 14 points (4 votes) · 13 comments · 1 min read

Refining the Evolutionary Analogy to AI
lberglund · Aug 7, 2020, 11:13 PM · 9 points (6 votes) · 2 comments · 4 min read

The Fusion Power Generator Scenario
johnswentworth · Aug 8, 2020, 6:31 PM · 136 points (68 votes) · 29 comments · 3 min read

[Question] How much is known about the “inference rules” of logical induction?
Eigil Rischel · Aug 8, 2020, 10:45 AM · 11 points (6 votes) · 7 comments · 1 min read

If I were a well-intentioned AI… II: Acting in a world
Stuart_Armstrong · Feb 27, 2020, 11:58 AM · 20 points (7 votes) · 0 comments · 3 min read

If I were a well-intentioned AI… III: Extremal Goodhart
Stuart_Armstrong · Feb 28, 2020, 11:24 AM · 22 points (9 votes) · 0 comments · 5 min read

Towards a Formalisation of Logical Counterfactuals
Bunthut · Aug 8, 2020, 10:14 PM · 6 points (4 votes) · 2 comments · 2 min read

[Question] 10/50/90% chance of GPT-N Transformative AI?
human_generated_text · Aug 9, 2020, 12:10 AM · 24 points (9 votes) · 8 comments · 1 min read

[Question] Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe?
Maxime Riché · Aug 9, 2020, 5:17 PM · 2 points (4 votes) · 5 comments · 1 min read

In defense of Or­a­cle (“Tool”) AI research

Steven ByrnesAug 7, 2019, 7:14 PM
21 points

12 votes

Overall karma indicates overall quality.

11 comments4 min readLW link

How GPT-N will es­cape from its AI-box

hippkeAug 12, 2020, 7:34 PM
7 points

7 votes

Overall karma indicates overall quality.

9 comments1 min readLW link

Strong im­pli­ca­tion of prefer­ence uncertainty

Stuart_ArmstrongAug 12, 2020, 7:02 PM
20 points

5 votes

Overall karma indicates overall quality.

3 comments2 min readLW link

[AN #112]: Eng­ineer­ing a Safer World

Rohin ShahAug 13, 2020, 5:20 PM
25 points

11 votes

Overall karma indicates overall quality.

2 comments12 min readLW link
(mailchi.mp)

Room and Board for Peo­ple Self-Learn­ing ML or Do­ing In­de­pen­dent ML Research

SamuelKnocheAug 14, 2020, 5:19 PM
7 points

5 votes

Overall karma indicates overall quality.

1 comment1 min readLW link

Talk and Q&A—Dan Hendrycks—Paper: Align­ing AI With Shared Hu­man Values. On Dis­cord at Aug 28, 2020 8:00-10:00 AM GMT+8.

wassnameAug 14, 2020, 11:57 PM
1 point

1 vote

Overall karma indicates overall quality.

0 comments1 min readLW link

Search ver­sus design

Alex FlintAug 16, 2020, 4:53 PM
89 points

29 votes

Overall karma indicates overall quality.

41 comments36 min readLW link1 review

Work on Se­cu­rity In­stead of Friendli­ness?

Wei DaiJul 21, 2012, 6:28 PM
51 points

41 votes

Overall karma indicates overall quality.

107 comments2 min readLW link

Goal-Direct­ed­ness: What Suc­cess Looks Like

adamShimiAug 16, 2020, 6:33 PM
9 points

3 votes

Overall karma indicates overall quality.

0 comments2 min readLW link

[Question] A way to beat su­per­ra­tional/​EDT agents?

Abhimanyu Pallavi SudhirAug 17, 2020, 2:33 PM
5 points

7 votes

Overall karma indicates overall quality.

13 comments1 min readLW link

Learning human preferences: optimistic and pessimistic scenarios
Stuart_Armstrong · Aug 18, 2020, 1:05 PM · 27 points (7 votes) · 6 comments · 6 min read · LW link

Mesa-Search vs Mesa-Control
abramdemski · Aug 18, 2020, 6:51 PM · 54 points (17 votes) · 45 comments · 7 min read · LW link

Why we want unbiased learning processes
Stuart_Armstrong · Feb 20, 2018, 2:48 PM · 13 points (9 votes) · 3 comments · 3 min read · LW link

Intuitive examples of reward function learning?
Stuart_Armstrong · Mar 6, 2018, 4:54 PM · 7 points (5 votes) · 3 comments · 2 min read · LW link

Open-Category Classification
TurnTrout · Mar 28, 2018, 2:49 PM · 13 points (10 votes) · 6 comments · 10 min read · LW link

Looking for adversarial collaborators to test our Debate protocol
Beth Barnes · Aug 19, 2020, 3:15 AM · 52 points (18 votes) · 5 comments · 1 min read · LW link

Walkthrough of ‘Formalizing Convergent Instrumental Goals’
TurnTrout · Feb 26, 2018, 2:20 AM · 10 points (6 votes) · 2 comments · 10 min read · LW link

Ambiguity Detection
TurnTrout · Mar 1, 2018, 4:23 AM · 11 points (9 votes) · 9 comments · 4 min read · LW link

Penalizing Impact via Attainable Utility Preservation
TurnTrout · Dec 28, 2018, 9:46 PM · 24 points (10 votes) · 0 comments · 3 min read · LW link (arxiv.org)

What You See Isn’t Always What You Want
TurnTrout · Sep 13, 2019, 4:17 AM · 30 points (10 votes) · 12 comments · 3 min read · LW link

[Question] Instrumental Occam?
abramdemski · Jan 31, 2020, 7:27 PM · 30 points (11 votes) · 15 comments · 1 min read · LW link

Compact vs. Wide Models
Vaniver · Jul 16, 2018, 4:09 AM · 31 points (14 votes) · 5 comments · 3 min read · LW link

Alex Irpan: “My AI Timelines Have Sped Up”
Vaniver · Aug 19, 2020, 4:23 PM · 43 points (14 votes) · 20 comments · 1 min read · LW link (www.alexirpan.com)

[AN #113]: Checking the ethical intuitions of large language models
Rohin Shah · Aug 19, 2020, 5:10 PM · 23 points (6 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

AI safety as featherless bipeds *with broad flat nails*
Stuart_Armstrong · Aug 19, 2020, 10:22 AM · 37 points (19 votes) · 1 comment · 1 min read · LW link

Time Magazine has an article about the Singularity...
Raemon · Feb 11, 2011, 2:20 AM · 40 points (28 votes) · 13 comments · 1 min read · LW link

How rapidly are GPUs improving in price performance?
gallabytes · Nov 25, 2018, 7:54 PM · 31 points (9 votes) · 9 comments · 1 min read · LW link (mediangroup.org)

Our values are underdefined, changeable, and manipulable
Stuart_Armstrong · Nov 2, 2017, 11:09 AM · 25 points (12 votes) · 6 comments · 3 min read · LW link

[Question] What funding sources exist for technical AI safety research?
johnswentworth · Oct 1, 2019, 3:30 PM · 26 points (10 votes) · 5 comments · 1 min read · LW link

Humans can drive cars
Apprentice · Jan 30, 2014, 11:55 AM · 53 points (41 votes) · 89 comments · 2 min read · LW link

A Less Wrong singularity article?
Kaj_Sotala · Nov 17, 2009, 2:15 PM · 31 points (36 votes) · 215 comments · 1 min read · LW link

The Bayesian Tyrant
abramdemski · Aug 20, 2020, 12:08 AM · 132 points (65 votes) · 20 comments · 6 min read · LW link · 1 review

Concept Safety: Producing similar AI-human concept spaces
Kaj_Sotala · Apr 14, 2015, 8:39 PM · 50 points (33 votes) · 45 comments · 8 min read · LW link

[LINK] What should a reasonable person believe about the Singularity?
Kaj_Sotala · Jan 13, 2011, 9:32 AM · 38 points (30 votes) · 14 comments · 2 min read · LW link

The many ways AIs behave badly
Stuart_Armstrong · Apr 24, 2018, 11:40 AM · 10 points (8 votes) · 3 comments · 2 min read · LW link

July 2020 gwern.net newsletter
gwern · Aug 20, 2020, 4:39 PM · 29 points (6 votes) · 0 comments · 1 min read · LW link (www.gwern.net)

Do what we mean vs. do what we say
Rohin Shah · Aug 30, 2018, 10:03 PM · 34 points (15 votes) · 14 comments · 1 min read · LW link

[Question] What’s a Decomposable Alignment Topic?
Logan Riggs · Aug 21, 2020, 10:57 PM · 26 points (11 votes) · 16 comments · 1 min read · LW link

Tools versus agents
Stuart_Armstrong · May 16, 2012, 1:00 PM · 47 points (32 votes) · 39 comments · 5 min read · LW link

An unaligned benchmark
paulfchristiano · Nov 17, 2018, 3:51 PM · 31 points (10 votes) · 0 comments · 9 min read · LW link

Following human norms
Rohin Shah · Jan 20, 2019, 11:59 PM · 30 points (13 votes) · 10 comments · 5 min read · LW link

nostalgebraist: Recursive Goodhart’s Law
Kaj_Sotala · Aug 26, 2020, 11:07 AM · 53 points (19 votes) · 27 comments · 1 min read · LW link (nostalgebraist.tumblr.com)

[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents
Rohin Shah · Aug 26, 2020, 5:20 PM · 21 points (7 votes) · 3 comments · 8 min read · LW link (mailchi.mp)

[Question] How hard would it be to change GPT-3 in a way that allows audio?
ChristianKl · Aug 28, 2020, 2:42 PM · 8 points (3 votes) · 5 comments · 1 min read · LW link

Safe Scrambling?
Hoagy · Aug 29, 2020, 2:31 PM · 3 points (1 vote) · 1 comment · 2 min read · LW link

(Humor) AI Alignment Critical Failure Table
Kaj_Sotala · Aug 31, 2020, 7:51 PM · 24 points (11 votes) · 2 comments · 1 min read · LW link (sl4.org)

What is ambitious value learning?
Rohin Shah · Nov 1, 2018, 4:20 PM · 49 points (23 votes) · 28 comments · 2 min read · LW link

The easy goal inference problem is still hard
paulfchristiano · Nov 3, 2018, 2:41 PM · 50 points (22 votes) · 19 comments · 4 min read · LW link

[AN #115]: AI safety research problems in the AI-GA framework
Rohin Shah · Sep 2, 2020, 5:10 PM · 19 points (6 votes) · 16 comments · 6 min read · LW link (mailchi.mp)

Emotional valence vs RL reward: a video game analogy
Steven Byrnes · Sep 3, 2020, 3:28 PM · 12 points (9 votes) · 6 comments · 4 min read · LW link

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda
Sep 3, 2020, 6:27 PM · 67 points (21 votes) · 12 comments · 2 min read · LW link

“Learning to Summarize with Human Feedback”—OpenAI
[deleted] · Sep 7, 2020, 5:59 PM · 57 points (16 votes) · 3 comments · 1 min read · LW link

[AN #116]: How to make explanations of neurons compositional
Rohin Shah · Sep 9, 2020, 5:20 PM · 21 points (8 votes) · 2 comments · 9 min read · LW link (mailchi.mp)

Safer sandboxing via collective separation
Richard_Ngo · Sep 9, 2020, 7:49 PM · 24 points (8 votes) · 6 comments · 4 min read · LW link

[Question] Do mesa-optimizer risk arguments rely on the train-test paradigm?
Ben Cottier · Sep 10, 2020, 3:36 PM · 12 points (7 votes) · 7 comments · 1 min read · LW link

Safety via selection for obedience
Richard_Ngo · Sep 10, 2020, 10:04 AM · 31 points (14 votes) · 1 comment · 5 min read · LW link

How Much Computational Power Does It Take to Match the Human Brain?
habryka · Sep 12, 2020, 6:38 AM · 44 points (15 votes) · 1 comment · 1 min read · LW link (www.openphilanthropy.org)

Decision Theory is multifaceted
Michele Campolo · Sep 13, 2020, 10:30 PM · 7 points (4 votes) · 12 comments · 8 min read · LW link

AI Safety Discussion Day
Linda Linsefors · Sep 15, 2020, 2:40 PM · 20 points (4 votes) · 0 comments · 1 min read · LW link

[AN #117]: How neural nets would fare under the TEVV framework
Rohin Shah · Sep 16, 2020, 5:20 PM · 27 points (6 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty
Chris_Leong · Sep 16, 2020, 10:34 AM · 9 points (2 votes) · 5 comments · 2 min read · LW link

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem
Zack_M_Davis · Sep 17, 2020, 2:23 AM · 72 points (35 votes) · 12 comments · 5 min read · LW link (aima.cs.berkeley.edu)

The “Backchaining to Local Search” Technique in AI Alignment
adamShimi · Sep 18, 2020, 3:05 PM · 28 points (10 votes) · 1 comment · 2 min read · LW link

Draft report on AI timelines
Ajeya Cotra · Sep 18, 2020, 11:47 PM · 207 points (84 votes) · 56 comments · 1 min read · LW link · 1 review

Why GPT wants to mesa-optimize & how we might change this
John_Maxwell · Sep 19, 2020, 1:48 PM · 55 points (21 votes) · 32 comments · 9 min read · LW link

My (Mis)Adventures With Algorithmic Machine Learning
AHartNtkn · Sep 20, 2020, 5:31 AM · 16 points (6 votes) · 4 comments · 41 min read · LW link

[Question] What AI companies would be most likely to have a positive long-term impact on the world as a result of investing in them?
MikkW · Sep 21, 2020, 11:41 PM · 8 points (4 votes) · 2 comments · 2 min read · LW link

Anthropomorphisation vs value learning: type 1 vs type 2 errors
Stuart_Armstrong · Sep 22, 2020, 10:46 AM · 16 points (5 votes) · 10 comments · 1 min read · LW link

AI Advantages [Gems from the Wiki]
Sep 22, 2020, 10:44 PM · 22 points (10 votes) · 7 comments · 2 min read · LW link (www.lesswrong.com)

A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments
Søren Elverlin · Sep 27, 2020, 5:51 PM · 17 points (7 votes) · 6 comments · 1 min read · LW link

Dehumanisation *errors*
Stuart_Armstrong · Sep 23, 2020, 9:51 AM · 13 points (3 votes) · 0 comments · 1 min read · LW link

[AN #118]: Risks, solutions, and prioritization in a world with many AI systems
Rohin Shah · Sep 23, 2020, 6:20 PM · 15 points (6 votes) · 6 comments · 10 min read · LW link (mailchi.mp)

[Question] David Deutsch on Universal Explainers and AI
alanf · Sep 24, 2020, 7:50 AM · 3 points (3 votes) · 8 comments · 2 min read · LW link

KL Divergence as Code Patching Efficiency
Past Account · Sep 27, 2020, 4:06 PM · 17 points (6 votes) · 0 comments · 8 min read · LW link

[Question] What to do with imitation humans, other than asking them what the right thing to do is?
Charlie Steiner · Sep 27, 2020, 9:51 PM · 10 points (3 votes) · 6 comments · 1 min read · LW link

[Question] What Decision Theory is Implied By Predictive Processing?
johnswentworth · Sep 28, 2020, 5:20 PM · 55 points (17 votes) · 17 comments · 1 min read · LW link

AGI safety from first principles: Superintelligence
Richard_Ngo · Sep 28, 2020, 7:53 PM · 80 points (32 votes) · 6 comments · 9 min read · LW link

AGI safety from first principles: Introduction
Richard_Ngo · Sep 28, 2020, 7:53 PM · 109 points (51 votes) · 18 comments · 2 min read · LW link · 1 review

[Question] Examples of self-governance to reduce technology risk?
Jia · Sep 29, 2020, 7:31 PM · 10 points (3 votes) · 4 comments · 1 min read · LW link

AGI safety from first principles: Goals and Agency
Richard_Ngo · Sep 29, 2020, 7:06 PM · 70 points (26 votes) · 15 comments · 15 min read · LW link

“Unsupervised” translation as an (intent) alignment problem
paulfchristiano · Sep 30, 2020, 12:50 AM · 61 points (20 votes) · 15 comments · 4 min read · LW link (ai-alignment.com)

[AN #119]: AI safety when agents are shaped by environments, not rewards
Rohin Shah · Sep 30, 2020, 5:10 PM · 11 points (3 votes) · 0 comments · 11 min read · LW link (mailchi.mp)

AGI safety from first principles: Control
Richard_Ngo · Oct 2, 2020, 9:51 PM · 61 points (23 votes) · 4 comments · 9 min read · LW link

AI race considerations in a report by the U.S. House Committee on Armed Services
NunoSempere · Oct 4, 2020, 12:11 PM · 42 points (26 votes) · 4 comments · 13 min read · LW link

[Question] Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI?
David Scott Krueger (formerly: capybaralet) · Oct 4, 2020, 8:10 AM · 9 points (3 votes) · 7 comments · 1 min read · LW link

AGI safety from first principles: Conclusion
Richard_Ngo · Oct 4, 2020, 11:06 PM · 65 points (25 votes) · 4 comments · 3 min read · LW link

Universal Eudaimonia
hg00 · Oct 5, 2020, 1:45 PM · 19 points (11 votes) · 6 comments · 2 min read · LW link

The Alignment Problem: Machine Learning and Human Values
Rohin Shah · Oct 6, 2020, 5:41 PM · 120 points (46 votes) · 7 comments · 6 min read · LW link · 1 review (www.amazon.com)

[AN #120]: Tracing the intellectual roots of AI and AI alignment
Rohin Shah · Oct 7, 2020, 5:10 PM · 13 points (4 votes) · 4 comments · 10 min read · LW link (mailchi.mp)

[Question] Brainstorming positive visions of AI
jungofthewon · Oct 7, 2020, 4:09 PM · 52 points (16 votes) · 25 comments · 1 min read · LW link

[Question] How can an AI demonstrate purely through chat that it is an AI, and not a human?
hugh.mann · Oct 7, 2020, 5:53 PM · 3 points (2 votes) · 4 comments · 1 min read · LW link

[Question] Why isn’t JS a popular language for deep learning?
Will Clark · Oct 8, 2020, 2:36 PM · 12 points (7 votes) · 21 comments · 1 min read · LW link

[Question] If GPT-6 is human-level AGI but costs $200 per page of output, what would happen?
Daniel Kokotajlo · Oct 9, 2020, 12:00 PM · 28 points (13 votes) · 30 comments · 1 min read · LW link

[Question] Shouldn’t there be a Chinese translation of Human Compatible?
mako yass · Oct 9, 2020, 8:47 AM · 18 points (9 votes) · 13 comments · 1 min read · LW link

Idealized Factored Cognition
Rafael Harth · Nov 30, 2020, 6:49 PM · 34 points (9 votes) · 6 comments · 11 min read · LW link

[Question] Reviews of the book ‘The Alignment Problem’
Mati_Roy · Oct 11, 2020, 7:41 AM · 8 points (3 votes) · 3 comments · 1 min read · LW link

[Question] Reviews of TV show NeXt (about AI safety)
Mati_Roy · Oct 11, 2020, 4:31 AM · 25 points (9 votes) · 4 comments · 1 min read · LW link

The Achilles Heel Hypothesis for AI
scasper · Oct 13, 2020, 2:35 PM · 20 points (9 votes) · 6 comments · 1 min read · LW link

Toy Problem: Detective Story Alignment
johnswentworth · Oct 13, 2020, 9:02 PM · 34 points (11 votes) · 4 comments · 2 min read · LW link

[Question] Does anyone worry about A.I. forums like this where they reinforce each other’s biases/are led by big tech?
misabella16 · Oct 13, 2020, 3:14 PM · 4 points (3 votes) · 3 comments · 1 min read · LW link

[AN #121]: Forecasting transformative AI timelines using biological anchors
Rohin Shah · Oct 14, 2020, 5:20 PM · 27 points (8 votes) · 5 comments · 14 min read · LW link (mailchi.mp)

Gradient hacking
evhub · Oct 16, 2019, 12:53 AM · 99 points (35 votes) · 39 comments · 3 min read · LW link · 2 reviews

Impact measurement and value-neutrality verification
evhub · Oct 15, 2019, 12:06 AM · 31 points (10 votes) · 13 comments · 6 min read · LW link

Outer alignment and imitative amplification
evhub · Jan 10, 2020, 12:26 AM · 24 points (8 votes) · 11 comments · 9 min read · LW link

Safe exploration and corrigibility
evhub · Dec 28, 2019, 11:12 PM · 17 points (8 votes) · 4 comments · 4 min read · LW link

[Question] What are some non-purely-sampling ways to do deep RL?
evhub · Dec 5, 2019, 12:09 AM · 15 points (5 votes) · 9 comments · 2 min read · LW link

More variations on pseudo-alignment
evhub · Nov 4, 2019, 11:24 PM · 26 points (7 votes) · 8 comments · 3 min read · LW link

Towards an empirical investigation of inner alignment
evhub · Sep 23, 2019, 8:43 PM · 44 points (13 votes) · 9 comments · 6 min read · LW link

Are minimal circuits deceptive?
evhub · Sep 7, 2019, 6:11 PM · 66 points (19 votes) · 11 comments · 8 min read · LW link

Concrete experiments in inner alignment
evhub · Sep 6, 2019, 10:16 PM · 63 points (23 votes) · 12 comments · 6 min read · LW link

Towards a mechanistic understanding of corrigibility
evhub · Aug 22, 2019, 11:20 PM · 44 points (18 votes) · 26 comments · 6 min read · LW link

A Concrete Proposal for Adversarial IDA
evhub · Mar 26, 2019, 7:50 PM · 16 points (6 votes) · 5 comments · 5 min read · LW link

Nuances with ascription universality
evhub · Feb 12, 2019, 11:38 PM · 20 points (7 votes) · 1 comment · 2 min read · LW link

Box inversion hypothesis
Jan Kulveit · Oct 20, 2020, 4:20 PM · 59 points (23 votes) · 4 comments · 3 min read · LW link

[Question] Has anyone researched specification gaming with biological animals?
David Scott Krueger (formerly: capybaralet) · Oct 21, 2020, 12:20 AM · 9 points (5 votes) · 3 comments · 1 min read · LW link

Sunday October 25, 12:00PM (PT) — Scott Garrabrant on “Cartesian Frames”
Ben Pace · Oct 21, 2020, 3:27 AM · 48 points (11 votes) · 3 comments · 2 min read · LW link

[Question] Could we use recommender systems to figure out human values?
Olga Babeeva · Oct 20, 2020, 9:35 PM · 7 points (3 votes) · 2 comments · 1 min read · LW link

[Question] When was the term “AI alignment” coined?
David Scott Krueger (formerly: capybaralet) · Oct 21, 2020, 6:27 PM · 11 points (4 votes) · 8 comments · 1 min read · LW link

[AN #122]: Arguing for AGI-driven existential risk from first principles
Rohin Shah · Oct 21, 2020, 5:10 PM · 28 points (8 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

[Question] What’s the difference between GAI and a government?
DirectedEvolution · Oct 21, 2020, 11:04 PM · 11 points (4 votes) · 5 comments · 1 min read · LW link

Moral AI: Options
Manfred · Jul 11, 2015, 9:46 PM · 14 points (10 votes) · 6 comments · 4 min read · LW link

Can few-shot learning teach AI right from wrong?
Charlie Steiner · Jul 20, 2018, 7:45 AM · 13 points (5 votes) · 3 comments · 6 min read · LW link

Some Comments on Stuart Armstrong’s “Research Agenda v0.9”
Charlie Steiner · Jul 8, 2019, 7:03 PM · 21 points (8 votes) · 12 comments · 4 min read · LW link

The Artificial Intentional Stance
Charlie Steiner · Jul 27, 2019, 7:00 AM · 12 points (5 votes) · 0 comments · 4 min read · LW link

What’s the dream for giving natural language commands to AI?
Charlie Steiner · Oct 8, 2019, 1:42 PM · 8 points (3 votes) · 8 comments · 7 min read · LW link

Supervised learning of outputs in the brain
Steven Byrnes · Oct 26, 2020, 2:32 PM · 27 points (11 votes) · 9 comments · 10 min read · LW link

Humans are stunningly rational and stunningly irrational
Stuart_Armstrong · Oct 23, 2020, 2:13 PM · 21 points (9 votes) · 4 comments · 2 min read · LW link

Reply to Jebari and Lundborg on Artificial Superintelligence
Richard_Ngo · Oct 25, 2020, 1:50 PM · 31 points (11 votes) · 4 comments · 5 min read · LW link (thinkingcomplete.blogspot.com)

Additive Operations on Cartesian Frames
Scott Garrabrant · Oct 26, 2020, 3:12 PM · 61 points (18 votes) · 6 comments · 11 min read · LW link

Security Mindset and Takeoff Speeds
DanielFilan · Oct 27, 2020, 3:20 AM · 54 points (14 votes) · 23 comments · 8 min read · LW link (danielfilan.com)

Biextensional Equivalence
Scott Garrabrant · Oct 28, 2020, 2:07 PM · 43 points (11 votes) · 13 comments · 10 min read · LW link

Draft papers for REALab and Decoupled Approval on tampering
Oct 28, 2020, 4:01 PM · 47 points (15 votes) · 2 comments · 1 min read · LW link

[AN #123]: Inferring what is valuable in order to align recommender systems
Rohin Shah · Oct 28, 2020, 5:00 PM · 20 points (6 votes) · 1 comment · 8 min read · LW link (mailchi.mp)

“Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020 {OA}
gwern · Oct 29, 2020, 1:45 AM · 26 points (8 votes) · 11 comments · 1 min read · LW link (arxiv.org)

Controllables and Observables, Revisited
Scott Garrabrant · Oct 29, 2020, 4:38 PM · 34 points (5 votes) · 5 comments · 8 min read · LW link

AI risk hub in Singapore?
Daniel Kokotajlo · Oct 29, 2020, 11:45 AM · 57 points (21 votes) · 18 comments · 4 min read · LW link

Functors and Coarse Worlds
Scott Garrabrant · Oct 30, 2020, 3:19 PM · 50 points (14 votes) · 4 comments · 8 min read · LW link

[Question] Responses to Christiano on takeoff speeds?
Richard_Ngo · Oct 30, 2020, 3:16 PM · 29 points (11 votes) · 8 comments · 1 min read · LW link

/r/MLScaling: new subreddit for NN scaling research/discussion
gwern · Oct 30, 2020, 8:50 PM · 20 points (8 votes) · 0 comments · 1 min read · LW link (www.reddit.com)

“Inner Alignment Failures” Which Are Actually Outer Alignment Failures
johnswentworth · Oct 31, 2020, 8:18 PM · 61 points (22 votes) · 38 comments · 5 min read · LW link

Automated intelligence is not AI
KatjaGrace · Nov 1, 2020, 11:30 PM · 54 points (20 votes) · 10 comments · 2 min read · LW link (meteuphoric.com)

Confucianism in AI Alignment
johnswentworth · Nov 2, 2020, 9:16 PM · 33 points (14 votes) · 28 comments · 6 min read · LW link

[AN #124]: Provably safe exploration through shielding
Rohin Shah · Nov 4, 2020, 6:20 PM · 13 points (5 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Defining capability and alignment in gradient descent
Edouard Harris · Nov 5, 2020, 2:36 PM · 22 points (9 votes) · 6 comments · 10 min read · LW link

Sub-Sums and Sub-Tensors
Scott Garrabrant · Nov 5, 2020, 6:06 PM · 34 points (6 votes) · 4 comments · 8 min read · LW link

Multiplicative Operations on Cartesian Frames
Scott Garrabrant · Nov 3, 2020, 7:27 PM · 34 points (9 votes) · 23 comments · 12 min read · LW link

Subagents of Cartesian Frames
Scott Garrabrant · Nov 2, 2020, 10:02 PM · 48 points (13 votes) · 5 comments · 8 min read · LW link

[Question] What considerations influence whether I have more influence over short or long timelines?
Daniel Kokotajlo · Nov 5, 2020, 7:56 PM · 27 points (12 votes) · 30 comments · 1 min read · LW link

Additive and Multiplicative Subagents
Scott Garrabrant · Nov 6, 2020, 2:26 PM · 20 points (5 votes) · 7 comments · 12 min read · LW link

Committing, Assuming, Externalizing, and Internalizing
Scott Garrabrant · Nov 9, 2020, 4:59 PM · 31 points (8 votes) · 25 comments · 10 min read · LW link

Building AGI Using Language Models
leogao · Nov 9, 2020, 4:33 PM · 11 points (6 votes) · 1 comment · 1 min read · LW link (leogao.dev)

Why You Should Care About Goal-Directedness
adamShimi · Nov 9, 2020, 12:48 PM · 37 points (12 votes) · 15 comments · 9 min read · LW link

Clarifying inner alignment terminology
evhub · Nov 9, 2020, 8:40 PM · 98 points (34 votes) · 17 comments · 3 min read · LW link · 1 review

Eight Definitions of Observability
Scott Garrabrant · Nov 10, 2020, 11:37 PM · 34 points (8 votes) · 26 comments · 12 min read · LW link

[AN #125]: Neural network scaling laws across multiple modalities
Rohin Shah · Nov 11, 2020, 6:20 PM · 25 points (9 votes) · 7 comments · 9 min read · LW link (mailchi.mp)

Time in Cartesian Frames
Scott Garrabrant · Nov 11, 2020, 8:25 PM · 48 points (11 votes) · 16 comments · 7 min read · LW link

Learning Normativity: A Research Agenda
abramdemski · Nov 11, 2020, 9:59 PM · 76 points (21 votes) · 18 comments · 19 min read · LW link

[Question] Any work on honeypots (to detect treacherous turn attempts)?
David Scott Krueger (formerly: capybaralet) · Nov 12, 2020, 5:41 AM · 17 points (7 votes) · 4 comments · 1 min read · LW link

Misalignment and misuse: whose values are manifest?
KatjaGrace · Nov 13, 2020, 10:10 AM · 42 points (9 votes) · 7 comments · 2 min read · LW link (meteuphoric.com)

A Self-Embedded Probabilistic Model
johnswentworth · Nov 13, 2020, 8:36 PM · 30 points (4 votes) · 2 comments · 5 min read · LW link

TU Darmstadt, Computer Science Master’s with a focus on Machine Learning
Master Programs ML/AI · Nov 14, 2020, 3:50 PM · 6 points (5 votes) · 0 comments · 8 min read · LW link

EPF Lausanne, ML related MSc programs
Master Programs ML/AI · Nov 14, 2020, 3:51 PM · 3 points (3 votes) · 0 comments · 4 min read · LW link

ETH Zurich, ML related MSc programs
Master Programs ML/AI · Nov 14, 2020, 3:49 PM · 3 points (3 votes) · 0 comments · 10 min read · LW link

University of Oxford, Master’s Statistical Science
Master Programs ML/AI · Nov 14, 2020, 3:51 PM · 3 points (3 votes) · 0 comments · 3 min read · LW link

University of Edinburgh, Master’s Artificial Intelligence
Master Programs ML/AI · Nov 14, 2020, 3:49 PM · 4 points (3 votes) · 0 comments · 12 min read · LW link

University of Amsterdam (UvA), Master’s Artificial Intelligence
Master Programs ML/AI · Nov 14, 2020, 3:49 PM · 16 points (9 votes) · 6 comments · 21 min read · LW link

University of Tübingen, Master’s Machine Learning
Master Programs ML/AI · Nov 14, 2020, 3:50 PM · 14 points (10 votes) · 0 comments · 7 min read · LW link

A guide to Iterated Amplification & Debate
Rafael Harth · Nov 15, 2020, 5:14 PM · 68 points (20 votes) · 10 comments · 15 min read · LW link

Solomonoff Induction and Sleeping Beauty
ike · Nov 17, 2020, 2:28 AM · 7 points (2 votes) · 0 comments · 2 min read · LW link

The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables
johnswentworth · Nov 18, 2020, 5:47 PM · 104 points (42 votes) · 43 comments · 11 min read · LW link · 2 reviews

The ethics of AI for the Routledge Encyclopedia of Philosophy
Stuart_Armstrong · Nov 18, 2020, 5:55 PM · 45 points (11 votes) · 8 comments · 1 min read · LW link

Persuasion Tools: AI takeover without AGI or agency?
Daniel Kokotajlo · Nov 20, 2020, 4:54 PM · 74 points (35 votes) · 24 comments · 11 min read · LW link · 1 review

UDT might not pay a Counterfactual Mugger
winwonce · Nov 21, 2020, 11:27 PM · 5 points (3 votes) · 18 comments · 2 min read · LW link

Changing the AI race payoff matrix
Gurkenglas · Nov 22, 2020, 10:25 PM · 7 points (2 votes) · 2 comments · 1 min read · LW link

Syntax, semantics, and symbol grounding, simplified
Stuart_Armstrong · Nov 23, 2020, 4:12 PM · 30 points (14 votes) · 4 comments · 9 min read · LW link

Commentary on AGI Safety from First Principles
Richard_Ngo · Nov 23, 2020, 9:37 PM · 80 points (25 votes) · 4 comments · 54 min read · LW link

[Question] Critiques of the Agent Foundations agenda?
Jsevillamol · Nov 24, 2020, 4:11 PM · 16 points (7 votes) · 3 comments · 1 min read · LW link

[Question] How should OpenAI communicate about the commercial performances of the GPT-3 API?
Maxime Riché · Nov 24, 2020, 8:34 AM · 2 points (1 vote) · 0 comments · 1 min read · LW link

[AN #126]: Avoiding wireheading by decoupling action feedback from action effects
Rohin Shah · Nov 26, 2020, 11:20 PM · 24 points (8 votes) · 1 comment · 10 min read · LW link (mailchi.mp)

[Question] Is this a good way to bet on short timelines?
Daniel Kokotajlo · Nov 28, 2020, 12:51 PM · 16 points (8 votes) · 8 comments · 1 min read · LW link

Preface to the Sequence on Factored Cognition
Rafael Harth · Nov 30, 2020, 6:49 PM · 35 points (8 votes) · 7 comments · 2 min read · LW link

[Linkpost] AlphaFold: a solution to a 50-year-old grand challenge in biology
adamShimi · Nov 30, 2020, 5:33 PM · 54 points (28 votes) · 22 comments · 1 min read · LW link (deepmind.com)

What is “pro­tein fold­ing”? A brief explanation

jasoncrawfordDec 1, 2020, 2:46 AM
69 points

36 votes

Overall karma indicates overall quality.

9 comments4 min readLW link
(rootsofprogress.org)

[Question] In a mul­ti­po­lar sce­nario, how do peo­ple ex­pect sys­tems to be trained to in­ter­act with sys­tems de­vel­oped by other labs?

JesseCliftonDec 1, 2020, 8:04 PM
11 points

6 votes

Overall karma indicates overall quality.

6 comments1 min readLW link

[AN #127]: Re­think­ing agency: Carte­sian frames as a for­mal­iza­tion of ways to carve up the world into an agent and its environment

Rohin ShahDec 2, 2020, 6:20 PM
46 points

12 votes

0 comments13 min readLW link
(mailchi.mp)

Beyond 175 billion pa­ram­e­ters: Can we an­ti­ci­pate fu­ture GPT-X Ca­pa­bil­ities?

bakztfutureDec 4, 2020, 11:42 PM
−1 points

2 votes

1 comment2 min readLW link

Thoughts on Robin Han­son’s AI Im­pacts interview

Steven ByrnesNov 24, 2019, 1:40 AM
25 points

16 votes

3 comments7 min readLW link

[RXN#7] Rus­sian x-risks newslet­ter fall 2020

avturchinDec 5, 2020, 4:28 PM
12 points

5 votes

0 comments3 min readLW link

The AI Safety Game (UPDATED)

Daniel KokotajloDec 5, 2020, 10:27 AM
44 points

20 votes

9 comments3 min readLW link

Values Form a Shift­ing Land­scape (and why you might care)

VojtaKovarikDec 5, 2020, 11:56 PM
28 points

10 votes

6 comments4 min readLW link

AI Prob­lems Shared by Non-AI Systems

VojtaKovarikDec 5, 2020, 10:15 PM
7 points

3 votes

2 comments4 min readLW link

Chance that “AI safety ba­si­cally [doesn’t need] to be solved, we’ll just solve it by de­fault un­less we’re com­pletely com­pletely care­less”

Dec 8, 2020, 9:08 PM
27 points

12 votes

0 comments5 min readLW link

Min­i­mal Maps, Semi-De­ci­sions, and Neu­ral Representations

Past AccountDec 6, 2020, 3:15 PM
30 points

6 votes

2 comments4 min readLW link

Launch­ing the Fore­cast­ing AI Progress Tournament

TamayDec 7, 2020, 2:08 PM
20 points

8 votes

0 comments1 min readLW link
(www.metaculus.com)

[AN #128]: Pri­ori­tiz­ing re­search on AI ex­is­ten­tial safety based on its ap­pli­ca­tion to gov­er­nance demands

Rohin ShahDec 9, 2020, 6:20 PM
16 points

6 votes

2 comments10 min readLW link
(mailchi.mp)

Sum­mary of AI Re­search Con­sid­er­a­tions for Hu­man Ex­is­ten­tial Safety (ARCHES)

peterbarnettDec 9, 2020, 11:28 PM
10 points

5 votes

0 comments13 min readLW link

Clar­ify­ing Fac­tored Cognition

Rafael HarthDec 13, 2020, 8:02 PM
23 points

4 votes

2 comments3 min readLW link

Ho­mo­gene­ity vs. het­ero­gene­ity in AI take­off scenarios

evhubDec 16, 2020, 1:37 AM
95 points

27 votes

48 comments4 min readLW link

LBIT Proofs 8: Propo­si­tions 53-58

DiffractorDec 16, 2020, 3:29 AM
7 points

2 votes

0 comments18 min readLW link

LBIT Proofs 6: Propo­si­tions 39-47

DiffractorDec 16, 2020, 3:33 AM
7 points

2 votes

0 comments23 min readLW link

LBIT Proofs 5: Propo­si­tions 29-38

DiffractorDec 16, 2020, 3:35 AM
7 points

2 votes

0 comments21 min readLW link

LBIT Proofs 3: Propo­si­tions 19-22

DiffractorDec 16, 2020, 3:40 AM
7 points

2 votes

0 comments17 min readLW link

LBIT Proofs 2: Propo­si­tions 10-18

DiffractorDec 16, 2020, 3:45 AM
7 points

2 votes

0 comments20 min readLW link

LBIT Proofs 1: Propo­si­tions 1-9

DiffractorDec 16, 2020, 3:48 AM
7 points

2 votes

0 comments25 min readLW link

LBIT Proofs 4: Propo­si­tions 22-28

DiffractorDec 16, 2020, 3:38 AM
7 points

2 votes

0 comments17 min readLW link

LBIT Proofs 7: Propo­si­tions 48-52

DiffractorDec 16, 2020, 3:31 AM
7 points

2 votes

0 comments20 min readLW link

Less Ba­sic In­framea­sure Theory

DiffractorDec 16, 2020, 3:52 AM
22 points

4 votes

1 comment61 min readLW link

[AN #129]: Ex­plain­ing dou­ble de­scent by mea­sur­ing bias and variance

Rohin ShahDec 16, 2020, 6:10 PM
14 points

5 votes

1 comment7 min readLW link
(mailchi.mp)

Ma­chine learn­ing could be fun­da­men­tally unexplainable

George3d6Dec 16, 2020, 1:32 PM
26 points

19 votes

15 comments15 min readLW link
(cerebralab.com)

Beta test GPT-3 based re­search assistant

jungofthewonDec 16, 2020, 1:42 PM
34 points

12 votes

2 comments1 min readLW link

[Question] How long till In­verse AlphaFold?

Daniel KokotajloDec 17, 2020, 7:56 PM
41 points

15 votes

18 comments1 min readLW link

Hier­ar­chi­cal plan­ning: con­text agents

Charlie SteinerDec 19, 2020, 11:24 AM
21 points

10 votes

6 comments9 min readLW link

[Question] Is there a com­mu­nity al­igned with the idea of cre­at­ing species of AGI sys­tems for them to be­come our suc­ces­sors?

iamhefestoDec 20, 2020, 7:06 PM
−2 points

4 votes

7 comments1 min readLW link

Intuition

Rafael HarthDec 20, 2020, 9:49 PM
26 points

4 votes

1 comment6 min readLW link

2020 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

LarksDec 21, 2020, 3:27 PM
137 points

46 votes

14 comments68 min readLW link

TAI Safety Biblio­graphic Database

JessRiedelDec 22, 2020, 5:42 PM
70 points

24 votes

10 comments17 min readLW link

An­nounc­ing AXRP, the AI X-risk Re­search Podcast

DanielFilanDec 23, 2020, 8:00 PM
54 points

19 votes

6 comments1 min readLW link
(danielfilan.com)

[AN #130]: A new AI x-risk pod­cast, and re­views of the field

Rohin ShahDec 24, 2020, 6:20 PM
8 points

4 votes

0 comments7 min readLW link
(mailchi.mp)

Can we model tech­nolog­i­cal sin­gu­lar­ity as the phase tran­si­tion?

Valentin2026Dec 26, 2020, 3:20 AM
4 points

4 votes

3 comments4 min readLW link

AGI Align­ment Should Solve Cor­po­rate Alignment

magfrumpDec 27, 2020, 2:23 AM
19 points

9 votes

6 comments6 min readLW link

Against GDP as a met­ric for timelines and take­off speeds

Daniel KokotajloDec 29, 2020, 5:42 PM
131 points

43 votes

16 comments14 min readLW link1 review

AXRP Epi­sode 3 - Ne­go­tiable Re­in­force­ment Learn­ing with An­drew Critch

DanielFilanDec 29, 2020, 8:45 PM
26 points

7 votes

0 comments27 min readLW link

AXRP Epi­sode 1 - Ad­ver­sar­ial Poli­cies with Adam Gleave

DanielFilanDec 29, 2020, 8:41 PM
12 points

4 votes

5 comments33 min readLW link

AXRP Epi­sode 2 - Learn­ing Hu­man Bi­ases with Ro­hin Shah

DanielFilanDec 29, 2020, 8:43 PM
13 points

4 votes

0 comments35 min readLW link

Dario Amodei leaves OpenAI

Daniel KokotajloDec 29, 2020, 7:31 PM
69 points

23 votes

12 comments1 min readLW link

[Question] What Are Some Alter­na­tive Ap­proaches to Un­der­stand­ing Agency/​In­tel­li­gence?

intersticeDec 29, 2020, 11:21 PM
15 points

10 votes

12 comments1 min readLW link

Why Neu­ral Net­works Gen­er­al­ise, and Why They Are (Kind of) Bayesian

Joar SkalseDec 29, 2020, 1:33 PM
67 points

26 votes

58 comments1 min readLW link1 review

De­bate Minus Fac­tored Cognition

abramdemskiDec 29, 2020, 10:59 PM
37 points

10 votes

42 comments11 min readLW link

[AN #131]: For­mal­iz­ing the ar­gu­ment of ig­nored at­tributes in a util­ity function

Rohin ShahDec 31, 2020, 6:20 PM
13 points

4 votes

4 comments9 min readLW link
(mailchi.mp)

Reflec­tions on Larks’ 2020 AI al­ign­ment liter­a­ture review

Alex FlintJan 1, 2021, 10:53 PM
79 points

25 votes

8 comments6 min readLW link

Men­tal sub­agent im­pli­ca­tions for AI Safety

moridinamaelJan 3, 2021, 6:59 PM
11 points

3 votes

0 comments3 min readLW link

The Na­tional Defense Autho­riza­tion Act Con­tains AI Provisions

ryan_bJan 5, 2021, 3:51 PM
30 points

11 votes

24 comments1 min readLW link

The Poin­t­ers Prob­lem: Clar­ifi­ca­tions/​Variations

abramdemskiJan 5, 2021, 5:29 PM
50 points

15 votes

14 comments18 min readLW link

[AN #132]: Com­plex and sub­tly in­cor­rect ar­gu­ments as an ob­sta­cle to debate

Rohin ShahJan 6, 2021, 6:20 PM
19 points

6 votes

1 comment19 min readLW link
(mailchi.mp)

Out-of-body rea­son­ing (OOBR)

Jon ZeroJan 9, 2021, 4:10 PM
5 points

4 votes

0 comments4 min readLW link

Re­view of Soft Take­off Can Still Lead to DSA

Daniel KokotajloJan 10, 2021, 6:10 PM
75 points

17 votes

15 comments6 min readLW link

Re­view of ‘De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More’

TurnTroutJan 12, 2021, 3:57 AM
40 points

13 votes

1 comment2 min readLW link

[AN #133]: Build­ing ma­chines that can co­op­er­ate (with hu­mans, in­sti­tu­tions, or other ma­chines)

Rohin ShahJan 13, 2021, 6:10 PM
14 points

5 votes

0 comments9 min readLW link
(mailchi.mp)

An Ex­plo­ra­tory Toy AI Take­off Model

niplavJan 13, 2021, 6:13 PM
10 points

4 votes

3 comments12 min readLW link

Some re­cent sur­vey pa­pers on (mostly near-term) AI safety, se­cu­rity, and assurance

Aryeh EnglanderJan 13, 2021, 9:50 PM
11 points

5 votes

0 comments3 min readLW link

Thoughts on Ia­son Gabriel’s Ar­tifi­cial In­tel­li­gence, Values, and Alignment

Alex FlintJan 14, 2021, 12:58 PM
35 points

12 votes

14 comments4 min readLW link

Why I’m ex­cited about Debate

Richard_NgoJan 15, 2021, 11:37 PM
73 points

36 votes

12 comments7 min readLW link

Ex­cerpt from Ar­bital Solomonoff in­duc­tion dialogue

Richard_NgoJan 17, 2021, 3:49 AM
36 points

8 votes

6 comments5 min readLW link
(arbital.com)

Short sum­mary of mAIry’s room

Stuart_ArmstrongJan 18, 2021, 6:11 PM
26 points

8 votes

2 comments4 min readLW link

DALL-E does sym­bol grounding

p.b.Jan 17, 2021, 9:20 PM
6 points

4 votes

0 comments1 min readLW link

Some thoughts on risks from nar­row, non-agen­tic AI

Richard_NgoJan 19, 2021, 12:04 AM
35 points

14 votes

21 comments16 min readLW link

Against the Back­ward Ap­proach to Goal-Directedness

adamShimiJan 19, 2021, 6:46 PM
19 points

9 votes

6 comments4 min readLW link

[AN #134]: Un­der­speci­fi­ca­tion as a cause of frag­ility to dis­tri­bu­tion shift

Rohin ShahJan 21, 2021, 6:10 PM
13 points

4 votes

0 comments7 min readLW link
(mailchi.mp)

Coun­ter­fac­tual con­trol incentives

Stuart_ArmstrongJan 21, 2021, 4:54 PM
21 points

8 votes

10 comments9 min readLW link

Policy re­stric­tions and Se­cret keep­ing AI

Donald HobsonJan 24, 2021, 8:59 PM
6 points

1 vote

3 comments3 min readLW link

FC fi­nal: Can Fac­tored Cog­ni­tion schemes scale?

Rafael HarthJan 24, 2021, 10:18 PM
15 points

5 votes

0 comments17 min readLW link

[AN #135]: Five prop­er­ties of goal-di­rected systems

Rohin ShahJan 27, 2021, 6:10 PM
33 points

7 votes

0 comments8 min readLW link
(mailchi.mp)

AMA on EA Fo­rum: Ajeya Co­tra, re­searcher at Open Phil

Ajeya CotraJan 29, 2021, 11:05 PM
23 points

7 votes

0 comments1 min readLW link
(forum.effectivealtruism.org)

Play with neu­ral net

KatjaGraceJan 30, 2021, 10:50 AM
17 points

5 votes

0 comments1 min readLW link
(worldspiritsockpuppet.com)

A Cri­tique of Non-Obstruction

Joe CollmanFeb 3, 2021, 8:45 AM
13 points

4 votes

10 comments4 min readLW link

Dist­in­guish­ing claims about train­ing vs deployment

Richard_NgoFeb 3, 2021, 11:30 AM
61 points

25 votes

30 comments9 min readLW link

Graph­i­cal World Models, Coun­ter­fac­tu­als, and Ma­chine Learn­ing Agents

Koen.HoltmanFeb 17, 2021, 11:07 AM
6 points

3 votes

2 comments10 min readLW link

OpenAI: “Scal­ing Laws for Trans­fer”, Her­nan­dez et al.

Lukas FinnvedenFeb 4, 2021, 12:49 PM
13 points

6 votes

3 comments1 min readLW link
(arxiv.org)

Evolu­tions Build­ing Evolu­tions: Lay­ers of Gen­er­ate and Test

plexFeb 5, 2021, 6:21 PM
11 points

6 votes

1 comment6 min readLW link

Episte­mol­ogy of HCH

adamShimiFeb 9, 2021, 11:46 AM
16 points

8 votes

2 comments10 min readLW link

[Question] Math­e­mat­i­cal Models of Progress?

abramdemskiFeb 16, 2021, 12:21 AM
28 points

8 votes

8 comments2 min readLW link

[Question] Sugges­tions of posts on the AF to review

adamShimiFeb 16, 2021, 12:40 PM
56 points

14 votes

20 comments1 min readLW link

Disen­tan­gling Cor­rigi­bil­ity: 2015-2021

Koen.HoltmanFeb 16, 2021, 6:01 PM
17 points

11 votes

20 comments9 min readLW link

Carte­sian frames as gen­er­al­ised models

Stuart_ArmstrongFeb 16, 2021, 4:09 PM
20 points

4 votes

0 comments5 min readLW link

[AN #138]: Why AI gov­er­nance should find prob­lems rather than just solv­ing them

Rohin ShahFeb 17, 2021, 6:50 PM
12 points

4 votes

0 comments9 min readLW link
(mailchi.mp)

Safely con­trol­ling the AGI agent re­ward function

Koen.HoltmanFeb 17, 2021, 2:47 PM
7 points

3 votes

0 comments5 min readLW link

AXRP Epi­sode 4 - Risks from Learned Op­ti­miza­tion with Evan Hubinger

DanielFilanFeb 18, 2021, 12:03 AM
41 points

11 votes

10 comments86 min readLW link

Utility Max­i­miza­tion = De­scrip­tion Length Minimization

johnswentworthFeb 18, 2021, 6:04 PM
183 points

74 votes

40 comments5 min readLW link

Google’s Eth­i­cal AI team and AI Safety

magfrumpFeb 20, 2021, 9:42 AM
12 points

17 votes

16 comments7 min readLW link

AI Safety Begin­ners Meetup (Euro­pean Time)

Linda LinseforsFeb 20, 2021, 1:20 PM
8 points

3 votes

2 comments1 min readLW link

Min­i­mal Map Constraints

Past AccountFeb 21, 2021, 5:49 PM
6 points

2 votes

0 comments3 min readLW link

[AN #139]: How the sim­plic­ity of re­al­ity ex­plains the suc­cess of neu­ral nets

Rohin ShahFeb 24, 2021, 6:30 PM
26 points

10 votes

6 comments12 min readLW link
(mailchi.mp)

My Thoughts on the Ap­per­cep­tion Engine

J BostockFeb 25, 2021, 7:43 PM
4 points

3 votes

1 comment3 min readLW link

The Case for Pri­vacy Optimism

bmgarfinkelMar 10, 2020, 8:30 PM
43 points

9 votes

1 comment32 min readLW link
(benmgarfinkel.wordpress.com)

[Question] How might cryp­tocur­ren­cies af­fect AGI timelines?

Dawn DrescherFeb 28, 2021, 7:16 PM
13 points

7 votes

40 comments2 min readLW link

Fun with +12 OOMs of Compute

Daniel KokotajloMar 1, 2021, 1:30 PM
212 points

100 votes

78 comments12 min readLW link1 review

Links for Feb 2021

ikeMar 1, 2021, 5:13 AM
6 points

1 vote

0 comments6 min readLW link
(misinfounderload.substack.com)

In­tro­duc­tion to Re­in­force­ment Learning

Dr. BirdbrainFeb 28, 2021, 11:03 PM
4 points

3 votes

1 comment3 min readLW link

Cu­ri­os­ity about Align­ing Values

esweetMar 3, 2021, 12:22 AM
3 points

2 votes

7 comments1 min readLW link

How does bee learn­ing com­pare with ma­chine learn­ing?

leniMar 4, 2021, 1:59 AM
62 points

21 votes

15 comments24 min readLW link

Some re­cent in­ter­views with AI/​math lu­mi­nar­ies.

fowlertmMar 4, 2021, 1:26 AM
2 points

1 vote

0 comments1 min readLW link

A Semitech­ni­cal In­tro­duc­tory Dialogue on Solomonoff Induction

Eliezer YudkowskyMar 4, 2021, 5:27 PM
127 points

50 votes

34 comments54 min readLW link

Con­nect­ing the good reg­u­la­tor the­o­rem with se­man­tics and sym­bol grounding

Stuart_ArmstrongMar 4, 2021, 2:35 PM
11 points

2 votes

0 comments2 min readLW link

[AN #140]: The­o­ret­i­cal mod­els that pre­dict scal­ing laws

Rohin ShahMar 4, 2021, 6:10 PM
45 points

11 votes

0 comments10 min readLW link
(mailchi.mp)

Take­aways from the In­tel­li­gence Ris­ing RPG

Mar 5, 2021, 10:27 AM
50 points

25 votes

8 comments12 min readLW link

GPT-3 and the fu­ture of knowl­edge work

fowlertmMar 5, 2021, 5:40 PM
16 points

12 votes

0 comments2 min readLW link

The case for al­ign­ing nar­rowly su­per­hu­man models

Ajeya CotraMar 5, 2021, 10:29 PM
187 points

70 votes

74 comments38 min readLW link

MIRI com­ments on Co­tra’s “Case for Align­ing Nar­rowly Su­per­hu­man Models”

Rob BensingerMar 5, 2021, 11:43 PM
136 points

48 votes

13 comments26 min readLW link

[Question] What are the biggest cur­rent im­pacts of AI?

Sam ClarkeMar 7, 2021, 9:44 PM
15 points

6 votes

5 comments1 min readLW link

CLR’s re­cent work on multi-agent systems

JesseCliftonMar 9, 2021, 2:28 AM
54 points

21 votes

1 comment13 min readLW link

De-con­fus­ing my­self about Pas­cal’s Mug­ging and New­comb’s Problem

DirectedEvolutionMar 9, 2021, 8:45 PM
7 points

8 votes

1 comment3 min readLW link

Open Prob­lems with Myopia

Mar 10, 2021, 6:38 PM
57 points

18 votes

16 comments8 min readLW link

[AN #141]: The case for prac­tic­ing al­ign­ment work on GPT-3 and other large models

Rohin ShahMar 10, 2021, 6:30 PM
27 points

8 votes

4 comments8 min readLW link
(mailchi.mp)

[Link] Whit­tle­stone et al., The So­cietal Im­pli­ca­tions of Deep Re­in­force­ment Learning

Aryeh EnglanderMar 10, 2021, 6:13 PM
11 points

5 votes

1 comment1 min readLW link
(jair.org)

Four Mo­ti­va­tions for Learn­ing Normativity

abramdemskiMar 11, 2021, 8:13 PM
42 points

9 votes

7 comments5 min readLW link

[Question] What’s a good way to test ba­sic ma­chine learn­ing code?

KennyMar 11, 2021, 9:27 PM
5 points

1 vote

9 comments1 min readLW link

[Video] In­tel­li­gence and Stu­pidity: The Orthog­o­nal­ity Thesis

plexMar 13, 2021, 12:32 AM
5 points

3 votes

1 comment1 min readLW link
(www.youtube.com)

AI x-risk re­duc­tion: why I chose academia over industry

David Scott Krueger (formerly: capybaralet)Mar 14, 2021, 5:25 PM
56 points

36 votes

14 comments3 min readLW link

[Question] Par­tial-Con­scious­ness as se­man­tic/​sym­bolic rep­re­sen­ta­tional lan­guage model trained on NN

Joe KwonMar 16, 2021, 6:51 PM
2 points

2 votes

3 comments1 min readLW link

[AN #142]: The quest to un­der­stand a net­work well enough to reim­ple­ment it by hand

Rohin ShahMar 17, 2021, 5:10 PM
34 points

9 votes

4 comments8 min readLW link
(mailchi.mp)

In­ter­mit­tent Distil­la­tions #1

Mark XuMar 17, 2021, 5:15 AM
25 points

9 votes

1 comment10 min readLW link

HCH Spec­u­la­tion Post #2A

Charlie SteinerMar 17, 2021, 1:26 PM
42 points

8 votes

7 comments9 min readLW link

The Age of Imag­i­na­tive Machines

Yuli_BanMar 18, 2021, 12:35 AM
10 points

5 votes

1 comment11 min readLW link

Gen­er­al­iz­ing POWER to multi-agent games

Mar 22, 2021, 2:41 AM
52 points

14 votes

17 comments7 min readLW link

My re­search methodology

paulfchristianoMar 22, 2021, 9:20 PM
148 points

46 votes

36 comments16 min readLW link
(ai-alignment.com)

“In­fra-Bayesi­anism with Vanessa Kosoy” – Watch/​Dis­cuss Party

Ben PaceMar 22, 2021, 11:44 PM
27 points

9 votes

45 comments1 min readLW link

Prefer­ences and bi­ases, the in­for­ma­tion argument

Stuart_ArmstrongMar 23, 2021, 12:44 PM
14 points

5 votes

5 comments1 min readLW link

[AN #143]: How to make em­bed­ded agents that rea­son prob­a­bil­is­ti­cally about their environments

Rohin ShahMar 24, 2021, 5:20 PM
13 points

5 votes

3 comments8 min readLW link
(mailchi.mp)

Toy model of prefer­ence, bias, and ex­tra information

Stuart_ArmstrongMar 24, 2021, 10:14 AM
9 points

1 vote

0 comments4 min readLW link

On lan­guage mod­el­ing and fu­ture ab­stract rea­son­ing research

alexlyzhovMar 25, 2021, 5:43 PM
3 points

2 votes

1 comment1 min readLW link
(docs.google.com)

In­framea­sures and Do­main Theory

DiffractorMar 28, 2021, 9:19 AM
27 points

8 votes

3 comments33 min readLW link

In­fra-Do­main Proofs 2

DiffractorMar 28, 2021, 9:15 AM
13 points

2 votes

0 comments21 min readLW link

In­fra-Do­main proofs 1

DiffractorMar 28, 2021, 9:16 AM
13 points

2 votes

0 comments23 min readLW link

Sce­nar­ios and Warn­ing Signs for Ajeya’s Ag­gres­sive, Con­ser­va­tive, and Best Guess AI Timelines

Kevin LiuMar 29, 2021, 1:38 AM
25 points

14 votes

1 comment9 min readLW link
(kliu.io)

[Question] How do we pre­pare for fi­nal crunch time?

Eli TyreMar 30, 2021, 5:47 AM
116 points

49 votes

30 comments8 min readLW link1 review

[Question] TAI?

Logan ZoellnerMar 30, 2021, 12:41 PM
12 points

11 votes

8 comments1 min readLW link

A use for Clas­si­cal AI—Ex­pert Systems

GlpusnaMar 31, 2021, 2:37 AM
1 point

1 vote

2 comments2 min readLW link

What Mul­tipo­lar Failure Looks Like, and Ro­bust Agent-Ag­nos­tic Pro­cesses (RAAPs)

Andrew_CritchMar 31, 2021, 11:50 PM
203 points

89 votes

60 comments22 min readLW link

AI and the Prob­a­bil­ity of Conflict

tonyoconnorApr 1, 2021, 7:00 AM
8 points

7 votes

10 comments8 min readLW link

“AI and Com­pute” trend isn’t pre­dic­tive of what is happening

alexlyzhovApr 2, 2021, 12:44 AM
133 points

56 votes

15 comments1 min readLW link

[AN #144]: How lan­guage mod­els can also be fine­tuned for non-lan­guage tasks

Rohin ShahApr 2, 2021, 5:20 PM
19 points

8 votes

0 comments6 min readLW link
(mailchi.mp)

2012 Robin Han­son com­ment on “In­tel­li­gence Ex­plo­sion: Ev­i­dence and Im­port”

Rob BensingerApr 2, 2021, 4:26 PM
28 points

8 votes

4 comments3 min readLW link

My take on Michael Littman on “The HCI of HAI”

Alex FlintApr 2, 2021, 7:51 PM
59 points

20 votes

4 comments7 min readLW link

[Question] How do scal­ing laws work for fine-tun­ing?

Daniel KokotajloApr 4, 2021, 12:18 PM
24 points

6 votes

10 comments1 min readLW link

Avert­ing suffer­ing with sen­tience throt­tlers (pro­posal)

QuinnApr 5, 2021, 10:54 AM
8 points

5 votes

7 comments3 min readLW link

Reflec­tive Bayesianism

abramdemskiApr 6, 2021, 7:48 PM
58 points

15 votes

27 comments13 min readLW link

[Question] What will GPT-4 be in­ca­pable of?

Michaël TrazziApr 6, 2021, 7:57 PM
34 points

13 votes

32 comments1 min readLW link

I Trained a Neu­ral Net­work to Play Helltaker

lsusrApr 7, 2021, 8:24 AM
29 points

15 votes

5 comments3 min readLW link

[AN #145]: Our three year an­niver­sary!

Rohin ShahApr 9, 2021, 5:48 PM
19 points

6 votes

0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter Three Year Retrospective

Rohin ShahApr 7, 2021, 2:39 PM
55 points

23 votes

0 comments5 min readLW link

Which coun­ter­fac­tu­als should an AI fol­low?

Stuart_ArmstrongApr 7, 2021, 4:47 PM
19 points

5 votes

5 comments7 min readLW link

Solv­ing the whole AGI con­trol prob­lem, ver­sion 0.0001

Steven ByrnesApr 8, 2021, 3:14 PM
60 points

29 votes

7 comments26 min readLW link

The Ja­panese Quiz: a Thought Ex­per­i­ment of Statis­ti­cal Epistemology

DanBApr 8, 2021, 5:37 PM
11 points

7 votes

0 comments9 min readLW link

A pos­si­ble prefer­ence algorithm

Stuart_ArmstrongApr 8, 2021, 6:25 PM
22 points

4 votes

0 comments4 min readLW link

If you don’t de­sign for ex­trap­o­la­tion, you’ll ex­trap­o­late poorly—pos­si­bly fatally

Stuart_ArmstrongApr 8, 2021, 6:10 PM
17 points

6 votes

0 comments4 min readLW link

AXRP Epi­sode 6 - De­bate and Imi­ta­tive Gen­er­al­iza­tion with Beth Barnes

DanielFilanApr 8, 2021, 9:20 PM
24 points

7 votes

3 comments59 min readLW link

My Cur­rent Take on Counterfactuals

abramdemskiApr 9, 2021, 5:51 PM
53 points

17 votes

57 comments25 min readLW link

Opinions on In­ter­pretable Ma­chine Learn­ing and 70 Sum­maries of Re­cent Papers

Apr 9, 2021, 7:19 PM
139 points

46 votes

16 comments102 min readLW link

Why un­rig­gable *al­most* im­plies uninfluenceable

Stuart_ArmstrongApr 9, 2021, 5:07 PM
11 points

2 votes

0 comments4 min readLW link

In­ter­mit­tent Distil­la­tions #2

Mark XuApr 14, 2021, 6:47 AM
32 points

9 votes

4 comments9 min readLW link

Test Cases for Im­pact Reg­u­lari­sa­tion Methods

DanielFilanFeb 6, 2019, 9:50 PM
58 points

19 votes

5 comments12 min readLW link
(danielfilan.com)

Su­per­ra­tional Agents Kelly Bet In­fluence!

abramdemskiApr 16, 2021, 10:08 PM
41 points

17 votes

5 comments5 min readLW link

Defin­ing “op­ti­mizer”

ChantielApr 17, 2021, 3:38 PM
9 points

6 votes

6 comments1 min readLW link

Alex Flint on “A soft­ware en­g­ineer’s per­spec­tive on log­i­cal in­duc­tion”

RaemonApr 17, 2021, 6:56 AM
21 points

4 votes

8 comments1 min readLW link

[Question] Pa­ram­e­ter count of ML sys­tems through time?

JsevillamolApr 19, 2021, 12:54 PM
31 points

15 votes

4 comments1 min readLW link

Gra­da­tions of In­ner Align­ment Obstacles

abramdemskiApr 20, 2021, 10:18 PM
80 points

21 votes

22 comments9 min readLW link

Where are in­ten­tions to be found?

Alex FlintApr 21, 2021, 12:51 AM
44 points

10 votes

12 comments9 min readLW link

[AN #147]: An overview of the in­ter­pretabil­ity landscape

Rohin ShahApr 21, 2021, 5:10 PM
14 points

5 votes

2 comments7 min readLW link
(mailchi.mp)

NTK/​GP Models of Neu­ral Nets Can’t Learn Features

intersticeApr 22, 2021, 3:01 AM
31 points

12 votes

33 comments3 min readLW link

[Question] Is there any­thing that can stop AGI de­vel­op­ment in the near term?

Wulky WilkinsenApr 22, 2021, 8:37 PM
5 points

5 votes

5 comments1 min readLW link

Prob­a­bil­ity the­ory and log­i­cal in­duc­tion as lenses

Alex FlintApr 23, 2021, 2:41 AM
43 points

13 votes

7 comments6 min readLW link

Nat­u­ral­ism and AI alignment

Michele CampoloApr 24, 2021, 4:16 PM
5 points

8 votes

12 comments8 min readLW link

Mal­i­cious non-state ac­tors and AI safety

ketiApr 25, 2021, 3:21 AM
2 points

3 votes

13 comments2 min readLW link

An­nounc­ing the Align­ment Re­search Center

paulfchristianoApr 26, 2021, 11:30 PM
177 points

69 votes

6 comments1 min readLW link
(ai-alignment.com)

[Linkpost] Treach­er­ous turns in the wild

Mark XuApr 26, 2021, 10:51 PM
31 points

14 votes

6 comments1 min readLW link
(lukemuehlhauser.com)

FAQ: Ad­vice for AI Align­ment Researchers

Rohin ShahApr 26, 2021, 6:59 PM
67 points

27 votes

2 comments1 min readLW link
(rohinshah.com)

Pit­falls of the agent model

Alex FlintApr 27, 2021, 10:19 PM
19 points

8 votes

4 comments20 min readLW link

[AN #148]: An­a­lyz­ing gen­er­al­iza­tion across more axes than just ac­cu­racy or loss

Rohin ShahApr 28, 2021, 6:30 PM
24 points

6 votes

5 comments11 min readLW link
(mailchi.mp)

AMA: Paul Chris­ti­ano, al­ign­ment researcher

paulfchristianoApr 28, 2021, 6:55 PM
117 points

42 votes

Overall karma indicates overall quality.

198 comments1 min readLW link

25 Min Talk on Me­taEth­i­cal.AI with Ques­tions from Stu­art Armstrong

June KuApr 29, 2021, 3:38 PM
21 points

8 votes

Overall karma indicates overall quality.

7 comments1 min readLW link

Low-stakes alignment

paulfchristianoApr 30, 2021, 12:10 AM
70 points

25 votes

Overall karma indicates overall quality.

9 comments7 min readLW link1 review
(ai-alignment.com)

[Weekly Event] Align­ment Re­searcher Coffee Time (in Walled Gar­den)

adamShimiMay 2, 2021, 12:59 PM
37 points

11 votes

Overall karma indicates overall quality.

0 comments1 min readLW link

Parsing Abram on Gradations of Inner Alignment Obstacles
Alex Flint · May 4, 2021, 5:44 PM · 22 points (6 votes) · 4 comments · 6 min read · LW link

Mundane solutions to exotic problems
paulfchristiano · May 4, 2021, 6:20 PM · 56 points (16 votes) · 8 comments · 5 min read · LW link (ai-alignment.com)

April 15, 2040
Nisan · May 4, 2021, 9:18 PM · 97 points (42 votes) · 19 comments · 2 min read · LW link

[AN #149]: The newsletter's editorial policy
Rohin Shah · May 5, 2021, 5:10 PM · 19 points (6 votes) · 3 comments · 8 min read · LW link (mailchi.mp)

Parsing Chris Mingard on Neural Networks
Alex Flint · May 6, 2021, 10:16 PM · 67 points (21 votes) · 27 comments · 6 min read · LW link

Life and expanding steerable consequences
Alex Flint · May 7, 2021, 6:33 PM · 46 points (15 votes) · 3 comments · 4 min read · LW link

Domain Theory and the Prisoner's Dilemma: FairBot
Gurkenglas · May 7, 2021, 7:33 AM · 14 points (6 votes) · 5 comments · 2 min read · LW link

Pre-Training + Fine-Tuning Favors Deception
Mark Xu · May 8, 2021, 6:36 PM · 27 points (8 votes) · 2 comments · 3 min read · LW link

[Event] Weekly Alignment Research Coffee Time (05/10)
adamShimi · May 9, 2021, 11:05 AM · 16 points (7 votes) · 2 comments · 1 min read · LW link

[Question] Is driving worth the risk?
Adam Zerner · May 11, 2021, 5:04 AM · 26 points (14 votes) · 29 comments · 7 min read · LW link

Yampolskiy on AI Risk Skepticism
Gordon Seidoh Worley · May 11, 2021, 2:50 PM · 15 points (6 votes) · 5 comments · 1 min read · LW link (www.researchgate.net)

Human priors, features and models, languages, and Solmonoff induction
Stuart_Armstrong · May 10, 2021, 10:55 AM · 16 points (8 votes) · 2 comments · 4 min read · LW link

[AN #150]: The subtypes of Cooperative AI research
Rohin Shah · May 12, 2021, 5:20 PM · 15 points (6 votes) · 0 comments · 6 min read · LW link (mailchi.mp)

Understanding the Lottery Ticket Hypothesis
Alex Flint · May 14, 2021, 12:25 AM · 50 points (21 votes) · 9 comments · 8 min read · LW link

Concerning not getting lost
Alex Flint · May 14, 2021, 7:38 PM · 50 points (29 votes) · 9 comments · 4 min read · LW link

[Event] Weekly Alignment Research Coffee Time (05/17)
adamShimi · May 15, 2021, 10:07 PM · 7 points (2 votes) · 0 comments · 1 min read · LW link

Optimizers: To Define or not to Define
J Bostock · May 16, 2021, 7:55 PM · 4 points (2 votes) · 0 comments · 4 min read · LW link

Intermittent Distillations #3
Mark Xu · May 15, 2021, 7:13 AM · 19 points (6 votes) · 1 comment · 11 min read · LW link

AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan · May 14, 2021, 3:50 AM · 34 points (9 votes) · 6 comments · 43 min read · LW link

Saving Time
Scott Garrabrant · May 18, 2021, 8:11 PM · 131 points (50 votes) · 19 comments · 4 min read · LW link

[Question] Are there any methods for NNs or other ML systems to get information from knockout-like or assay-like experiments?
J Bostock · May 18, 2021, 9:33 PM · 2 points (1 vote) · 1 comment · 1 min read · LW link

SGD's Bias
johnswentworth · May 18, 2021, 11:19 PM · 60 points (24 votes) · 16 comments · 3 min read · LW link

This Sunday, 12PM PT: Scott Garrabrant on "Finite Factored Sets"
Raemon · May 19, 2021, 1:48 AM · 33 points (7 votes) · 4 comments · 1 min read · LW link

[AN #151]: How sparsity in the final layer makes a neural net debuggable
Rohin Shah · May 19, 2021, 5:20 PM · 19 points (5 votes) · 0 comments · 6 min read · LW link (mailchi.mp)

The Variational Characterization of KL-Divergence, Error Catastrophes, and Generalization
Past Account · May 20, 2021, 8:57 PM · 38 points (10 votes) · 5 comments · 3 min read · LW link

Oracles, Informers, and Controllers
ozziegooen · May 25, 2021, 2:16 PM · 15 points (7 votes) · 2 comments · 3 min read · LW link

Knowledge is not just map/territory resemblance
Alex Flint · May 25, 2021, 5:58 PM · 28 points (8 votes) · 4 comments · 3 min read · LW link

MDP models are determined by the agent architecture and the environmental dynamics
TurnTrout · May 26, 2021, 12:14 AM · 23 points (8 votes) · 34 comments · 3 min read · LW link

[Question] List of good AI safety project ideas?
Aryeh Englander · May 26, 2021, 10:36 PM · 24 points (11 votes) · 8 comments · 1 min read · LW link

AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
DanielFilan · May 28, 2021, 12:20 AM · 24 points (6 votes) · 1 comment · 67 min read · LW link

Predict responses to the "existential risk from AI" survey
Rob Bensinger · May 28, 2021, 1:32 AM · 44 points (16 votes) · 6 comments · 2 min read · LW link

Teaching ML to answer questions honestly instead of predicting human answers
paulfchristiano · May 28, 2021, 5:30 PM · 53 points (13 votes) · 18 comments · 16 min read · LW link (ai-alignment.com)

The blue-minimising robot and model splintering
Stuart_Armstrong · May 28, 2021, 3:09 PM · 13 points (8 votes) · 4 comments · 3 min read · 1 review · LW link

[Question] Use of GPT-3 for identifying Phishing and other email based attacks?
jmh · May 29, 2021, 5:11 PM · 6 points (2 votes) · 0 comments · 1 min read · LW link

[Event] Weekly Alignment Research Coffee Time
adamShimi · May 29, 2021, 1:26 PM · 12 points (7 votes) · 5 comments · 1 min read · LW link

What is the most effective way to donate to AGI XRisk mitigation?
JoshuaFox · May 30, 2021, 11:08 AM · 44 points (13 votes) · 11 comments · 1 min read · LW link

"Existential risk from AI" survey results
Rob Bensinger · Jun 1, 2021, 8:02 PM · 56 points (23 votes) · 8 comments · 11 min read · LW link

April 2021 Gwern.net newsletter
gwern · Jun 3, 2021, 3:13 PM · 20 points (7 votes) · 0 comments · 1 min read · LW link (www.gwern.net)

The underlying model of a morphism
Stuart_Armstrong · Jun 4, 2021, 10:29 PM · 10 points (2 votes) · 0 comments · 5 min read · LW link

We need a standard set of community advice for how to financially prepare for AGI
GeneSmith · Jun 7, 2021, 7:24 AM · 50 points (32 votes) · 53 comments · 5 min read · LW link

Some AI Governance Research Ideas
Jun 7, 2021, 2:40 PM · 29 points (9 votes) · 2 comments · 2 min read · LW link

Big picture of phasic dopamine
Steven Byrnes · Jun 8, 2021, 1:07 PM · 59 points (35 votes) · 18 comments · 36 min read · LW link

Bayeswatch 6: Mechwarrior
lsusr · Jun 7, 2021, 8:20 PM · 47 points (33 votes) · 8 comments · 2 min read · LW link

Speculations against GPT-n writing alignment papers
Donald Hobson · Jun 7, 2021, 9:13 PM · 31 points (12 votes) · 6 comments · 2 min read · LW link

The reverse Goodhart problem
Stuart_Armstrong · Jun 8, 2021, 3:48 PM · 16 points (6 votes) · 22 comments · 1 min read · LW link

Against intelligence
George3d6 · Jun 8, 2021, 1:03 PM · 12 points (8 votes) · 17 comments · 10 min read · LW link (cerebralab.com)

Dangerous optimisation includes variance minimisation
Stuart_Armstrong · Jun 8, 2021, 11:34 AM · 32 points (12 votes) · 5 comments · 2 min read · LW link

Survey on AI existential risk scenarios
Jun 8, 2021, 5:12 PM · 60 points (26 votes) · 11 comments · 7 min read · LW link

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell
DanielFilan · Jun 8, 2021, 11:20 PM · 22 points (5 votes) · 1 comment · 71 min read · LW link

"Decision Transformer" (Tool AIs are secret Agent AIs)
gwern · Jun 9, 2021, 1:06 AM · 37 points (16 votes) · 4 comments · 1 min read · LW link (sites.google.com)

Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability
Michaël Trazzi · Jun 8, 2021, 7:20 PM · 28 points (7 votes) · 0 comments · 55 min read · LW link

A naive alignment strategy and optimism about generalization
paulfchristiano · Jun 10, 2021, 12:10 AM · 44 points (15 votes) · 4 comments · 3 min read · LW link (ai-alignment.com)

Knowledge is not just mutual information
Alex Flint · Jun 10, 2021, 1:01 AM · 27 points (7 votes) · 6 comments · 4 min read · LW link

The Apprentice Experiment
johnswentworth · Jun 10, 2021, 3:29 AM · 148 points (56 votes) · 11 comments · 4 min read · LW link

[Question] ML is now automating parts of chip R&D. How big a deal is this?
Daniel Kokotajlo · Jun 10, 2021, 9:51 AM · 45 points (19 votes) · 17 comments · 1 min read · LW link

Oh No My AI (Filk)
Gordon Seidoh Worley · Jun 11, 2021, 3:05 PM · 42 points (24 votes) · 7 comments · 1 min read · LW link

May 2021 Gwern.net newsletter
gwern · Jun 11, 2021, 2:13 PM · 31 points (12 votes) · 0 comments · 1 min read · LW link (www.gwern.net)

[Question] What other problems would a successful AI safety algorithm solve?
DirectedEvolution · Jun 13, 2021, 9:07 PM · 12 points (4 votes) · 4 comments · 1 min read · LW link

Avoiding the instrumental policy by hiding information about humans
paulfchristiano · Jun 13, 2021, 8:00 PM · 31 points (10 votes) · 2 comments · 2 min read · LW link

Answering questions honestly given world-model mismatches
paulfchristiano · Jun 13, 2021, 6:00 PM · 34 points (10 votes) · 2 comments · 16 min read · LW link (ai-alignment.com)

Vignettes Workshop (AI Impacts)
Daniel Kokotajlo · Jun 15, 2021, 12:05 PM · 47 points (15 votes) · 3 comments · 1 min read · LW link

Three Paths to Existential Risk from AI
harsimony · Jun 16, 2021, 1:37 AM · 1 point (6 votes) · 2 comments · 1 min read · LW link (harsimony.wordpress.com)

[AN #152]: How we've overestimated few-shot learning capabilities
Rohin Shah · Jun 16, 2021, 5:20 PM · 22 points (6 votes) · 6 comments · 8 min read · LW link (mailchi.mp)

AI-Based Code Generation Using GPT-J-6B
Tomás B. · Jun 16, 2021, 3:05 PM · 21 points (14 votes) · 15 comments · 1 min read · LW link (minimaxir.com)

Insufficient Values
Jun 16, 2021, 2:33 PM · 29 points (14 votes) · 15 comments · 5 min read · LW link

[Question] Pros and cons of working on near-term technical AI safety and assurance
Aryeh Englander · Jun 17, 2021, 8:17 PM · 11 points (4 votes) · 1 comment · 2 min read · LW link

Non-poisonous cake: anthropic updates are normal
Stuart_Armstrong · Jun 18, 2021, 2:51 PM · 27 points (14 votes) · 11 comments · 2 min read · LW link

Knowledge is not just precipitation of action
Alex Flint · Jun 18, 2021, 11:26 PM · 21 points (4 votes) · 6 comments · 7 min read · LW link

I'm no longer sure that I buy dutch book arguments and this makes me skeptical of the "utility function" abstraction
Eli Tyre · Jun 22, 2021, 3:53 AM · 45 points (24 votes) · 29 comments · 4 min read · LW link

Frequent arguments about alignment
John Schulman · Jun 23, 2021, 12:46 AM · 95 points (41 votes) · 16 comments · 5 min read · LW link

Empirical Observations of Objective Robustness Failures
Jun 23, 2021, 11:23 PM · 63 points (21 votes) · 5 comments · 9 min read · LW link

[AN #153]: Experiments that demonstrate failures of objective robustness
Rohin Shah · Jun 26, 2021, 5:10 PM · 25 points (7 votes) · 1 comment · 8 min read · LW link (mailchi.mp)

Anthropics and Embedded Agency
dadadarren · Jun 26, 2021, 1:45 AM · 7 points (4 votes) · 2 comments · 2 min read · LW link

Deep limitations? Examining expert disagreement over deep learning
Richard_Ngo · Jun 27, 2021, 12:55 AM · 17 points (8 votes) · 5 comments · 1 min read · LW link (link.springer.com)

Finite Factored Sets: LW transcript with running commentary
Jun 27, 2021, 4:02 PM · 30 points (7 votes) · 0 comments · 51 min read · LW link

Brute force searching for alignment
Donald Hobson · Jun 27, 2021, 9:54 PM · 23 points (9 votes) · 3 comments · 2 min read · LW link

How teams went about their research at AI Safety Camp edition 5
Remmelt · Jun 28, 2021, 3:15 PM · 24 points (12 votes) · 0 comments · 6 min read · LW link

Search by abstraction
p.b. · Jun 29, 2021, 8:56 PM · 4 points (2 votes) · 0 comments · 1 min read · LW link

[Question] Is there a "coherent decisions imply consistent utilities"-style argument for non-lexicographic preferences?
Tetraspace · Jun 29, 2021, 7:14 PM · 3 points (4 votes) · 20 comments · 1 min read · LW link

Trying to approximate Statistical Models as Scoring Tables
Jsevillamol · Jun 29, 2021, 5:20 PM · 18 points (6 votes) · 2 comments · 9 min read · LW link

Do incoherent entities have stronger reason to become more coherent than less?
KatjaGrace · Jun 30, 2021, 5:50 AM · 46 points (16 votes) · 5 comments · 4 min read · LW link (worldspiritsockpuppet.com)

[AN #154]: What economic growth theory has to say about transformative AI
Rohin Shah · Jun 30, 2021, 5:20 PM · 12 points (3 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Progress on Causal Influence Diagrams
tom4everitt · Jun 30, 2021, 3:34 PM · 71 points (26 votes) · 6 comments · 9 min read · LW link

Could Advanced AI Drive Explosive Economic Growth?
Matthew Barnett · Jun 30, 2021, 10:17 PM · 15 points (5 votes) · 4 comments · 2 min read · LW link (www.openphilanthropy.org)

Experimentally evaluating whether honesty generalizes
paulfchristiano · Jul 1, 2021, 5:47 PM · 99 points (36 votes) · 23 comments · 9 min read · LW link

Should VS Would and Newcomb's Paradox
dadadarren · Jul 3, 2021, 11:45 PM · 5 points (7 votes) · 36 comments · 2 min read · LW link

Mauhn Releases AI Safety Documentation
Berg Severens · Jul 3, 2021, 9:23 PM · 4 points (6 votes) · 0 comments · 1 min read · LW link

Anthropic Effects in Estimating Evolution Difficulty
Mark Xu · Jul 5, 2021, 4:02 AM · 12 points (7 votes) · 2 comments · 3 min read · LW link

A simple example of conditional orthogonality in finite factored sets
DanielFilan · Jul 6, 2021, 12:36 AM · 43 points (11 votes) · 3 comments · 5 min read · LW link (danielfilan.com)

[Question] Is keeping AI "in the box" during training enough?
tgb · Jul 6, 2021, 3:17 PM · 7 points (2 votes) · 10 comments · 1 min read · LW link

A second example of conditional orthogonality in finite factored sets
DanielFilan · Jul 7, 2021, 1:40 AM · 46 points (8 votes) · 0 comments · 2 min read · LW link (danielfilan.com)

Agency and the unreliable autonomous car
Alex Flint · Jul 7, 2021, 2:58 PM · 29 points (10 votes) · 24 comments · 10 min read · LW link

How much chess engine progress is about adapting to bigger computers?
paulfchristiano · Jul 7, 2021, 10:35 PM · 114 points (33 votes) · 23 comments · 6 min read · LW link

BASALT: A Benchmark for Learning from Human Feedback
Rohin Shah · Jul 8, 2021, 5:40 PM · 56 points (17 votes) · 20 comments · 2 min read · LW link (bair.berkeley.edu)

[AN #155]: A Minecraft benchmark for algorithms that learn without reward functions
Rohin Shah · Jul 8, 2021, 5:20 PM · 21 points (4 votes) · 5 comments · 7 min read · LW link (mailchi.mp)

Looking for Collaborators for an AGI Research Project
Rafael Cosman · Jul 8, 2021, 5:01 PM · 3 points (6 votes) · 5 comments · 3 min read · LW link

Jackpot! An AI Vignette
Ben Goldhaber · Jul 8, 2021, 8:32 PM · 13 points (8 votes) · 0 comments · 2 min read · LW link

Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress
Mark Xu · Jul 8, 2021, 10:14 PM · 81 points (38 votes) · 9 comments · 10 min read · LW link

Finite Factored Sets: Conditional Orthogonality
Scott Garrabrant · Jul 9, 2021, 6:01 AM · 27 points (4 votes) · 2 comments · 7 min read · LW link

The accumulation of knowledge: literature review
Alex Flint · Jul 10, 2021, 6:36 PM · 29 points (7 votes) · 3 comments · 7 min read · LW link

The inescapability of knowledge
Alex Flint · Jul 11, 2021, 10:59 PM · 28 points (12 votes) · 17 comments · 5 min read · LW link

[Link] Musk's non-missing mood
jimrandomh · Jul 12, 2021, 10:09 PM · 70 points (40 votes) · 21 comments · 1 min read · LW link (lukemuehlhauser.com)

[Question] What will the twenties look like if AGI is 30 years away?
Daniel Kokotajlo · Jul 13, 2021, 8:14 AM · 29 points (10 votes) · 18 comments · 1 min read · LW link

Answering questions honestly instead of predicting human answers: lots of problems and some solutions
evhub · Jul 13, 2021, 6:49 PM · 53 points (13 votes) · 25 comments · 31 min read · LW link

Model-based RL, Desires, Brains, Wireheading
Steven Byrnes · Jul 14, 2021, 3:11 PM · 17 points (8 votes) · 1 comment · 13 min read · LW link

A closer look at chess scalings (into the past)
hippke · Jul 15, 2021, 8:13 AM · 49 points (19 votes) · 14 comments · 4 min read · LW link

AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021
gwern · Jul 15, 2021, 7:27 PM · 39 points (16 votes) · 10 comments · 1 min read · LW link (www.nature.com)

Benchmarking an old chess engine on new hardware
hippke · Jul 16, 2021, 7:58 AM · 71 points (18 votes) · 3 comments · 5 min read · LW link

[AN #156]: The scaling hypothesis: a plan for building AGI
Rohin Shah · Jul 16, 2021, 5:10 PM · 44 points (12 votes) · 20 comments · 8 min read · LW link (mailchi.mp)

Bayesianism versus conservatism versus Goodhart
Stuart_Armstrong · Jul 16, 2021, 11:39 PM · 15 points (5 votes) · 1 comment · 6 min read · LW link

(2009) Shane Legg—Funding safe AGI
Tomás B. · Jul 17, 2021, 4:46 PM · 36 points (14 votes) · 2 comments · 1 min read · LW link (www.vetta.org)

[Question] Equivalent of Information Theory but for Computation?
J Bostock · Jul 17, 2021, 9:38 AM · 5 points (3 votes) · 27 comments · 1 min read · LW link

A Models-centric Approach to Corrigible Alignment
J Bostock · Jul 17, 2021, 5:27 PM · 2 points (1 vote) · 0 comments · 6 min read · LW link

A model of decision-making in the brain (the short version)
Steven Byrnes · Jul 18, 2021, 2:39 PM · 20 points (8 votes) · 0 comments · 3 min read · LW link

[Question] Any taxonomies of conscious experience?
JohnDavidBustard · Jul 18, 2021, 6:28 PM · 7 points (4 votes) · 10 comments · 1 min read · LW link

[Question] Work on Bayesian fitting of AI trends of performance?
Jsevillamol · Jul 19, 2021, 6:45 PM · 3 points (2 votes) · 0 comments · 1 min read · LW link

Some thoughts on David Roodman's GWP model and its relation to AI timelines
Tom Davidson · Jul 19, 2021, 10:59 PM · 30 points (8 votes) · 1 comment · 8 min read · LW link

In search of benevolence (or: what should you get Clippy for Christmas?)
Joe Carlsmith · Jul 20, 2021, 1:12 AM · 20 points (6 votes) · 0 comments · 33 min read · LW link

Entropic boundary conditions towards safe artificial superintelligence
Santiago Nunez-Corrales · Jul 20, 2021, 10:15 PM · 3 points (6 votes) · 0 comments · 2 min read · LW link (www.tandfonline.com)

Reward splintering for AI design
Stuart_Armstrong · Jul 21, 2021, 4:13 PM · 30 points (9 votes) · 1 comment · 8 min read · LW link

Re-Define Intent Alignment?
abramdemski · Jul 22, 2021, 7:00 PM · 27 points (8 votes) · 33 comments · 4 min read · LW link

[AN #157]: Measuring misalignment in the technology underlying Copilot
Rohin Shah · Jul 23, 2021, 5:20 PM · 28 points (7 votes) · 18 comments · 7 min read · LW link (mailchi.mp)

Examples of human-level AI running unaligned.
df fd · Jul 23, 2021, 8:49 AM · −3 points (5 votes) · 0 comments · 2 min read · LW link (sortale.substack.com)

AXRP Episode 10 - AI's Future and Impacts with Katja Grace
DanielFilan · Jul 23, 2021, 10:10 PM · 34 points (8 votes) · 2 comments · 76 min read · LW link

Wanted: Foom-scared alignment research partner
Icarus Gallagher · Jul 26, 2021, 7:23 PM · 40 points (24 votes) · 5 comments · 1 min read · LW link

Refactoring Alignment (attempt #2)
abramdemski · Jul 26, 2021, 8:12 PM · 46 points (11 votes) · 17 comments · 8 min read · LW link

[Question] How much compute was used to train DeepMind's generally capable agents?
Daniel Kokotajlo · Jul 29, 2021, 11:34 AM · 32 points (14 votes) · 11 comments · 1 min read · LW link

[Question] Did they or didn't they learn tool use?
Daniel Kokotajlo · Jul 29, 2021, 1:26 PM · 16 points (7 votes) · 8 comments · 1 min read · LW link

[AN #158]: Should we be optimistic about generalization?
Rohin Shah · Jul 29, 2021, 5:20 PM · 19 points (5 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

[Question] Very Unnatural Tasks?
Orfeas · Jul 31, 2021, 9:22 PM · 4 points (4 votes) · 5 comments · 1 min read · LW link

[Question] Is iterated amplification really more powerful than imitation?
Chantiel · Aug 2, 2021, 11:20 PM · 5 points (3 votes) · 0 comments · 2 min read · LW link

What does GPT-3 understand? Symbol grounding and Chinese rooms
Stuart_Armstrong · Aug 3, 2021, 1:14 PM · 40 points (23 votes) · 15 comments · 12 min read · LW link

Garrabrant and Shah on human modeling in AGI
Rob Bensinger · Aug 4, 2021, 4:35 AM · 57 points (23 votes) · 10 comments · 47 min read · LW link

Value loading in the human brain: a worked example
Steven Byrnes · Aug 4, 2021, 5:20 PM · 45 points (15 votes) · 2 comments · 8 min read · LW link

[AN #159]: Building agents that know how to experiment, by training on procedurally generated games
Rohin Shah · Aug 4, 2021, 5:10 PM · 18 points (7 votes) · 4 comments · 14 min read · LW link (mailchi.mp)

[Question] How many parameters do self-driving-car neural nets have?
Daniel Kokotajlo · Aug 6, 2021, 11:24 AM · 9 points (2 votes) · 3 comments · 1 min read · LW link

Rage Against The MOOChine
Borasko · Aug 7, 2021, 5:57 PM · 20 points (18 votes) · 12 comments · 7 min read · LW link

Applications for Deconfusing Goal-Directedness
adamShimi · Aug 8, 2021, 1:05 PM · 36 points (9 votes) · 3 comments · 5 min read · 1 review · LW link

Instrumental Convergence: Power as Rademacher Complexity
Past Account · Aug 12, 2021, 4:02 PM · 6 points (4 votes) · 0 comments · 3 min read · LW link

A new definition of "optimizer"
Chantiel · Aug 9, 2021, 1:42 PM · 5 points (5 votes) · 0 comments · 7 min read · LW link

Goal-Directedness and Behavior, Redux
adamShimi · Aug 9, 2021, 2:26 PM · 14 points (7 votes) · 4 comments · 2 min read · LW link

Automating Auditing: An ambitious concrete technical research proposal
evhub · Aug 11, 2021, 8:32 PM · 77 points (26 votes) · 9 comments · 14 min read · 1 review · LW link

Some criteria for sandwiching projects
dmz · Aug 12, 2021, 3:40 AM · 18 points (8 votes) · 1 comment · 4 min read · LW link

Power-seeking for successive choices
adamShimi · Aug 12, 2021, 8:37 PM · 11 points (4 votes) · 9 comments · 4 min read · LW link

[AN #160]: Building AIs that learn and think like people
Rohin Shah · Aug 13, 2021, 5:10 PM · 28 points (9 votes) · 6 comments · 10 min read · LW link (mailchi.mp)

[Question] How would the Scaling Hypothesis change things?
Aryeh Englander · Aug 13, 2021, 3:42 PM · 4 points (4 votes) · 4 comments · 1 min read · LW link

A review of "Agents and Devices"
adamShimi · Aug 13, 2021, 8:42 AM · 10 points (8 votes) · 0 comments · 4 min read · LW link

Approaches to gradient hacking
adamShimi · Aug 14, 2021, 3:16 PM · 16 points (8 votes) · 8 comments · 8 min read · LW link

[Question] What are some open exposition problems in AI?
Sai Sasank Y · Aug 16, 2021, 3:05 PM · 4 points (4 votes) · 2 comments · 1 min read · LW link

Thinking about AI relationally
TekhneMakre · Aug 16, 2021, 10:03 PM · 5 points (2 votes) · 0 comments · 2 min read · LW link

Finite Factored Sets: Polynomials and Probability
Scott Garrabrant · Aug 17, 2021, 9:53 PM · 21 points (5 votes) · 2 comments · 8 min read · LW link

How DeepMind's Generally Capable Agents Were Trained
1a3orn · Aug 20, 2021, 6:52 PM · 87 points (39 votes) · 6 comments · 19 min read · LW link

[AN #161]: Creating generalizable reward functions for multiple tasks by learning a model of functional similarity
Rohin Shah · Aug 20, 2021, 5:20 PM · 15 points (5 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Implication of AI timelines on planning and solutions
JJ Hepburn · Aug 21, 2021, 5:12 AM · 18 points (8 votes) · 5 comments · 2 min read · LW link

Autoregressive Propaganda
lsusr · Aug 22, 2021, 2:18 AM · 25 points (13 votes) · 3 comments · 3 min read · LW link

AI Risk for Epistemic Minimalists
Alex Flint · Aug 22, 2021, 3:39 PM · 57 points (25 votes) · 12 comments · 13 min read · 1 review · LW link

The Codex Skeptic FAQ
Michaël Trazzi · Aug 24, 2021, 4:01 PM · 49 points (25 votes) · 24 comments · 2 min read · LW link

How to turn money into AI safety?
Charlie Steiner · Aug 25, 2021, 10:49 AM · 66 points (29 votes) · 26 comments · 8 min read · LW link

Introduction to Reducing Goodhart
Charlie Steiner · Aug 26, 2021, 6:38 PM · 40 points (14 votes) · 10 comments · 4 min read · LW link

Could you have stopped Chernobyl?
Carlos Ramirez · Aug 27, 2021, 1:48 AM · 29 points (16 votes) · 17 comments · 8 min read · LW link

[AN #162]: Foundation models: a paradigm shift within AI
Rohin Shah · Aug 27, 2021, 5:20 PM · 21 points (6 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

A short introduction to machine learning
Richard_Ngo · Aug 30, 2021, 2:31 PM · 67 points (44 votes) · 0 comments · 8 min read · LW link

[Question] What could small scale disasters from AI look like?
CharlesD · Aug 31, 2021, 3:52 PM · 14 points (6 votes) · 8 comments · 1 min read · LW link

NIST AI Risk Management Framework request for information (RFI)
Aryeh Englander · Sep 1, 2021, 12:15 AM · 15 points (4 votes) · 0 comments · 2 min read · LW link

Reward splintering as reverse of interpretability
Stuart_Armstrong · Aug 31, 2021, 10:27 PM · 10 points (2 votes) · 0 comments · 1 min read · LW link

What are biases, anyway? Multiple type signatures
Stuart_Armstrong · Aug 31, 2021, 9:16 PM · 11 points (2 votes) · 0 comments · 3 min read · LW link

Finite Factored Sets: Applications
Scott Garrabrant · Aug 31, 2021, 9:19 PM · 27 points (7 votes) · 1 comment · 10 min read · LW link

Finite Factored Sets: Inferring Time
Scott Garrabrant · Aug 31, 2021, 9:18 PM · 17 points (3 votes) · 5 comments · 4 min read · LW link

US Military Global Information Dominance Experiments
NunoSempere · Sep 1, 2021, 1:34 PM · 25 points (12 votes) · 0 comments · 4 min read · LW link (www.defense.gov)

Com­pe­tent Preferences

Charlie SteinerSep 2, 2021, 2:26 PM
27 points

14 votes

Overall karma indicates overall quality.

2 comments6 min readLW link

For­mal­iz­ing Ob­jec­tions against Sur­ro­gate Goals

VojtaKovarikSep 2, 2021, 4:24 PM
5 points

4 votes

Overall karma indicates overall quality.

23 comments20 min readLW link

[Question] Is there a name for the the­ory that “There will be fast take­off in real-world ca­pa­bil­ities be­cause al­most ev­ery­thing is AGI-com­plete”?

David Scott Krueger (formerly: capybaralet)Sep 2, 2021, 11:00 PM
31 points

9 votes

Overall karma indicates overall quality.

8 comments1 min readLW link

Thoughts on gra­di­ent hacking

Richard_NgoSep 3, 2021, 1:02 PM
33 points

11 votes

Overall karma indicates overall quality.

12 comments4 min readLW link

Why the tech­nolog­i­cal sin­gu­lar­ity by AGI may never happen

hippkeSep 3, 2021, 2:19 PM
5 points

12 votes

Overall karma indicates overall quality.

14 comments1 min readLW link

All Possible Views About Humanity’s Future Are Wild
HoldenKarnofsky · Sep 3, 2021, 8:19 PM · 140 points (68 votes) · 40 comments · 8 min read · LW link · 1 review

The Most Important Century: Sequence Introduction
HoldenKarnofsky · Sep 3, 2021, 8:19 PM · 68 points (22 votes) · 5 comments · 4 min read · LW link · 1 review

[Question] Are there substantial research efforts towards aligning narrow AIs?
Rossin · Sep 4, 2021, 6:40 PM · 11 points (6 votes) · 4 comments · 2 min read · LW link

Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts
sage_bergerson · Sep 7, 2021, 4:11 PM · 5 points (3 votes) · 0 comments · 1 min read · LW link

Bayeswatch 7: Wildfire
lsusr · Sep 8, 2021, 5:35 AM · 47 points (27 votes) · 6 comments · 3 min read · LW link

[AN #163]: Using finite factored sets for causal and temporal inference
Rohin Shah · Sep 8, 2021, 5:20 PM · 38 points (9 votes) · 0 comments · 10 min read · LW link (mailchi.mp)

Gradient descent is not just more efficient genetic algorithms
leogao · Sep 8, 2021, 4:23 PM · 54 points (17 votes) · 14 comments · 1 min read · LW link

Sam Altman Q&A Notes—Aftermath
p.b. · Sep 8, 2021, 8:20 AM · 45 points (29 votes) · 35 comments · 2 min read · LW link

[Question] Does blockchain technology offer potential solutions to some AI alignment problems?
pilord · Sep 9, 2021, 4:51 PM · −4 points (4 votes) · 8 comments · 2 min read · LW link

Countably Factored Spaces
Diffractor · Sep 9, 2021, 4:24 AM · 47 points (14 votes) · 3 comments · 18 min read · LW link

The alignment problem in different capability regimes
Buck · Sep 9, 2021, 7:46 PM · 87 points (29 votes) · 12 comments · 5 min read · LW link

GPT-X, DALL-E, and our Multimodal Future [video series]
bakztfuture · Sep 9, 2021, 11:05 PM · 0 points (3 votes) · 1 comment · 1 min read · LW link (youtube.com)

Bayeswatch 8: Antimatter
lsusr · Sep 10, 2021, 5:01 AM · 29 points (18 votes) · 6 comments · 3 min read · LW link

Measurement, Optimization, and Take-off Speed
jsteinhardt · Sep 10, 2021, 7:30 PM · 47 points (15 votes) · 4 comments · 13 min read · LW link

Bayeswatch 9: Zombies
lsusr · Sep 11, 2021, 5:57 AM · 41 points (22 votes) · 15 comments · 3 min read · LW link

[Question] Is MIRI’s reading list up to date?
Aryeh Englander · Sep 11, 2021, 6:56 PM · 25 points (13 votes) · 5 comments · 1 min read · LW link

Soldiers, Scouts, and Albatrosses.
Jan · Sep 12, 2021, 10:36 AM · 5 points (6 votes) · 0 comments · 1 min read · LW link (universalprior.substack.com)

GPT-Augmented Blogging
lsusr · Sep 14, 2021, 11:55 AM · 52 points (24 votes) · 18 comments · 13 min read · LW link

[AN #164]: How well can language models write code?
Rohin Shah · Sep 15, 2021, 5:20 PM · 13 points (4 votes) · 7 comments · 9 min read · LW link (mailchi.mp)

I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead
lsusr · Sep 16, 2021, 7:34 AM · 110 points (76 votes) · 33 comments · 5 min read · LW link

Economic AI Safety
jsteinhardt · Sep 16, 2021, 8:50 PM · 35 points (12 votes) · 3 comments · 5 min read · LW link

Jitters No Evidence of Stupidity in RL
1a3orn · Sep 16, 2021, 10:43 PM · 82 points (43 votes) · 18 comments · 3 min read · LW link

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering
Stuart_Armstrong · Sep 17, 2021, 3:24 PM · 32 points (11 votes) · 3 comments · 2 min read · LW link

Great Power Conflict
Zach Stein-Perlman · Sep 17, 2021, 3:00 PM · 11 points (6 votes) · 7 comments · 4 min read · LW link

The theory-practice gap
Buck · Sep 17, 2021, 10:51 PM · 133 points (52 votes) · 14 comments · 6 min read · LW link

[Book Review] “The Alignment Problem” by Brian Christian
lsusr · Sep 20, 2021, 6:36 AM · 70 points (34 votes) · 16 comments · 6 min read · LW link

AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism
Stuart_Armstrong · Sep 20, 2021, 11:56 AM · 14 points (5 votes) · 4 comments · 3 min read · LW link

Sigmoids behaving badly: arXiv paper
Stuart_Armstrong · Sep 20, 2021, 10:29 AM · 24 points (9 votes) · 1 comment · 1 min read · LW link

[Question] How much should you be willing to pay for an AGI?
Logan Zoellner · Sep 20, 2021, 11:51 AM · 11 points (6 votes) · 5 comments · 1 min read · LW link

Announcing the Vitalik Buterin Fellowships in AI Existential Safety!
DanielFilan · Sep 21, 2021, 12:33 AM · 64 points (19 votes) · 2 comments · 1 min read · LW link (grants.futureoflife.org)

Redwood Research’s current project
Buck · Sep 21, 2021, 11:30 PM · 143 points (60 votes) · 29 comments · 15 min read · LW link

[Question] What are good models of collusion in AI?
EconomicModel · Sep 22, 2021, 3:16 PM · 7 points (5 votes) · 1 comment · 1 min read · LW link

[AN #165]: When large models are more likely to lie
Rohin Shah · Sep 22, 2021, 5:30 PM · 23 points (6 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap
Nathan Helm-Burger · Sep 23, 2021, 12:38 AM · 21 points (8 votes) · 2 comments · 12 min read · LW link

What is Compute? - Transformative AI and Compute [1/4]
lennart · Sep 23, 2021, 4:25 PM · 24 points (13 votes) · 8 comments · 19 min read · LW link

Forecasting Transformative AI, Part 1: What Kind of AI?
HoldenKarnofsky · Sep 24, 2021, 12:46 AM · 17 points (8 votes) · 17 comments · 9 min read · LW link

Pathways: Google’s AGI
Lê Nguyên Hoang · Sep 25, 2021, 7:02 AM · 44 points (12 votes) · 5 comments · 1 min read · LW link

Cognitive Biases in Large Language Models
Jan · Sep 25, 2021, 8:59 PM · 17 points (4 votes) · 3 comments · 12 min read · LW link (universalprior.substack.com)

Transformative AI and Compute [Summary]
lennart · Sep 26, 2021, 11:41 AM · 13 points (9 votes) · 0 comments · 9 min read · LW link

Beyond fire alarms: freeing the groupstruck
KatjaGrace · Sep 26, 2021, 9:30 AM · 81 points (30 votes) · 15 comments · 54 min read · LW link (worldspiritsockpuppet.com)

[Question] Any writeups on GPT agency?
Ozyrus · Sep 26, 2021, 10:55 PM · 4 points (4 votes) · 6 comments · 1 min read · LW link

AI takeoff story: a continuation of progress by other means
Edouard Harris · Sep 27, 2021, 3:55 PM · 75 points (32 votes) · 13 comments · 10 min read · LW link

A Confused Chemist’s Review of AlphaFold 2
J Bostock · Sep 27, 2021, 11:10 AM · 23 points (13 votes) · 4 comments · 5 min read · LW link

[Question] Collection of arguments to expect (outer and inner) alignment failure?
Sam Clarke · Sep 28, 2021, 4:55 PM · 20 points (4 votes) · 10 comments · 1 min read · LW link

Brain-inspired AGI and the “lifetime anchor”
Steven Byrnes · Sep 29, 2021, 1:09 PM · 64 points (30 votes) · 16 comments · 13 min read · LW link

[Question] What Heuristics Do You Use to Think About Alignment Topics?
Logan Riggs · Sep 29, 2021, 2:31 AM · 5 points (1 vote) · 3 comments · 1 min read · LW link

Bayeswatch 10: Spyware
lsusr · Sep 29, 2021, 7:01 AM · 97 points (55 votes) · 7 comments · 4 min read · LW link

Unsolved ML Safety Problems
jsteinhardt · Sep 29, 2021, 4:00 PM · 58 points (21 votes) · 2 comments · 3 min read · LW link (bounded-regret.ghost.io)

Some Existing Selection Theorems
johnswentworth · Sep 30, 2021, 4:13 PM · 48 points (16 votes) · 2 comments · 4 min read · LW link

Forecasting Compute—Transformative AI and Compute [2/4]
lennart · Oct 2, 2021, 3:54 PM · 17 points (10 votes) · 0 comments · 19 min read · LW link

Nuclear Espionage and AI Governance
Guive · Oct 4, 2021, 11:04 PM · 26 points (12 votes) · 5 comments · 24 min read · LW link

Modelling and Understanding SGD
J Bostock · Oct 5, 2021, 1:41 PM · 8 points (2 votes) · 0 comments · 3 min read · LW link

Force neural nets to use models, then detect these
Stuart_Armstrong · Oct 5, 2021, 11:31 AM · 17 points (8 votes) · 8 comments · 2 min read · LW link

[Question] Is GPT-3 already sample-efficient?
Daniel Kokotajlo · Oct 6, 2021, 1:38 PM · 36 points (15 votes) · 32 comments · 1 min read · LW link

Preferences from (real and hypothetical) psychology papers
Stuart_Armstrong · Oct 6, 2021, 9:06 AM · 15 points (5 votes) · 0 comments · 2 min read · LW link

Automated Fact Checking: A Look at the Field
Hoagy · Oct 6, 2021, 11:52 PM · 12 points (8 votes) · 0 comments · 8 min read · LW link

Safety-capabilities tradeoff dials are inevitable in AGI
Steven Byrnes · Oct 7, 2021, 7:03 PM · 57 points (17 votes) · 4 comments · 3 min read · LW link

Bayeswatch 11: Parabellum
lsusr · Oct 9, 2021, 7:08 AM · 32 points (14 votes) · 12 comments · 2 min read · LW link

Steelman arguments against the idea that AGI is inevitable and will arrive soon
RomanS · Oct 9, 2021, 6:22 AM · 19 points (11 votes) · 13 comments · 4 min read · LW link

Intelligence or Evolution?
Ramana Kumar · Oct 9, 2021, 5:14 PM · 50 points (16 votes) · 15 comments · 3 min read · LW link

Bayeswatch 12: The Singularity War
lsusr · Oct 10, 2021, 1:04 AM · 32 points (18 votes) · 6 comments · 2 min read · LW link

The Extrapolation Problem
lsusr · Oct 10, 2021, 5:11 AM · 25 points (14 votes) · 8 comments · 2 min read · LW link

The evaluation function of an AI is not its aim
Yair Halberstadt · Oct 10, 2021, 2:52 PM · 13 points (9 votes) · 5 comments · 3 min read · LW link

On Solving Problems Before They Appear: The Weird Epistemologies of Alignment
adamShimi · Oct 11, 2021, 8:20 AM · 97 points (34 votes) · 11 comments · 15 min read · LW link

Bayeswatch 13: Spaceship
lsusr · Oct 12, 2021, 9:35 PM · 51 points (31 votes) · 4 comments · 1 min read · LW link

Compute Governance and Conclusions—Transformative AI and Compute [3/4]
lennart · Oct 14, 2021, 8:23 AM · 13 points (7 votes) · 0 comments · 5 min read · LW link

Classical symbol grounding and causal graphs
Stuart_Armstrong · Oct 14, 2021, 6:04 PM · 22 points (8 votes) · 2 comments · 5 min read · LW link

NLP Position Paper: When Combatting Hype, Proceed with Caution
Sam Bowman · Oct 15, 2021, 8:57 PM · 46 points (16 votes) · 15 comments · 1 min read · LW link

[Question] Memetic hazards of AGI architecture posts
Ozyrus · Oct 16, 2021, 4:10 PM · 9 points (9 votes) · 12 comments · 1 min read · LW link

[Prediction] We are in an Algorithmic Overhang, Part 2
lsusr · Oct 17, 2021, 7:48 AM · 20 points (12 votes) · 29 comments · 2 min read · LW link

Epistemic Strategies of Selection Theorems
adamShimi · Oct 18, 2021, 8:57 AM · 32 points (10 votes) · 1 comment · 12 min read · LW link

On The Risks of Emergent Behavior in Foundation Models
jsteinhardt · Oct 18, 2021, 8:00 PM · 30 points (9 votes) · 0 comments · 3 min read · LW link (bounded-regret.ghost.io)

Beyond the human training distribution: would the AI CEO create almost-illegal teddies?
Stuart_Armstrong · Oct 18, 2021, 9:10 PM · 36 points (17 votes) · 2 comments · 3 min read · LW link

[AN #167]: Concrete ML safety problems and their relevance to x-risk
Rohin Shah · Oct 20, 2021, 5:10 PM · 19 points (5 votes) · 4 comments · 9 min read · LW link (mailchi.mp)

Boring machine learning is where it’s at
George3d6 · Oct 20, 2021, 11:23 AM · 28 points (27 votes) · 16 comments · 3 min read · LW link (cerebralab.com)

AGI Safety Fundamentals curriculum and application
Richard_Ngo · Oct 20, 2021, 9:44 PM · 67 points (23 votes) · 0 comments · 8 min read · LW link (docs.google.com)

Epistemic Strategies of Safety-Capabilities Tradeoffs
adamShimi · Oct 22, 2021, 8:22 AM · 5 points (4 votes) · 0 comments · 6 min read · LW link

General alignment plus human values, or alignment via human values?
Stuart_Armstrong · Oct 22, 2021, 10:11 AM · 45 points (20 votes) · 27 comments · 3 min read · LW link

Naive self-supervised approaches to truthful AI
ryan_greenblatt · Oct 23, 2021, 1:03 PM · 9 points (3 votes) · 4 comments · 2 min read · LW link

My ML Scaling bibliography
gwern · Oct 23, 2021, 2:41 PM · 35 points (16 votes) · 9 comments · 1 min read · LW link (www.gwern.net)

Selfishness, preference falsification, and AI alignment
jessicata · Oct 28, 2021, 12:16 AM · 52 points (24 votes) · 29 comments · 13 min read · LW link (unstableontology.com)

[AN #168]: Four technical topics for which Open Phil is soliciting grant proposals
Rohin Shah · Oct 28, 2021, 5:20 PM · 15 points (4 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Forecasting progress in language models
Oct 28, 2021, 8:40 PM · 54 points (25 votes) · 5 comments · 11 min read · LW link (www.metaculus.com)

Request for proposals for projects in AI alignment that work with deep learning systems
Oct 29, 2021, 7:26 AM · 87 points (22 votes) · 0 comments · 5 min read · LW link

Interpretability
Oct 29, 2021, 7:28 AM · 59 points (17 votes) · 13 comments · 12 min read · LW link

Truthful and honest AI
Oct 29, 2021, 7:28 AM · 41 points (13 votes) · 1 comment · 13 min read · LW link

Measuring and forecasting risks
Oct 29, 2021, 7:27 AM · 20 points (6 votes) · 0 comments · 12 min read · LW link

Techniques for enhancing human feedback
Oct 29, 2021, 7:27 AM · 22 points (8 votes) · 0 comments · 2 min read · LW link

Stuart Russell and Melanie Mitchell on Munk Debates
Alex Flint · Oct 29, 2021, 7:13 PM · 29 points (13 votes) · 3 comments · 3 min read · LW link

True Stories of Algorithmic Improvement
johnswentworth · Oct 29, 2021, 8:57 PM · 91 points (43 votes) · 7 comments · 5 min read · LW link

Must true AI sleep?
YimbyGeorge · Oct 30, 2021, 4:47 PM · 0 points (4 votes) · 1 comment · 1 min read · LW link

Nate Soares on the Ultimate Newcomb’s Problem
Rob Bensinger · Oct 31, 2021, 7:42 PM · 56 points (24 votes) · 20 comments · 1 min read · LW link

Models Modeling Models
Charlie Steiner · Nov 2, 2021, 7:08 AM · 20 points (10 votes) · 5 comments · 10 min read · LW link

[Question] What’s the difference between newer Atari-playing AI and the older Deepmind one (from 2014)?
Raemon · Nov 2, 2021, 11:36 PM · 27 points (15 votes) · 8 comments · 1 min read · LW link

Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22]
Nov 3, 2021, 6:22 PM · 95 points (29 votes) · 4 comments · 1 min read · LW link

[External Event] 2022 IEEE International Conference on Assured Autonomy (ICAA) - submission deadline extended
Aryeh Englander · Nov 5, 2021, 3:29 PM · 13 points (3 votes) · 0 comments · 3 min read · LW link

Y2K: Successful Practice for AI Alignment
Darmani · Nov 5, 2021, 6:09 AM · 47 points (24 votes) · 5 comments · 6 min read · LW link

Some Remarks on Regulator Theorems No One Asked For
Past Account · Nov 5, 2021, 7:33 PM · 19 points (6 votes) · 1 comment · 4 min read · LW link

How should we compare neural network representations?
jsteinhardt · Nov 5, 2021, 10:10 PM · 24 points (11 votes) · 0 comments · 3 min read · LW link (bounded-regret.ghost.io)

Drug addicts and deceptively aligned agents—a comparative analysis
Jan · Nov 5, 2021, 9:42 PM · 41 points (15 votes) · 2 comments · 12 min read · LW link (universalprior.substack.com)

Comments on OpenPhil’s Interpretability RFP
paulfchristiano · Nov 5, 2021, 10:36 PM · 84 points (30 votes) · 5 comments · 7 min read · LW link

How do we become confident in the safety of a machine learning system?
evhub · Nov 8, 2021, 10:49 PM · 92 points (38 votes) · 2 comments · 32 min read · LW link

[Question] What exactly is GPT-3’s base objective?
Daniel Kokotajlo · Nov 10, 2021, 12:57 AM · 60 points (26 votes) · 15 comments · 2 min read · LW link

Relaxation-Based Search, From Everyday Life To Unfamiliar Territory
johnswentworth · Nov 10, 2021, 9:47 PM · 57 points (27 votes) · 3 comments · 8 min read · LW link

Using blinders to help you see things for what they are
Adam Zerner · Nov 11, 2021, 7:07 AM · 13 points (6 votes) · 2 comments · 2 min read · LW link

AGI is at least as far away as Nuclear Fusion.
Logan Zoellner · Nov 11, 2021, 9:33 PM · 0 points (7 votes) · 8 comments · 1 min read · LW link

Measuring and Forecasting Risks from AI
jsteinhardt · Nov 12, 2021, 2:30 AM · 24 points (7 votes) · 0 comments · 3 min read · LW link (bounded-regret.ghost.io)

Why I’m excited about Redwood Research’s current project
paulfchristiano · Nov 12, 2021, 7:26 PM · 112 points (58 votes) · 6 comments · 7 min read · LW link

A Defense of Functional Decision Theory
Heighn · Nov 12, 2021, 8:59 PM · 21 points (12 votes) · 120 comments · 10 min read · LW link

Comments on Carlsmith’s “Is power-seeking AI an existential risk?”
So8res · Nov 13, 2021, 4:29 AM · 137 points (43 votes) · 13 comments · 40 min read · LW link

[Question] What’s the likelihood of only sub exponential growth for AGI?
M. Y. Zuo · Nov 13, 2021, 10:46 PM · 5 points (4 votes) · 22 comments · 1 min read · LW link

My current uncertainties regarding AI, alignment, and the end of the world
dominicq · Nov 14, 2021, 2:08 PM · 2 points (5 votes) · 3 comments · 2 min read · LW link

My understanding of the alignment problem
danieldewey · Nov 15, 2021, 6:13 PM · 43 points (17 votes) · 3 comments · 3 min read · LW link

“Summarizing Books with Human Feedback” (recursive GPT-3)
gwern · Nov 15, 2021, 5:41 PM · 24 points (10 votes) · 4 comments · 1 min read · LW link (openai.com)

Quantilizer ≡ Optimizer with a Bounded Amount of Output
itaibn0 · Nov 16, 2021, 1:03 AM · 10 points (4 votes) · 4 comments · 2 min read · LW link

Two Stupid AI Alignment Ideas
aphyer · Nov 16, 2021, 4:13 PM · 24 points (18 votes) · 3 comments · 4 min read · LW link

[Question] What are the mutual benefits of AGI-human collaboration that would otherwise be unobtainable?
M. Y. Zuo · Nov 17, 2021, 3:09 AM · 1 point (2 votes) · 4 comments · 1 min read · LW link

Applications for AI Safety Camp 2022 Now Open!
adamShimi · Nov 17, 2021, 9:42 PM · 47 points (20 votes) · 3 comments · 1 min read · LW link

Ngo and Yudkowsky on AI capability gains
Nov 18, 2021, 10:19 PM · 129 points (41 votes) · 61 comments · 39 min read · LW link

“Acquisition of Chess Knowledge in AlphaZero”: probing AZ over time
jsd · Nov 18, 2021, 11:24 PM · 11 points (5 votes) · 9 comments · 1 min read · LW link (arxiv.org)

How To Get Into Independent Research On Alignment/Agency
johnswentworth · Nov 19, 2021, 12:00 AM · 314 points (157 votes) · 33 comments · 13 min read · LW link

Goodhart: Endgame
Charlie Steiner · Nov 19, 2021, 1:26 AM · 23 points (7 votes) · 3 comments · 8 min read · LW link

More detailed proposal for measuring alignment of current models
Beth Barnes · Nov 20, 2021, 12:03 AM · 31 points (11 votes) · 0 comments · 8 min read · LW link

From language to ethics by automated reasoning
Michele Campolo · Nov 21, 2021, 3:16 PM · 4 points (4 votes) · 4 comments · 6 min read · LW link

Morally underdefined situations can be deadly
Stuart_Armstrong · Nov 22, 2021, 2:48 PM · 17 points (5 votes) · 8 comments · 2 min read · LW link

Yudkowsky and Christiano discuss “Takeoff Speeds”
Eliezer Yudkowsky · Nov 22, 2021, 7:35 PM · 191 points (65 votes) · 181 comments · 60 min read · LW link · 1 review

Potential Alignment mental tool: Keeping track of the types
Donald Hobson · Nov 22, 2021, 8:05 PM · 28 points (9 votes) · 1 comment · 2 min read · LW link

Formalizing Policy-Modification Corrigibility
TurnTrout · Dec 3, 2021, 1:31 AM · 23 points (7 votes) · 6 comments · 6 min read · LW link

[AN #169]: Collaborating with humans without human data
Rohin Shah · Nov 24, 2021, 6:30 PM · 33 points (9 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

Christiano, Cotra, and Yudkowsky on AI progress
Nov 25, 2021, 4:45 PM · 117 points (37 votes) · 95 comments · 68 min read · LW link

Latacora might be of interest to some AI Safety organizations
NunoSempere · Nov 25, 2021, 11:57 PM · 14 points (5 votes) · 10 comments · 1 min read · LW link (www.latacora.com)

Solve Corrigibility Week
Logan Riggs · Nov 28, 2021, 5:00 PM · 39 points (13 votes) · 21 comments · 1 min read · LW link

TTS audio of “Ngo and Yudkowsky on alignment difficulty”
Quintin Pope · Nov 28, 2021, 6:11 PM · 4 points (1 vote) · 3 comments · 1 min read · LW link

Redwood Research is hiring for several roles
Nov 29, 2021, 12:16 AM · 44 points (15 votes) · 0 comments · 1 min read · LW link

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]
lennart · Nov 28, 2021, 10:49 PM · 6 points (3 votes) · 0 comments · 16 min read · LW link

Comments on Allan Dafoe on AI Governance
Alex Flint · Nov 29, 2021, 4:16 PM · 13 points (6 votes) · 0 comments · 7 min read · LW link

Soares, Tallinn, and Yudkowsky discuss AGI cognition
Nov 29, 2021, 7:26 PM · 118 points (36 votes) · 35 comments · 40 min read · LW link

Self-studying to develop an inside-view model of AI alignment; co-studiers welcome!
Vael Gates · Nov 30, 2021, 9:25 AM · 13 points (9 votes) · 0 comments · 4 min read · LW link

Machine Agents, Hybrid Superintelligences, and The Loss of Human Control (Chapter 1)
Justin Bullock · Nov 30, 2021, 5:35 PM · 4 points (4 votes) · 0 comments · 8 min read · LW link

AXRP Episode 12 - AI Existential Risk with Paul Christiano
DanielFilan · Dec 2, 2021, 2:20 AM · 36 points (10 votes) · 0 comments · 125 min read · LW link

Morality is Scary
Wei Dai · Dec 2, 2021, 6:35 AM · 175 points (109 votes) · 125 comments · 4 min read · LW link

Sydney AI Safety Fellowship
Chris_Leong · Dec 2, 2021, 7:34 AM · 22 points (7 votes) · 0 comments · 2 min read · LW link

$100/$50 rewards for good references
Stuart_Armstrong · Dec 3, 2021, 4:55 PM · 20 points (9 votes) · 5 comments · 1 min read · LW link

[Question] Does the Structure of an algorithm matter for AI Risk and/or consciousness?
Logan Zoellner · Dec 3, 2021, 6:31 PM · 7 points (4 votes) · 5 comments · 1 min read · LW link

[Linkpost] A General Language Assistant as a Laboratory for Alignment
Quintin Pope · Dec 3, 2021, 7:42 PM · 37 points (10 votes) · 2 comments · 2 min read · LW link

Agency: What it is and why it matters
Daniel Kokotajlo · Dec 4, 2021, 9:32 PM · 25 points (11 votes) · 2 comments · 2 min read · LW link

[Question] Are limited-horizon agents a good heuristic for the off-switch problem?
Yonadav Shavit · Dec 5, 2021, 7:27 PM · 5 points (6 votes) · 19 comments · 1 min read · LW link

Introduction to inaccessible information
Ryan Kidd · Dec 9, 2021, 1:28 AM · 27 points (12 votes) · 6 comments · 8 min read · LW link

More Christiano, Cotra, and Yudkowsky on AI progress
Dec 6, 2021, 8:33 PM · 85 points (30 votes) · 30 comments · 40 min read · LW link

Exterminating humans might be on the to-do list of a Friendly AI
RomanS · Dec 7, 2021, 2:15 PM · 5 points (22 votes) · 8 comments · 2 min read · LW link

Interviews on Improving the AI Safety Pipeline
Chris_Leong · Dec 7, 2021, 12:03 PM · 55 points (26 votes) · 16 comments · 17 min read · LW link

Let’s buy out Cyc, for use in AGI interpretability systems?
Steven Byrnes · Dec 7, 2021, 8:46 PM · 47 points (21 votes) · 10 comments · 2 min read · LW link

[AN #170]: Analyzing the argument for risk from power-seeking AI
Rohin Shah · Dec 8, 2021, 6:10 PM · 21 points (7 votes) · 1 comment · 7 min read · LW link (mailchi.mp)

[MLSN #2]: Adversarial Training
Dan_H · Dec 9, 2021, 5:16 PM · 26 points (10 votes) · 0 comments · 3 min read · LW link

Supervised learning and self-modeling: What’s “superhuman?”
Charlie Steiner · Dec 9, 2021, 12:44 PM · 12 points (4 votes) · 1 comment · 8 min read · LW link

Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment
Rob Bensinger · Dec 12, 2021, 2:08 AM · 66 points (35 votes) · 37 comments · 7 min read · LW link

Transforming myopic optimization to ordinary optimization—Do we want to seek convergence for myopic optimization problems?
tailcalled · Dec 11, 2021, 8:38 PM · 12 points (2 votes) · 1 comment · 5 min read · LW link

Redwood’s Technique-Focused Epistemic Strategy
adamShimi · Dec 12, 2021, 4:36 PM · 48 points (21 votes) · 1 comment · 7 min read · LW link

[Question] [Resolved] Who else prefers “AI alignment” to “AI safety?”
Evan_Gaensbauer · Dec 13, 2021, 12:35 AM · 5 points (4 votes) · 8 comments · 1 min read · LW link

Hard-Coding Neural Computation
MadHatter · Dec 13, 2021, 4:35 AM · 32 points (14 votes) · 8 comments · 27 min read · LW link

Solving Interpretability Week
Logan Riggs · Dec 13, 2021, 5:09 PM · 11 points (5 votes) · 5 comments · 1 min read · LW link

Understanding and controlling auto-induced distributional shift
L Rudolf L · Dec 13, 2021, 2:59 PM · 26 points (10 votes) · 3 comments · 16 min read · LW link

Language Model Alignment Research Internships
Ethan Perez · Dec 13, 2021, 7:53 PM · 68 points (34 votes) · 1 comment · 1 min read · LW link

Enabling More Feedback for AI Safety Researchers
frances_lorenz · Dec 13, 2021, 8:10 PM · 17 points (9 votes) · 0 comments · 3 min read · LW link

ARC’s first technical report: Eliciting Latent Knowledge
Dec 14, 2021, 8:09 PM · 212 points (62 votes) · 88 comments · 1 min read · LW link (docs.google.com)

Interlude: Agents as Automobiles
Daniel Kokotajlo · Dec 14, 2021, 6:49 PM · 25 points (9 votes) · 6 comments · 5 min read · LW link

ARC is hiring!
Dec 14, 2021, 8:09 PM · 62 points (21 votes) · 2 comments · 1 min read · LW link

Ngo’s view on alignment difficulty
Dec 14, 2021, 9:34 PM · 63 points (24 votes) · 7 comments · 17 min read · LW link

The Natural Abstraction Hypothesis: Implications and Evidence
CallumMcDougall · Dec 14, 2021, 11:14 PM · 30 points (15 votes) · 8 comments · 19 min read · LW link

Elicitation for Modeling Transformative AI Risks
Davidmanheim · Dec 16, 2021, 3:24 PM · 30 points (10 votes) · 2 comments · 9 min read · LW link

Some motivations to gradient hack
peterbarnett · Dec 17, 2021, 3:06 AM · 8 points (5 votes) · 0 comments · 6 min read · LW link

Introducing the Principles of Intelligent Behaviour in Biological and Social Systems (PIBBSS) Fellowship
adamShimi · Dec 18, 2021, 3:23 PM · 51 points (18 votes) · 4 comments · 10 min read · LW link

[Question] Important ML systems from before 2012?
Jsevillamol · Dec 18, 2021, 12:12 PM · 12 points (7 votes) · 5 comments · 1 min read · LW link

[Ex­tended Dead­line: Jan 23rd] An­nounc­ing the PIBBSS Sum­mer Re­search Fellowship

Nora_AmmannDec 18, 2021, 4:56 PM
6 points

4 votes

Overall karma indicates overall quality.

1 comment1 min readLW link

Ex­plor­ing De­ci­sion The­o­ries With Coun­ter­fac­tu­als and Dy­namic Agent Self-Pointers

JoshuaOSHickmanDec 18, 2021, 9:50 PM
2 points

1 vote

Overall karma indicates overall quality.

0 comments4 min readLW link

Don’t Influence the Influencers! · lhc · Dec 19, 2021 · 14 points · 6 votes · 2 comments · 10 min read
SGD Understood through Probability Current · J Bostock · Dec 19, 2021 · 23 points · 9 votes · 1 comment · 5 min read
Worst-case thinking in AI alignment · Buck · Dec 23, 2021 · 139 points · 62 votes · 15 comments · 6 min read
2021 AI Alignment Literature Review and Charity Comparison · Larks · Dec 23, 2021 · 164 points · 60 votes · 26 comments · 73 min read
Reply to Eliezer on Biological Anchors · HoldenKarnofsky · Dec 23, 2021 · 146 points · 67 votes · 46 comments · 15 min read
Risks from AI persuasion · Beth Barnes · Dec 24, 2021 · 68 points · 29 votes · 15 comments · 31 min read
Understanding the tensor product formulation in Transformer Circuits · Tom Lieberum · Dec 24, 2021 · 16 points · 8 votes · 2 comments · 3 min read
Mechanistic Interpretability for the MLP Layers (rough early thoughts) · MadHatter · Dec 24, 2021 · 11 points · 6 votes · 2 comments · 1 min read · www.youtube.com
My Overview of the AI Alignment Landscape: Threat Models · Neel Nanda · Dec 25, 2021 · 50 points · 19 votes · 4 comments · 28 min read
Reinforcement Learning Study Group · Kay Kozaronek · Dec 26, 2021 · 20 points · 13 votes · 9 comments · 1 min read

AI Fire Alarm Scenarios · PeterMcCluskey · Dec 28, 2021 · 10 points · 3 votes · 0 comments · 6 min read · www.bayesianinvestor.com
Reverse-engineering using interpretability · Beth Barnes · Dec 29, 2021 · 21 points · 8 votes · 1 comment · 5 min read
Counterexamples to some ELK proposals · paulfchristiano · Dec 31, 2021 · 50 points · 15 votes · 10 comments · 7 min read
We Choose To Align AI · johnswentworth · Jan 1, 2022 · 259 points · 169 votes · 15 comments · 3 min read
Why don’t we just, like, try and build safe AGI? · Sun · Jan 1, 2022 · 0 points · 8 votes · 4 comments · 1 min read
[Question] Tag for AI alignment? · Alex_Altair · Jan 2, 2022 · 7 points · 2 votes · 6 comments · 1 min read
How an alien theory of mind might be unlearnable · Stuart_Armstrong · Jan 3, 2022 · 26 points · 18 votes · 35 comments · 5 min read
Shadows Of The Coming Race (1879) · Capybasilisk · Jan 3, 2022 · 49 points · 14 votes · 4 comments · 7 min read
Apply for research internships at ARC! · paulfchristiano · Jan 3, 2022 · 61 points · 24 votes · 0 comments · 1 min read
Promising posts on AF that have fallen through the cracks · Evan R. Murphy · Jan 4, 2022 · 33 points · 13 votes · 6 comments · 2 min read

You can’t understand human agency without understanding amoeba agency · Shmi · Jan 6, 2022 · 19 points · 13 votes · 36 comments · 1 min read
Satisf-AI: A Route to Reducing Risks From AI · harsimony · Jan 6, 2022 · 4 points · 2 votes · 1 comment · 4 min read · harsimony.wordpress.com
Importance of foresight evaluations within ELK · Jonathan Uesato · Jan 6, 2022 · 25 points · 6 votes · 1 comment · 10 min read
Goal-directedness: my baseline beliefs · Morgan_Rogers · Jan 8, 2022 · 21 points · 7 votes · 3 comments · 3 min read
The Unreasonable Feasibility Of Playing Chess Under The Influence · Jan · Jan 12, 2022 · 29 points · 16 votes · 17 comments · 13 min read · universalprior.substack.com
New year, new research agenda post · Charlie Steiner · Jan 12, 2022 · 29 points · 11 votes · 4 comments · 16 min read
Value extrapolation partially resolves symbol grounding · Stuart_Armstrong · Jan 12, 2022 · 24 points · 7 votes · 10 comments · 1 min read
2020 Review Article · Vaniver · Jan 14, 2022 · 74 points · 36 votes · 3 comments · 7 min read
The Greedy Doctor Problem… turns out to be relevant to the ELK problem? · Jan · Jan 14, 2022 · 33 points · 17 votes · 10 comments · 14 min read · universalprior.substack.com
PIBBSS Fellowship: Bounty for Referrals & Deadline Extension · Anna Gajdova · Jan 17, 2022 · 7 points · 3 votes · 0 comments · 1 min read

Different way classifiers can be diverse · Stuart_Armstrong · Jan 17, 2022 · 10 points · 2 votes · 5 comments · 2 min read
Scalar reward is not enough for aligned AGI · Peter Vamplew · Jan 17, 2022 · 15 points · 11 votes · 3 comments · 11 min read
Challenges with Breaking into MIRI-Style Research · Chris_Leong · Jan 17, 2022 · 72 points · 40 votes · 15 comments · 3 min read
Thought Experiments Provide a Third Anchor · jsteinhardt · Jan 18, 2022 · 44 points · 18 votes · 20 comments · 4 min read · bounded-regret.ghost.io
Anchor Weights for ML · jsteinhardt · Jan 20, 2022 · 17 points · 7 votes · 2 comments · 2 min read · bounded-regret.ghost.io
Estimating training compute of Deep Learning models · Jan 20, 2022 · 37 points · 18 votes · 4 comments · 1 min read
Sharing Powerful AI Models · apc · Jan 21, 2022 · 6 points · 3 votes · 4 comments · 1 min read
[AN #171]: Disagreements between alignment “optimists” and “pessimists” · Rohin Shah · Jan 21, 2022 · 32 points · 18 votes · 1 comment · 7 min read · mailchi.mp
A one-question Turing test for GPT-3 · Jan 22, 2022 · 84 points · 49 votes · 23 comments · 5 min read
ML Systems Will Have Weird Failure Modes · jsteinhardt · Jan 26, 2022 · 54 points · 15 votes · 8 comments · 6 min read · bounded-regret.ghost.io

Search Is All You Need · blake8086 · Jan 25, 2022 · 33 points · 27 votes · 13 comments · 3 min read
Aligned AI Needs Slack · Shmi · Jan 26, 2022 · 23 points · 15 votes · 10 comments · 1 min read
Empirical Findings Generalize Surprisingly Far · jsteinhardt · Feb 1, 2022 · 46 points · 20 votes · 0 comments · 6 min read · bounded-regret.ghost.io
OpenAI Solves (Some) Formal Math Olympiad Problems · Michaël Trazzi · Feb 2, 2022 · 77 points · 34 votes · 26 comments · 2 min read
Observed patterns around major technological advancements · Richard Korzekwa · Feb 3, 2022 · 45 points · 15 votes · 15 comments · 11 min read · aiimpacts.org
Paperclippers, s-risks, hope · superads91 · Feb 4, 2022 · 13 points · 10 votes · 17 comments · 1 min read
AI Writeup Part 1 · SNl · Feb 4, 2022 · 8 points · 6 votes · 1 comment · 18 min read
Alignment versus AI Alignment · Alex Flint · Feb 4, 2022 · 87 points · 29 votes · 15 comments · 22 min read
Capability Phase Transition Examples · gwern · Feb 8, 2022 · 39 points · 12 votes · 1 comment · 1 min read · www.reddit.com
A broad basin of attraction around human values? · Wei Dai · Apr 12, 2022 · 105 points · 41 votes · 16 comments · 2 min read

Appendix: More Is Different In Other Domains · jsteinhardt · Feb 8, 2022 · 12 points · 4 votes · 1 comment · 4 min read · bounded-regret.ghost.io
[Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain · Steven Byrnes · Feb 2, 2022 · 43 points · 22 votes · 12 comments · 25 min read
Better impossibility result for unbounded utilities · paulfchristiano · Feb 9, 2022 · 29 points · 13 votes · 24 comments · 5 min read
EleutherAI’s GPT-NeoX-20B release · leogao · Feb 10, 2022 · 30 points · 14 votes · 3 comments · 1 min read · eaidata.bmk.sh
Inferring utility functions from locally non-transitive preferences · Jan · Feb 10, 2022 · 28 points · 15 votes · 15 comments · 8 min read · universalprior.substack.com
A summary of aligning narrowly superhuman models · gugu · Feb 10, 2022 · 8 points · 2 votes · 0 comments · 8 min read
Idea: build alignment dataset for very capable models · Quintin Pope · Feb 12, 2022 · 9 points · 3 votes · 2 comments · 3 min read
Goal-directedness: exploring explanations · Morgan_Rogers · Feb 14, 2022 · 13 points · 5 votes · 3 comments · 18 min read
Is ELK enough? Diamond, Matrix and Child AI · adamShimi · Feb 15, 2022 · 17 points · 7 votes · 10 comments · 4 min read
What Does The Natural Abstraction Framework Say About ELK? · johnswentworth · Feb 15, 2022 · 34 points · 15 votes · 0 comments · 6 min read

Some Hacky ELK Ideas · johnswentworth · Feb 15, 2022 · 34 points · 9 votes · 8 comments · 5 min read
How harmful are improvements in AI? + Poll · Feb 15, 2022 · 15 points · 10 votes · 4 comments · 8 min read
Becoming Stronger as Epistemologist: Introduction · adamShimi · Feb 15, 2022 · 29 points · 10 votes · 2 comments · 4 min read
REPL’s: a type signature for agents · scottviteri · Feb 15, 2022 · 23 points · 13 votes · 5 comments · 2 min read
REPL’s and ELK · scottviteri · Feb 17, 2022 · 9 points · 6 votes · 4 comments · 1 min read
[Link] Eric Schmidt’s new AI2050 Fund · Aryeh Englander · Feb 16, 2022 · 32 points · 19 votes · 3 comments · 2 min read
Alignment researchers, how useful is extra compute for you? · Lauro Langosco · Feb 19, 2022 · 7 points · 5 votes · 4 comments · 1 min read
[Question] 2 (naive?) ideas for alignment · Jonathan Moregård · Feb 20, 2022 · 3 points · 2 votes · 1 comment · 1 min read
The Big Picture Of Alignment (Talk Part 1) · johnswentworth · Feb 21, 2022 · 98 points · 28 votes · 35 comments · 1 min read · www.youtube.com
[Question] Favorite / most obscure research on understanding DNNs? · Vivek Hebbar · Feb 21, 2022 · 16 points · 8 votes · 1 comment · 1 min read

Two Challenges for ELK · derek shiller · Feb 21, 2022 · 7 points · 5 votes · 0 comments · 4 min read
[Question] Do any AI alignment orgs hire remotely? · RobertM · Feb 21, 2022 · 24 points · 13 votes · 9 comments · 2 min read
More GPT-3 and symbol grounding · Stuart_Armstrong · Feb 23, 2022 · 21 points · 10 votes · 7 comments · 3 min read
Transformer inductive biases & RASP · Vivek Hebbar · Feb 24, 2022 · 15 points · 6 votes · 4 comments · 1 min read · proceedings.mlr.press
A comment on Ajeya Cotra’s draft report on AI timelines · Matthew Barnett · Feb 24, 2022 · 69 points · 35 votes · 13 comments · 7 min read
The Big Picture Of Alignment (Talk Part 2) · johnswentworth · Feb 25, 2022 · 33 points · 12 votes · 12 comments · 1 min read · www.youtube.com
Trust-maximizing AGI · Feb 25, 2022 · 7 points · 5 votes · 26 comments · 9 min read · universalprior.substack.com
IMO challenge bet with Eliezer · paulfchristiano · Feb 26, 2022 · 162 points · 65 votes · 25 comments · 3 min read
New Speaker Series on AI Alignment Starting March 3 · Zechen Zhang · Feb 26, 2022 · 7 points · 4 votes · 1 comment · 1 min read
How I Formed My Own Views About AI Safety · Neel Nanda · Feb 27, 2022 · 64 points · 31 votes · 6 comments · 13 min read · www.neelnanda.io

Shah and Yudkowsky on alignment failures · Feb 28, 2022 · 83 points · 26 votes · 38 comments · 91 min read
ELK Thought Dump · abramdemski · Feb 28, 2022 · 58 points · 16 votes · 18 comments · 17 min read
Late 2021 MIRI Conversations: AMA / Discussion · Rob Bensinger · Feb 28, 2022 · 119 points · 42 votes · 208 comments · 1 min read
[Question] What are the causality effects of an agents presence in a reinforcement learning environment · Jonas Kgomo · Mar 1, 2022 · 0 points · 2 votes · 2 comments · 1 min read
Musings on the Speed Prior · evhub · Mar 2, 2022 · 19 points · 11 votes · 4 comments · 10 min read
AI Performance on Human Tasks · Asher Ellis · Mar 3, 2022 · 58 points · 20 votes · 3 comments · 21 min read
Introducing myself: Henry Lieberman, MIT CSAIL, whycantwe.org · Henry A Lieberman · Mar 3, 2022 · −2 points · 15 votes · 9 comments · 1 min read
Preserving and continuing alignment research through a severe global catastrophe · A_donor · Mar 6, 2022 · 36 points · 17 votes · 11 comments · 5 min read
Why work at AI Impacts? · Katja · Mar 6, 2022 · 50 points · 18 votes · 7 comments · 13 min read · aiimpacts.org
Personal imitation software · Flaglandbase · Mar 7, 2022 · 6 points · 2 votes · 6 comments · 1 min read
[MLSN #3]: NeurIPS Safety Paper Roundup · Dan H · Mar 8, 2022 · 45 points · 16 votes · 0 comments · 4 min read

ELK prize results · Mar 9, 2022 · 130 points · 50 votes · 50 comments · 21 min read
[Question] Non-coercive motivation for alignment research? · Jonathan Moregård · Mar 8, 2022 · 1 point · 1 vote · 0 comments · 1 min read
On presenting the case for AI risk · Aryeh Englander · Mar 9, 2022 · 54 points · 25 votes · 18 comments · 4 min read
Ask AI companies about what they are doing for AI safety? · mic · Mar 9, 2022 · 50 points · 26 votes · 0 comments · 2 min read
Deriving Our World From Small Datasets · Capybasilisk · Mar 9, 2022 · 5 points · 2 votes · 4 comments · 2 min read
Value extrapolation, concept extrapolation, model splintering · Stuart_Armstrong · Mar 8, 2022 · 14 points · 4 votes · 1 comment · 2 min read
The Proof of Doom · johnlawrenceaspden · Mar 9, 2022 · 27 points · 17 votes · 18 comments · 3 min read
A Rephrasing Of and Footnote To An Embedded Agency Proposal · JoshuaOSHickman · Mar 9, 2022 · 5 points · 3 votes · 0 comments · 5 min read
ELK Sub—Note-taking in internal rollouts · Hoagy · Mar 9, 2022 · 6 points · 4 votes · 0 comments · 5 min read
[Question] Are there any impossibility theorems for strong and safe AI? · David Johnston · Mar 11, 2022 · 5 points · 3 votes · 3 comments · 1 min read
Compute Trends — Comparison to OpenAI’s AI and Compute · Mar 12, 2022 · 23 points · 7 votes · 3 comments · 3 min read

ELK contest submission: route understanding through the human ontology · Mar 14, 2022 · 21 points · 9 votes · 2 comments · 2 min read
Dual use of artificial-intelligence-powered drug discovery · Vaniver · Mar 15, 2022 · 91 points · 47 votes · 15 comments · 1 min read · www.nature.com
[Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development · Steven Byrnes · Mar 16, 2022 · 41 points · 15 votes · 2 comments · 15 min read
Some (potentially) fundable AI Safety Ideas · Logan Riggs · Mar 16, 2022 · 21 points · 13 votes · 5 comments · 5 min read
What do paradigm shifts look like? · leogao · Mar 16, 2022 · 15 points · 7 votes · 2 comments · 1 min read
[Question] What is the equivalent of the “do” operator for finite factored sets? · Chris van Merwijk · Mar 17, 2022 · 8 points · 3 votes · 2 comments · 1 min read
[Question] What to do after inventing AGI? · elephantcrew · Mar 18, 2022 · 9 points · 8 votes · 4 comments · 1 min read
Goal-directedness: imperfect reasoning, limited knowledge and inaccurate beliefs · Morgan_Rogers · Mar 19, 2022 · 4 points · 2 votes · 1 comment · 21 min read
Wargaming AGI Development · ryan_b · Mar 19, 2022 · 36 points · 12 votes · 13 comments · 5 min read
Exploring Finite Factored Sets with some toy examples · Thomas Kehrenberg · Mar 19, 2022 · 36 points · 14 votes · 1 comment · 9 min read · tm.kehrenberg.net

Natural Value Learning · Chris van Merwijk · Mar 20, 2022 · 7 points · 6 votes · 10 comments · 4 min read
Why will an AGI be rational? · azsantosk · Mar 21, 2022 · 4 points · 5 votes · 8 comments · 2 min read
We cannot directly choose an AGI’s utility function · azsantosk · Mar 21, 2022 · 12 points · 10 votes · 18 comments · 3 min read
Progress Report 1: interpretability experiments & learning, testing compression hypotheses · Nathan Helm-Burger · Mar 22, 2022 · 11 points · 6 votes · 0 comments · 2 min read
Lessons After a Couple Months of Trying to Do ML Research · RowanWang · Mar 22, 2022 · 68 points · 41 votes · 8 comments · 6 min read
Job Offering: Help Communicate Infrabayesianism · Mar 23, 2022 · 135 points · 44 votes · 21 comments · 1 min read
A survey of tool use and workflows in alignment research · Mar 23, 2022 · 43 points · 21 votes · 5 comments · 1 min read
Why Agent Foundations? An Overly Abstract Explanation · johnswentworth · Mar 25, 2022 · 247 points · 113 votes · 54 comments · 8 min read
[ASoT] Observations about ELK · leogao · Mar 26, 2022 · 30 points · 12 votes · 0 comments · 3 min read
[Question] When people ask for your P(doom), do you give them your inside view or your betting odds? · Vivek Hebbar · Mar 26, 2022 · 11 points · 5 votes · 12 comments · 1 min read
Compute Governance: The Role of Commodity Hardware · Jan · Mar 26, 2022 · 14 points · 8 votes · 7 comments · 7 min read · universalprior.substack.com

Agency and Coherence · David Udell · Mar 26, 2022 · 23 points · 15 votes · 2 comments · 3 min read
[ASoT] Some ways ELK could still be solvable in practice · leogao · Mar 27, 2022 · 26 points · 9 votes · 1 comment · 2 min read
[Question] Your specific attitudes towards AI safety · Esben Kran · Mar 27, 2022 · 8 points · 6 votes · 22 comments · 1 min read
[ASoT] Searching for consequentialist structure · leogao · Mar 27, 2022 · 25 points · 12 votes · 2 comments · 4 min read
Vaniver’s ELK Submission · Vaniver · Mar 28, 2022 · 10 points · 1 vote · 0 comments · 7 min read
Towards a better circuit prior: Improving on ELK state-of-the-art · evhub · Mar 29, 2022 · 19 points · 7 votes · 0 comments · 16 min read
Strategies for differential divulgation of key ideas in AI capability · azsantosk · Mar 29, 2022 · 8 points · 5 votes · 0 comments · 6 min read
[ASoT] Some thoughts about deceptive mesaoptimization · leogao · Mar 28, 2022 · 24 points · 8 votes · 5 comments · 7 min read
[Question] What would make you confident that AGI has been achieved? · Yitz · Mar 29, 2022 · 17 points · 8 votes · 6 comments · 1 min read
Progress Report 2 · Nathan Helm-Burger · Mar 30, 2022 · 4 points · 3 votes · 1 comment · 1 min read
[ASoT] Some thoughts about LM monologue limitations and ELK · leogao · Mar 30, 2022 · 10 points · 6 votes · 0 comments · 2 min read

Procedurally evaluating factual accuracy: a request for research · Jacob_Hilton · Mar 30, 2022 · 24 points · 12 votes · 2 comments · 6 min read
No, EDT Did Not Get It Right All Along: Why the Coin Flip Creation Problem Is Irrelevant · Heighn · Mar 30, 2022 · 6 points · 5 votes · 6 comments · 3 min read
ELK Computational Complexity: Three Levels of Difficulty · abramdemski · Mar 30, 2022 · 46 points · 11 votes · 9 comments · 7 min read
[Link] Training Compute-Optimal Large Language Models · nostalgebraist · Mar 31, 2022 · 50 points · 24 votes · 23 comments · 1 min read · arxiv.org
Newcomb’s problem is just a standard time consistency problem · basil.halperin · Mar 31, 2022 · 12 points · 10 votes · 6 comments · 12 min read
The Calculus of Newcomb’s Problem · Heighn · Apr 1, 2022 · 3 points · 1 vote · 6 comments · 2 min read
New Scaling Laws for Large Language Models · 1a3orn · Apr 1, 2022 · 223 points · 116 votes · 21 comments · 5 min read
Interacting with a Boxed AI · aphyer · Apr 1, 2022 · 11 points · 5 votes · 19 comments · 4 min read
Optimality is the tiger, and agents are its teeth · Veedrac · Apr 2, 2022 · 197 points · 93 votes · 31 comments · 16 min read
[Question] How can a layman contribute to AI Alignment efforts, given shorter timeline/doomier scenarios? · AprilSR · Apr 2, 2022 · 13 points · 6 votes · 5 comments · 1 min read

AI Governance across Slow/Fast Takeoff and Easy/Hard Alignment spectra · Davidmanheim · Apr 3, 2022 · 27 points · 12 votes · 6 comments · 3 min read
[Question] What are some ways in which we can die with more dignity? · Chris_Leong · Apr 3, 2022 · 14 points · 13 votes · 19 comments · 1 min read
[Question] Should we push for banning making hiring decisions based on AI? · ChristianKl · Apr 3, 2022 · 10 points · 7 votes · 6 comments · 1 min read
Bayeswatch 9.5: Rest & Relaxation · lsusr · Apr 4, 2022 · 24 points · 11 votes · 1 comment · 2 min read
Bayeswatch 6.5: Therapy · lsusr · Apr 4, 2022 · 15 points · 6 votes · 0 comments · 1 min read
Theories of Modularity in the Biological Literature · Apr 4, 2022 · 47 points · 19 votes · 13 comments · 7 min read
Google’s new 540 billion parameter language model · Matthew Barnett · Apr 4, 2022 · 108 points · 60 votes · 83 comments · 1 min read · storage.googleapis.com
Call For Distillers · johnswentworth · Apr 4, 2022 · 192 points · 87 votes · 42 comments · 3 min read
Is the scaling race finally on? · p.b. · Apr 4, 2022 · 24 points · 15 votes · 0 comments · 2 min read
Yudkowsky Contra Christiano on AI Takeoff Speeds [Linkpost] · aog · Apr 5, 2022 · 18 points · 9 votes · 0 comments · 11 min read

[Cross-post] Half baked ideas: defining and measuring Artificial Intelligence system effectiveness · David Johnston · Apr 5, 2022 · 2 points · 1 vote · 0 comments · 7 min read
[Question] Why is Toby Ord’s likelihood of human extinction due to AI so low? · ChristianKl · Apr 5, 2022 · 8 points · 8 votes · 9 comments · 1 min read
Non-programmers intro to AI for programmers · Dustin · Apr 5, 2022 · 6 points · 2 votes · 0 comments · 2 min read
What Would A Fight Between Humanity And AGI Look Like? · johnswentworth · Apr 5, 2022 · 79 points · 47 votes · 22 comments · 3 min read
Supervise Process, not Outcomes · Apr 5, 2022 · 119 points · 46 votes · 8 comments · 10 min read
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy · DanielFilan · Apr 5, 2022 · 23 points · 7 votes · 9 comments · 52 min read
[Question] What’s the problem with having an AI align itself? · FinalFormal2 · Apr 6, 2022 · 0 points · 4 votes · 3 comments · 1 min read
What if we stopped making GPUs for a bit? · MrPointy · Apr 5, 2022 · −3 points · 4 votes · 2 comments · 1 min read
Don’t die with dignity; instead play to your outs · Jeffrey Ladish · Apr 6, 2022 · 243 points · 136 votes · 58 comments · 5 min read
What I Was Thinking About Before Alignment · johnswentworth · Apr 6, 2022 · 77 points · 33 votes · 8 comments · 5 min read

[Link] A min­i­mal vi­able product for alignment

janleikeApr 6, 2022, 3:38 PM
51 points

19 votes

Overall karma indicates overall quality.

38 comments1 min readLW link

[Link] Why I’m ex­cited about AI-as­sisted hu­man feedback

janleikeApr 6, 2022, 3:37 PM
29 points

10 votes

Overall karma indicates overall quality.

0 comments1 min readLW link

Test­ing PaLM prompts on GPT3

YitzApr 6, 2022, 5:21 AM
103 points

57 votes

Overall karma indicates overall quality.

15 comments8 min readLW link

[ASoT] Some thoughts about im­perfect world modeling

leogaoApr 7, 2022, 3:42 PM
7 points

3 votes

Overall karma indicates overall quality.

0 comments4 min readLW link

Truth­ful­ness, stan­dards and credibility

Joe CollmanApr 7, 2022, 10:31 AM
12 points

6 votes

Overall karma indicates overall quality.

2 comments32 min readLW link

What if “friendly/​un­friendly” GAI isn’t a thing?

homunqApr 7, 2022, 4:54 PM
−1 points

4 votes

Overall karma indicates overall quality.

4 comments2 min readLW link

Pro­duc­tive Mis­takes, Not Perfect Answers

adamShimiApr 7, 2022, 4:41 PM
95 points

50 votes

Overall karma indicates overall quality.

11 comments6 min readLW link

Believ­able near-term AI disaster

DagonApr 7, 2022, 6:20 PM
8 points

9 votes

Overall karma indicates overall quality.

2 comments2 min readLW link

How BoMAI Might fail
Donald Hobson · Apr 7, 2022, 3:32 PM · 11 points (5 votes) · 3 comments · 2 min read

DeepMind: The Podcast—Excerpts on AGI
WilliamKiely · Apr 7, 2022, 10:09 PM · 75 points (40 votes) · 10 comments · 5 min read

AI Alignment and Recognition
Chris_Leong · Apr 8, 2022, 5:39 AM · 7 points (6 votes) · 2 comments · 1 min read

Reverse (intent) alignment may allow for safer Oracles
azsantosk · Apr 8, 2022, 2:48 AM · 4 points (3 votes) · 0 comments · 4 min read

AIs should learn human preferences, not biases
Stuart_Armstrong · Apr 8, 2022, 1:45 PM · 10 points (2 votes) · 1 comment · 1 min read

[Question] Is there a possibility that the upcoming scaling of data in language models causes A.G.I.?
ArtMi · Apr 8, 2022, 6:56 AM · 2 points (4 votes) · 0 comments · 1 min read

Different perspectives on concept extrapolation
Stuart_Armstrong · Apr 8, 2022, 10:42 AM · 42 points (13 votes) · 7 comments · 5 min read

[RETRACTED] It’s time for EA leadership to pull the short-timelines fire alarm.
Not Relevant · Apr 8, 2022, 4:07 PM · 112 points (134 votes) · 165 comments · 4 min read

Convincing All Capability Researchers
Logan Riggs · Apr 8, 2022, 5:40 PM · 120 points (66 votes) · 70 comments · 3 min read

Language Model Tools for Alignment Research
Logan Riggs · Apr 8, 2022, 5:32 PM · 27 points (15 votes) · 0 comments · 2 min read

[Question] What would the creation of aligned AGI look like for us?
Perhaps · Apr 8, 2022, 6:05 PM · 3 points (5 votes) · 4 comments · 1 min read

Takeaways From 3 Years Working In Machine Learning
George3d6 · Apr 8, 2022, 5:14 PM · 34 points (22 votes) · 10 comments · 11 min read · www.epistem.ink

[Question] Can AI systems have extremely impressive outputs and also not need to be aligned because they aren’t general enough or something?
WilliamKiely · Apr 9, 2022, 6:03 AM · 6 points (5 votes) · 3 comments · 1 min read

Why Instrumental Goals are not a big AI Safety Problem
Jonathan Paulson · Apr 9, 2022, 12:10 AM · 0 points (16 votes) · 9 comments · 3 min read

Emergent Ventures/Schmidt (new grantor for individual researchers)
gwern · Apr 9, 2022, 2:41 PM · 21 points (9 votes) · 6 comments · 1 min read · marginalrevolution.com

Strategies for keeping AIs narrow in the short term
Rossin · Apr 9, 2022, 4:42 PM · 9 points (3 votes) · 3 comments · 3 min read

A concrete bet offer to those with short AI timelines
Apr 9, 2022, 9:41 PM · 195 points (127 votes) · 104 comments · 4 min read

Finally Entering Alignment
Ulisse Mini · Apr 10, 2022, 5:01 PM · 75 points (39 votes) · 8 comments · 2 min read

[Question] Does non-access to outputs prevent recursive self-improvement?
Gunnar_Zarncke · Apr 10, 2022, 6:37 PM · 14 points (3 votes) · 0 comments · 1 min read

[Question] Convince me that humanity is as doomed by AGI as Yudkowsky et al., seems to believe
Yitz · Apr 10, 2022, 9:02 PM · 91 points (65 votes) · 142 comments · 2 min read

[Question] Could we set a resolution/stopper for the upper bound of the utility function of an AI?
FinalFormal2 · Apr 11, 2022, 3:10 AM · −5 points (5 votes) · 2 comments · 1 min read

What can people not smart/technical enough for AI research/AI risk work do to reduce AI-risk/maximize AI safety? (which is most people?)
Alex K. Chen (parrot) · Apr 11, 2022, 2:05 PM · 7 points (8 votes) · 3 comments · 3 min read

We should stop being so confident that AI coordination is unlikely
trevor · Apr 11, 2022, 10:27 PM · 14 points (24 votes) · 7 comments · 1 min read

The Regulatory Option: A response to near 0% survival odds
Matthew Lowenstein · Apr 11, 2022, 10:00 PM · 45 points (43 votes) · 21 comments · 6 min read

[Question] How can I determine that Elicit is not some weak AGI’s attempt at taking over the world?
Lucie Philippon · Apr 12, 2022, 12:54 AM · 5 points (5 votes) · 3 comments · 1 min read

[Question] Three questions about mesa-optimizers
Eric Neyman · Apr 12, 2022, 2:58 AM · 23 points (10 votes) · 5 comments · 3 min read

A Small Negative Result on Debate
Sam Bowman · Apr 12, 2022, 6:19 PM · 42 points (19 votes) · 11 comments · 1 min read

The Peerless
Tamsin Leake · Apr 13, 2022, 1:07 AM · 18 points (7 votes) · 2 comments · 1 min read · carado.moe

Convincing People of Alignment with Street Epistemology
Logan Riggs · Apr 12, 2022, 11:43 PM · 54 points (23 votes) · 4 comments · 3 min read

[Question] “Fragility of Value” vs. LLMs
Not Relevant · Apr 13, 2022, 2:02 AM · 32 points (11 votes) · 32 comments · 1 min read

How dath ilan coordinates around solving alignment
Thomas Kwa · Apr 13, 2022, 4:22 AM · 46 points (40 votes) · 37 comments · 5 min read

[Question] What’s a good probability distribution family (e.g. “log-normal”) to use for AGI timelines?
David Scott Krueger (formerly: capybaralet) · Apr 13, 2022, 4:45 AM · 9 points (4 votes) · 12 comments · 1 min read

Takeoff speeds have a huge effect on what it means to work on AI x-risk
Buck · Apr 13, 2022, 5:38 PM · 117 points (60 votes) · 25 comments · 2 min read

Design, Implement and Verify
rwallace · Apr 13, 2022, 6:14 PM · 32 points (13 votes) · 13 comments · 4 min read

[Question] What to include in a guest lecture on existential risks from AI?
Aryeh Englander · Apr 13, 2022, 5:03 PM · 20 points (8 votes) · 9 comments · 1 min read

A Quick Guide to Confronting Doom
Ruby · Apr 13, 2022, 7:30 PM · 224 points (105 votes) · 36 comments · 2 min read

Exploring toy neural nets under node removal. Section 1.
Donald Hobson · Apr 13, 2022, 11:30 PM · 12 points (6 votes) · 7 comments · 8 min read

[Question] Unchangeable Code possible?
AntonTimmer · Apr 14, 2022, 11:17 AM · 7 points (4 votes) · 9 comments · 1 min read

How to become an AI safety researcher
peterbarnett · Apr 15, 2022, 11:41 AM · 19 points (12 votes) · 0 comments · 14 min read

Early 2022 Paper Round-up
jsteinhardt · Apr 14, 2022, 8:50 PM · 80 points (31 votes) · 4 comments · 3 min read · bounded-regret.ghost.io

[Question] Can someone explain to me why MIRI is so pessimistic of our chances of survival?
iamthouthouarti · Apr 14, 2022, 8:28 PM · 10 points (6 votes) · 7 comments · 1 min read

Pivotal acts from Math AIs
azsantosk · Apr 15, 2022, 12:25 AM · 10 points (6 votes) · 4 comments · 5 min read

Refine: An Incubator for Conceptual Alignment Research Bets
adamShimi · Apr 15, 2022, 8:57 AM · 123 points (58 votes) · 13 comments · 4 min read

My least favorite thing
sudo · Apr 14, 2022, 10:33 PM · 41 points (54 votes) · 30 comments · 3 min read

[Question] Constraining narrow AI in a corporate setting
MaximumLiberty · Apr 15, 2022, 10:36 PM · 28 points (5 votes) · 4 comments · 1 min read

Pop Culture Alignment Research and Taxes
Jan · Apr 16, 2022, 3:45 PM · 16 points (7 votes) · 14 comments · 11 min read · universalprior.substack.com

Org announcement: [AC]RC
Vivek Hebbar · Apr 17, 2022, 5:24 PM · 79 points (63 votes) · 12 comments · 1 min read

Code Generation as an AI risk setting
Not Relevant · Apr 17, 2022, 10:27 PM · 91 points (39 votes) · 16 comments · 2 min read

Mental Health and the Alignment Problem: A Compilation of Resources
Chris Scammell · Apr 18, 2022, 6:36 PM · 139 points (64 votes) · 7 comments · 17 min read

Is “Control” of a Superintelligence Possible?
Mahdi Complex · Apr 18, 2022, 4:03 PM · 9 points (4 votes) · 14 comments · 1 min read

[Closed] Hiring a mathematician to work on the learning-theoretic AI alignment agenda
Vanessa Kosoy · Apr 19, 2022, 6:44 AM · 84 points (39 votes) · 21 comments · 2 min read

[Question] The two missing core reasons why aligning at-least-partially superhuman AGI is hard
Joel Burget · Apr 19, 2022, 5:15 PM · 7 points (3 votes) · 2 comments · 1 min read

[Question] How does the world look like 10 years after we have deployed an aligned AGI?
mukashi · Apr 19, 2022, 11:34 AM · 4 points (6 votes) · 3 comments · 1 min read

[Question] Clarification on Definition of AGI
stanislaw · Apr 19, 2022, 12:41 PM · 0 points (2 votes) · 1 comment · 1 min read

[Question] What’s the Relationship Between “Human Values” and the Brain’s Reward System?
interstice · Apr 19, 2022, 5:15 AM · 36 points (14 votes) · 16 comments · 1 min read

Deceptive Agents are a Good Way to Do Things
David Udell · Apr 19, 2022, 6:04 PM · 15 points (7 votes) · 0 comments · 1 min read

The Scale Problem in AI
tailcalled · Apr 19, 2022, 5:46 PM · 22 points (12 votes) · 17 comments · 3 min read

Concept extrapolation: key posts
Stuart_Armstrong · Apr 19, 2022, 10:01 AM · 12 points (3 votes) · 2 comments · 1 min read

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments
Andrew_Critch · Apr 19, 2022, 8:25 PM · 96 points (72 votes) · 56 comments · 7 min read

GPT-3 and concept extrapolation
Stuart_Armstrong · Apr 20, 2022, 10:39 AM · 19 points (6 votes) · 28 comments · 1 min read

[Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”
Steven Byrnes · Apr 20, 2022, 12:58 PM · 33 points (14 votes) · 10 comments · 16 min read

Preregistration: Air Conditioner Test
johnswentworth · Apr 21, 2022, 7:48 PM · 109 points (48 votes) · 64 comments · 9 min read

[Question] Choice := Anthropics uncertainty? And potential implications for agency
Antoine de Scorraille · Apr 21, 2022, 4:38 PM · 5 points (3 votes) · 1 comment · 1 min read

Understanding the Merging of Opinions with Increasing Information theorem
ViktoriaMalyasova · Apr 21, 2022, 2:13 PM · 13 points (7 votes) · 1 comment · 5 min read

Early 2022 Paper Round-up (Part 2)
jsteinhardt · Apr 21, 2022, 11:40 PM · 10 points (5 votes) · 0 comments · 5 min read · bounded-regret.ghost.io

[Question] What are the numbers in mind for the super-short AGI timelines so many long-termists are alarmed about?
Evan_Gaensbauer · Apr 21, 2022, 11:32 PM · 22 points (9 votes) · 14 comments · 1 min read

AI Will Multiply
harsimony · Apr 22, 2022, 4:33 AM · 13 points (7 votes) · 4 comments · 1 min read · harsimony.wordpress.com

Humanity as an entity: An alternative to Coherent Extrapolated Volition
Victor Novikov · Apr 22, 2022, 12:48 PM · 0 points (5 votes) · 2 comments · 4 min read

[ASoT] Consequentialist models as a superset of mesaoptimizers
leogao · Apr 23, 2022, 5:57 PM · 36 points (17 votes) · 2 comments · 4 min read

Skilling-up in ML Engineering for Alignment: request for comments
Apr 23, 2022, 3:11 PM · 19 points (15 votes) · 0 comments · 1 min read

[Question] Wanting to change what you want
Mithrandir · Apr 23, 2022, 4:23 AM · −1 points (6 votes) · 1 comment · 1 min read

Progress Report 5: tying it together
Nathan Helm-Burger · Apr 23, 2022, 9:07 PM · 10 points (3 votes) · 0 comments · 2 min read

Calling for Student Submissions: AI Safety Distillation Contest
Aris · Apr 24, 2022, 1:53 AM · 48 points (24 votes) · 15 comments · 4 min read

Examining Evolution as an Upper Bound for AGI Timelines
meanderingmoose · Apr 24, 2022, 7:08 PM · 5 points (3 votes) · 1 comment · 9 min read

AI safety raising awareness resources bleg
iivonen · Apr 24, 2022, 5:13 PM · 6 points (5 votes) · 1 comment · 1 min read

Intuitions about solving hard problems
Richard_Ngo · Apr 25, 2022, 3:29 PM · 92 points (43 votes) · 23 comments · 6 min read

[Request for Distillation] Coherence of Distributed Decisions With Different Inputs Implies Conditioning
johnswentworth · Apr 25, 2022, 5:01 PM · 22 points (9 votes) · 14 comments · 2 min read

dalle2 comments
nostalgebraist · Apr 26, 2022, 5:30 AM · 183 points (86 votes) · 13 comments · 13 min read · nostalgebraist.tumblr.com

Make a neural network in ~10 minutes
Arjun Yadav · Apr 26, 2022, 5:24 AM · 8 points (7 votes) · 0 comments · 4 min read · arjunyadav.net

Law-Following AI 1: Sequence Introduction and Structure
Cullen · Apr 27, 2022, 5:26 PM · 16 points (11 votes) · 10 comments · 9 min read

Law-Following AI 2: Intent Alignment + Superintelligence → Lawless AI (By Default)
Cullen · Apr 27, 2022, 5:27 PM · 5 points (3 votes) · 2 comments · 6 min read

Law-Following AI 3: Lawless AI Agents Undermine Stabilizing Agreements
Cullen · Apr 27, 2022, 5:30 PM · 2 points (1 vote) · 2 comments · 3 min read

If you’re very optimistic about ELK then you should be optimistic about outer alignment
Sam Marks · Apr 27, 2022, 7:30 PM · 17 points (12 votes) · 8 comments · 3 min read

AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk—Request for Participation (*Closed*)
Kakili · Apr 27, 2022, 10:07 PM · 10 points (5 votes) · 2 comments · 8 min read

The Speed + Simplicity Prior is probably anti-deceptive
Yonadav Shavit · Apr 27, 2022, 7:30 PM · 30 points (10 votes) · 29 comments · 12 min read

Slides: Potential Risks From Advanced AI
Aryeh Englander · Apr 28, 2022, 2:15 AM · 7 points (3 votes) · 0 comments · 1 min read

How Might an Alignment Attractor Look like?
Shmi · Apr 28, 2022, 6:46 AM · 47 points (18 votes) · 15 comments · 2 min read

Naive comments on AGIlignment
Ericf · Apr 28, 2022, 1:08 AM · 2 points (4 votes) · 4 comments · 1 min read

[Question] Is alignment possible?
Shay · Apr 28, 2022, 9:18 PM · 0 points (3 votes) · 5 comments · 1 min read

Learning the smooth prior
Apr 29, 2022, 9:10 PM · 31 points (9 votes) · 0 comments · 12 min read

[Linkpost] New multi-modal Deepmind model fusing Chinchilla with images and videos
p.b. · Apr 30, 2022, 3:47 AM · 53 points (29 votes) · 18 comments · 1 min read

Note-Taking without Hidden Messages
Hoagy · Apr 30, 2022, 11:15 AM · 7 points (4 votes) · 1 comment · 4 min read

[Question] Why hasn’t deep learning generated significant economic value yet?
Alex_Altair · Apr 30, 2022, 8:27 PM · 112 points (63 votes) · 95 comments · 2 min read

What is the solution to the Alignment problem?
Algon · Apr 30, 2022, 11:19 PM · 24 points (18 votes) · 2 comments · 1 min read

[Linkpost] Value extraction via language model abduction
Paul Bricman · May 1, 2022, 7:11 PM · 4 points (3 votes) · 3 comments · 1 min read · paulbricman.com

ELK shaving
Miss Aligned AI · May 1, 2022, 9:05 PM · 6 points (9 votes) · 1 comment · 1 min read

So has AI conquered Bridge?
Ponder Stibbons · May 2, 2022, 3:01 PM · 16 points (9 votes) · 2 comments · 14 min read

Information security considerations for AI and the long term future
May 2, 2022, 8:54 PM · 74 points (31 votes) · 6 comments · 10 min read

Is evolutionary influence the mesa objective that we’re interested in?
David Johnston · May 3, 2022, 1:18 AM · 3 points (2 votes) · 2 comments · 5 min read

Various Alignment Strategies (and how likely they are to work)
Logan Zoellner · May 3, 2022, 4:54 PM · 73 points (34 votes) · 34 comments · 11 min read

Introducing the ML Safety Scholars Program
May 4, 2022, 4:01 PM · 73 points (34 votes) · 2 comments · 3 min read

Frankenstein: A Modern AGI
Sable · May 5, 2022, 4:16 PM · 9 points (7 votes) · 10 comments · 9 min read

[Question] What is bias in alignment terms?
Jonas Kgomo · May 4, 2022, 9:35 PM · 0 points (2 votes) · 2 comments · 1 min read

Ethan Caballero on Private Scaling Progress
Michaël Trazzi · May 5, 2022, 6:32 PM · 62 points (31 votes) · 1 comment · 2 min read · theinsideview.github.io

Apply to the second iteration of the ML for Alignment Bootcamp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2]
Buck · May 6, 2022, 4:23 AM · 68 points (24 votes) · 0 comments · 6 min read

The case for becoming a black-box investigator of language models
Buck · May 6, 2022, 2:35 PM · 118 points (67 votes) · 19 comments · 3 min read

Getting GPT-3 to predict Metaculus questions
MathiasKB · May 6, 2022, 6:01 AM · 68 points (34 votes) · 8 comments · 2 min read

But What’s Your *New Alignment Insight,* out of a Future-Textbook Paragraph?
David Udell · May 7, 2022, 3:10 AM · 24 points (14 votes) · 18 comments · 5 min read

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI
Joe Carlsmith · May 8, 2022, 3:50 AM · 20 points (5 votes) · 1 comment · 29 min read

A Bird’s Eye View of the ML Field [Pragmatic AI Safety #2]
May 9, 2022, 5:18 PM · 126 points (55 votes) · 5 comments · 35 min read

Introduction to Pragmatic AI Safety [Pragmatic AI Safety #1]
May 9, 2022, 5:06 PM · 70 points (36 votes) · 1 comment · 6 min read

Jobs: Help scale up LM alignment research at NYU
Sam Bowman · May 9, 2022, 2:12 PM · 60 points (15 votes) · 1 comment · 1 min read

When is AI safety research harmful?
NathanBarnard · May 9, 2022, 6:19 PM · 2 points (2 votes) · 0 comments · 8 min read

AI Alignment YouTube Playlists
May 9, 2022, 9:33 PM · 29 points (15 votes) · 4 comments · 1 min read

Examining Armstrong’s category of generalized models
Morgan_Rogers · May 10, 2022, 9:07 AM · 14 points (9 votes) · 0 comments · 7 min read

An Inside View of AI Alignment
Ansh Radhakrishnan · May 11, 2022, 2:16 AM · 31 points (23 votes) · 2 comments · 2 min read

[Question] What are your recommendations for technical AI alignment podcasts?
Evan_Gaensbauer · May 11, 2022, 9:52 PM · 5 points (2 votes) · 4 comments · 1 min read

Deepmind’s Gato: Generalist Agent
Daniel Kokotajlo · May 12, 2022, 4:01 PM · 164 points (102 votes) · 61 comments · 1 min read

“A Generalist Agent”: New DeepMind Publication
1a3orn · May 12, 2022, 3:30 PM · 79 points (35 votes) · 43 comments · 1 min read

A tentative dialogue with a Friendly-boxed-super-AGI on brain uploads
Ramiro P. · May 12, 2022, 7:40 PM · 1 point (9 votes) · 12 comments · 4 min read

Positive outcomes under an unaligned AGI takeover
Yitz · May 12, 2022, 7:45 AM · 19 points (17 votes) · 12 comments · 3 min read

The Last Paperclip
Logan Zoellner · May 12, 2022, 7:25 PM · 57 points (41 votes) · 15 comments · 17 min read

RLHF
Ansh Radhakrishnan · May 12, 2022, 9:18 PM · 16 points (7 votes) · 5 comments · 5 min read

[Question] What to do when starting a business in an imminent-AGI world?
ryan_b · May 12, 2022, 9:07 PM · 25 points (13 votes) · 7 comments · 1 min read

DeepMind is hiring for the Scalable Alignment and Alignment Teams
May 13, 2022, 12:17 PM · 145 points (61 votes) · 35 comments · 9 min read

“Tech company singularities”, and steering them to reduce x-risk
Andrew_Critch · May 13, 2022, 5:24 PM · 73 points (36 votes) · 12 comments · 4 min read

Against Time in Agent Models
johnswentworth · May 13, 2022, 7:55 PM · 50 points (26 votes) · 12 comments · 3 min read

Frame for Take-Off Speeds to inform compute governance & scaling alignment
Logan Riggs · May 13, 2022, 10:23 PM · 15 points (7 votes) · 2 comments · 2 min read

Alignment as Constraints
Logan Riggs · May 13, 2022, 10:07 PM · 10 points (5 votes) · 0 comments · 2 min read

Fermi estimation of the impact you might have working on AI safety
Fabien Roger · May 13, 2022, 5:49 PM · 6 points (5 votes) · 0 comments · 1 min read

An observation about Hubinger et al.’s framework for learned optimization
carboniferous_umbraculum · May 13, 2022, 4:20 PM · 33 points (12 votes) · 9 comments · 8 min read

Thoughts on AI Safety Camp
Charlie Steiner · May 13, 2022, 7:16 AM · 24 points (13 votes) · 7 comments · 7 min read

Clarifying the confusion around inner alignment
Rauno Arike · May 13, 2022, 11:05 PM · 27 points (12 votes) · 0 comments · 11 min read

[Link post] Promising Paths to Alignment—Connor Leahy | Talk
frances_lorenz · May 14, 2022, 4:01 PM · 34 points (24 votes) · 0 comments · 1 min read

The AI Countdown Clock
River Lewis · May 15, 2022, 6:37 PM · 40 points (38 votes) · 27 comments · 2 min read · heytraveler.substack.com

Surviving Automation In The 21st Century—Part 1
George3d6 · May 15, 2022, 7:16 PM · 27 points (14 votes) · 17 comments · 8 min read · www.epistem.ink

Why I’m Optimistic About Near-Term AI Risk
harsimony · May 15, 2022, 11:05 PM · 57 points (44 votes) · 28 comments · 1 min read

Optimization at a Distance
johnswentworth · May 16, 2022, 5:58 PM · 78 points (34 votes) · 13 comments · 4 min read

[Question] To what extent is your AGI timeline bimodal or otherwise “bumpy”?
jchan · May 16, 2022, 5:42 PM · 13 points (5 votes) · 2 comments · 1 min read

Proxy misspecification and the capabilities vs. value learning race
Sam Marks · May 16, 2022, 6:58 PM · 19 points (7 votes) · 1 comment · 4 min read

How to invest in expectation of AGI?
Jakobovski · May 17, 2022, 11:03 AM · 3 points (5 votes) · 4 comments · 1 min read

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes · May 17, 2022, 3:11 PM · 81 points (32 votes) · 11 comments · 14 min read

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework
May 17, 2022, 3:26 PM · 25 points (10 votes) · 0 comments · 3 min read

What are the possible trajectories of an AGI/ASI world?
Jakobovski · May 17, 2022, 1:28 PM · 0 points (6 votes) · 2 comments · 1 min read

Maxent and Abstractions: Current Best Arguments
johnswentworth · May 18, 2022, 7:54 PM · 34 points (8 votes) · 2 comments · 3 min read

How to get into AI safety research
Stuart_Armstrong · May 18, 2022, 6:05 PM · 44 points (29 votes) · 7 comments · 1 min read

A bridge to Dath Ilan? Improved governance on the critical path to AI alignment.
Jackson Wagner · May 18, 2022, 3:51 PM · 23 points (9 votes) · 0 comments · 11 min read

We have achieved Noob Gains in AI
phdead · May 18, 2022, 8:56 PM · 114 points (68 votes) · 21 comments · 7 min read

[Question] Why does gradient descent always work on neural networks?
MichaelDickens · May 20, 2022, 9:13 PM · 15 points (6 votes) · 11 comments · 1 min read

How RL Agents Behave When Their Actions Are Modified? [Distillation post]
PabloAMC · May 20, 2022, 6:47 PM · 21 points (11 votes) · 0 comments · 8 min read

Over-digitalization: A Prelude to Analogia (Chapter 6)
Justin Bullock · May 20, 2022, 4:39 PM · 3 points (2 votes) · 0 comments · 13 min read

Clarifying what ELK is trying to achieve
Towards_Keeperhood · May 21, 2022, 7:34 AM · 7 points (3 votes) · 0 comments · 5 min read

[Short version] Information Loss --> Basin flatness
Vivek Hebbar · May 21, 2022, 12:59 PM · 11 points (3 votes) · 0 comments · 1 min read

Information Loss --> Basin flatness
Vivek Hebbar · May 21, 2022, 12:58 PM · 47 points (27 votes) · 31 comments · 7 min read

What kinds of algorithms do multi-human imitators learn?
May 22, 2022, 2:27 PM · 20 points (6 votes) · 0 comments · 3 min read

Are human imitators superhuman models with explicit constraints on capabilities?
Chris van Merwijk · May 22, 2022, 12:46 PM · 41 points (19 votes) · 3 comments · 1 min read

Adversarial attacks and optimal control
Jan · May 22, 2022, 6:22 PM · 16 points (8 votes) · 7 comments · 8 min read · universalprior.substack.com

CNN feature visualization in 50 lines of code
StefanHex · May 26, 2022, 11:02 AM · 17 points (11 votes) · 4 comments · 5 min read

[Question] [Alignment] Is there a census on who’s working on what?
Cedar · May 23, 2022, 3:33 PM · 23 points (14 votes) · 6 comments · 1 min read

AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan · May 23, 2022, 5:40 AM · 32 points (13 votes) · 1 comment · 57 min read

Why I’m Worried About AI
peterbarnett · May 23, 2022, 9:13 PM · 21 points (13 votes) · 2 comments · 12 min read

Complex Systems for AI Safety [Pragmatic AI Safety #3]
May 24, 2022, 12:00 AM · 49 points (28 votes) · 2 comments · 21 min read

The No Free Lunch theorems and their Razor
Adrià Garriga-alonso · May 24, 2022, 6:40 AM · 47 points (24 votes) · 3 comments · 9 min read

Google’s Imagen uses larger text encoder
Ben Livengood · May 24, 2022, 9:55 PM · 27 points (14 votes) · 2 comments · 1 min read

autonomy: the missing AGI ingredient?
nostalgebraist · May 25, 2022, 12:33 AM · 61 points (29 votes) · 13 comments · 6 min read

Paper: Teaching GPT3 to express uncertainty in words
Owain_Evans · May 31, 2022, 1:27 PM · 96 points (45 votes) · 7 comments · 4 min read

Croesus, Cerberus, and the magpies: a gentle introduction to Eliciting Latent Knowledge
Alexandre Variengien · May 27, 2022, 5:58 PM · 14 points (10 votes) · 0 comments · 16 min read

[Question] How much white collar work could be automated using existing ML models?
AM · May 26, 2022, 8:09 AM · 25 points (15 votes) · 4 comments · 1 min read

The Pointers Problem—Distilled
Nina Panickssery · May 26, 2022, 10:44 PM · 9 points (7 votes) · 0 comments · 2 min read

Iter­ated Distil­la­tion-Am­plifi­ca­tion, Gato, and Proto-AGI [Re-Ex­plained]

Gabe MMay 27, 2022, 5:42 AM
21 points

13 votes

Overall karma indicates overall quality.

4 comments6 min readLW link

Boot­strap­ping Lan­guage Models

harsimonyMay 27, 2022, 7:43 PM
7 points

5 votes

Overall karma indicates overall quality.

5 comments2 min readLW link

Un­der­stand­ing Selec­tion Theorems

adamkMay 28, 2022, 1:49 AM
35 points

14 votes

Overall karma indicates overall quality.

3 comments7 min readLW link

[Question] What have been the ma­jor “triumphs” in the field of AI over the last ten years?

lcMay 28, 2022, 7:49 PM
35 points

13 votes

Overall karma indicates overall quality.

10 comments1 min readLW link

[Question] Bayesian Per­sua­sion?

Karthik TadepalliMay 28, 2022, 5:52 PM
8 points

6 votes

Overall karma indicates overall quality.

2 comments1 min readLW link

Distributed Decisions

johnswentworthMay 29, 2022, 2:43 AM
65 points

22 votes

Overall karma indicates overall quality.

4 comments6 min readLW link

The Problem With The Current State of AGI Definitions

Yitz · May 29, 2022, 1:58 PM
40 points (19 votes)
22 comments · 8 min read · LW link

Functional Analysis Reading Group

Ulisse Mini · May 28, 2022, 2:40 AM
4 points (4 votes)
0 comments · 1 min read · LW link

[Question] Impact of ” ‘Let’s think step by step’ is all you need”?

yrimon · Jul 24, 2022, 8:59 PM
20 points (12 votes)
2 comments · 1 min read · LW link

Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4]

May 30, 2022, 8:25 PM
43 points (23 votes)
3 comments · 25 min read · LW link

[Question] What is the state of Chinese AI research?

Ratios · May 31, 2022, 10:05 AM
34 points (18 votes)
17 comments · 1 min read · LW link

The Brain That Builds Itself

Jan · May 31, 2022, 9:42 AM
55 points (29 votes)
6 comments · 8 min read · LW link
(universalprior.substack.com)

Machines vs. Memes 2: Memetically-Motivated Model Extensions

naterush · May 31, 2022, 10:03 PM
4 points (3 votes)
0 comments · 4 min read · LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23 · Jun 1, 2022, 1:36 PM
5 points (3 votes)
0 comments · 7 min read · LW link

Paradigms of AI alignment: components and enablers

Vika · Jun 2, 2022, 6:19 AM
48 points (15 votes)
4 comments · 8 min read · LW link

The Bio Anchors Forecast

Ansh Radhakrishnan · Jun 2, 2022, 1:32 AM
12 points (6 votes)
0 comments · 3 min read · LW link

[MLSN #4]: Many New Interpretability Papers, Virtual Logit Matching, Rationalization Helps Robustness

Dan H · Jun 3, 2022, 1:20 AM
18 points (10 votes)
0 comments · 4 min read · LW link

The prototypical catastrophic AI action is getting root access to its datacenter

Buck · Jun 2, 2022, 11:46 PM
142 points (69 votes)
10 comments · 2 min read · LW link

Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck · Jun 2, 2022, 11:48 PM
33 points (14 votes)
0 comments · 3 min read · LW link

Deep Learning Systems Are Not Less Interpretable Than Logic/​Probability/​Etc

johnswentworth · Jun 4, 2022, 5:41 AM
118 points (77 votes)
52 comments · 2 min read · LW link

How to pursue a career in technical AI alignment

Charlie Rogers-Smith · Jun 4, 2022, 9:11 PM
63 points (36 votes)
0 comments · 39 min read · LW link

Noisy environment regulate utility maximizers

Niclas Kupper · Jun 5, 2022, 6:48 PM
4 points (3 votes)
0 comments · 7 min read · LW link

Why agents are powerful

Daniel Kokotajlo · Jun 6, 2022, 1:37 AM
35 points (16 votes)
7 comments · 7 min read · LW link

Why do some people try to make AGI?

TekhneMakre · Jun 6, 2022, 9:14 AM
14 points (8 votes)
7 comments · 3 min read · LW link

Some ideas for follow-up projects to Redwood Research’s recent paper

JanB · Jun 6, 2022, 1:29 PM
10 points (3 votes)
0 comments · 7 min read · LW link

Reading the ethicists 2: Hunting for AI alignment papers

Charlie Steiner · Jun 6, 2022, 3:49 PM
21 points (10 votes)
1 comment · 7 min read · LW link

DALL-E 2 - Unofficial Natural Language Image Editing, Art Critique Survey

bakztfuture · Jun 6, 2022, 6:27 PM
0 points (3 votes)
0 comments · 1 min read · LW link
(bakztfuture.substack.com)

Thinking about Broad Classes of Utility-like Functions

J Bostock · Jun 7, 2022, 2:05 PM
7 points (2 votes)
0 comments · 4 min read · LW link

Thoughts on Formalizing Composition

Tom Lieberum · Jun 7, 2022, 7:51 AM
13 points (9 votes)
0 comments · 7 min read · LW link

“Pivotal Acts” means something specific

Raemon · Jun 7, 2022, 9:56 PM
114 points (44 votes)
23 comments · 2 min read · LW link

Why I don’t believe in doom

mukashi · Jun 7, 2022, 11:49 PM
6 points (32 votes)
30 comments · 4 min read · LW link

[Question] Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment?

P. · Jun 8, 2022, 10:26 PM
52 points (33 votes)
49 comments · 4 min read · LW link

Today in AI Risk History: The Terminator (1984 film) was released.

Impassionata · Jun 9, 2022, 1:32 AM
−3 points (7 votes)
6 comments · 1 min read · LW link

There’s probably a tradeoff between AI capability and safety, and we should act like it

David Johnston · Jun 9, 2022, 12:17 AM
3 points (7 votes)
3 comments · 1 min read · LW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky · Jun 9, 2022, 3:50 PM
168 points (63 votes)
29 comments · 17 min read · LW link
(www.cold-takes.com)

[Question] If there was a millennium equivalent prize for AI alignment, what would the problems be?

Yair Halberstadt · Jun 9, 2022, 4:56 PM
17 points (9 votes)
4 comments · 1 min read · LW link

[Linkpost & Discussion] AI Trained on 4Chan Becomes ‘Hate Speech Machine’ [and outperforms GPT-3 on TruthfulQA Benchmark?!]

Yitz · Jun 9, 2022, 10:59 AM
16 points (9 votes)
5 comments · 2 min read · LW link
(www.vice.com)

If no near-term alignment strategy, research should aim for the long-term

harsimony · Jun 9, 2022, 7:10 PM
7 points (3 votes)
1 comment · 1 min read · LW link

How Do Selection Theorems Relate To Interpretability?

johnswentworth · Jun 9, 2022, 7:39 PM
57 points (25 votes)
14 comments · 3 min read · LW link

Bureaucracy of AIs

Logan Zoellner · Jun 9, 2022, 11:03 PM
11 points (6 votes)
6 comments · 14 min read · LW link

Tao, Kontsevich & others on HLAI in Math

interstice · Jun 10, 2022, 2:25 AM
41 points (20 votes)
5 comments · 2 min read · LW link
(www.youtube.com)

Open Problems in AI X-Risk [PAIS #5]

Jun 10, 2022, 2:08 AM
50 points (19 votes)
3 comments · 36 min read · LW link

[Question] why assume AGIs will optimize for fixed goals?

nostalgebraist · Jun 10, 2022, 1:28 AM
119 points (55 votes)
52 comments · 4 min read · LW link

Progress Report 6: get the tool working

Nathan Helm-Burger · Jun 10, 2022, 11:18 AM
4 points (1 vote)
0 comments · 2 min read · LW link

Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later.

avturchin · Jun 10, 2022, 5:24 PM
10 points (5 votes)
2 comments · 1 min read · LW link

[Question] Is AI Alignment Impossible?

Heighn · Jun 10, 2022, 10:08 AM
3 points (6 votes)
3 comments · 1 min read · LW link

How dangerous is human-level AI?

Alex_Altair · Jun 10, 2022, 5:38 PM
21 points (9 votes)
4 comments · 8 min read · LW link

[linkpost] The final AI benchmark: BIG-bench

RomanS · Jun 10, 2022, 8:53 AM
30 points (19 votes)
19 comments · 1 min read · LW link

[Question] Could Patent-Trolling delay AI timelines?

Pablo Repetto · Jun 10, 2022, 2:53 AM
1 point (4 votes)
3 comments · 1 min read · LW link

How fast can we perform a forward pass?

jsteinhardt · Jun 10, 2022, 11:30 PM
53 points (19 votes)
9 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Steganography and the CycleGAN—alignment failure case study

Jan Czechowski · Jun 11, 2022, 9:41 AM
28 points (17 votes)
0 comments · 4 min read · LW link

AGI Safety Communications Initiative

ines · Jun 11, 2022, 5:34 PM
7 points (5 votes)
0 comments · 1 min read · LW link

[Question] How much stupider than humans can AI be and still kill us all through sheer numbers and resource access?

Shmi · Jun 12, 2022, 1:01 AM
11 points (3 votes)
12 comments · 1 min read · LW link

A claim that Google’s LaMDA is sentient

Ben Livengood · Jun 12, 2022, 4:18 AM
31 points (27 votes)
134 comments · 1 min read · LW link

Let’s not name specific AI labs in an adversarial context

acylhalide · Jun 12, 2022, 5:38 PM
8 points (20 votes)
17 comments · 1 min read · LW link

[Question] How much does cybersecurity reduce AI risk?

Darmani · Jun 12, 2022, 10:13 PM
34 points (16 votes)
23 comments · 1 min read · LW link

[Question] How are compute assets distributed in the world?

Chris van Merwijk · Jun 12, 2022, 10:13 PM
29 points (12 votes)
7 comments · 1 min read · LW link

The beautiful magical enchanted golden Dall-e Mini is underrated

p.b. · Jun 13, 2022, 7:58 AM
14 points (7 votes)
0 comments · 1 min read · LW link

Why so little AI risk on rationalist-adjacent blogs?

Grant Demaree · Jun 13, 2022, 6:31 AM
46 points (27 votes)
23 comments · 8 min read · LW link

[Question] What’s the “This AI is of moral concern.” fire alarm?

Quintin Pope · Jun 13, 2022, 8:05 AM
37 points (21 votes)
56 comments · 2 min read · LW link

On A List of Lethalities

Zvi · Jun 13, 2022, 12:30 PM
154 points (72 votes)
48 comments · 54 min read · LW link
(thezvi.wordpress.com)

[Question] Can you MRI a deep learning model?

Yair Halberstadt · Jun 13, 2022, 1:43 PM
3 points (2 votes)
3 comments · 1 min read · LW link

What are some smaller-but-concrete challenges related to AI safety that are impacting people today?

nonzerosum · Jun 13, 2022, 5:36 PM
3 points (3 votes)
2 comments · 1 min read · LW link

Continuity Assumptions

Jan_Kulveit · Jun 13, 2022, 9:31 PM
26 points (21 votes)
13 comments · 4 min read · LW link

Crypto-fed Computation

aaguirre · Jun 13, 2022, 9:20 PM
22 points (13 votes)
7 comments · 7 min read · LW link

Blake Richards on Why he is Skeptical of Existential Risk from AI

Michaël Trazzi · Jun 14, 2022, 7:09 PM
41 points (19 votes)
12 comments · 4 min read · LW link
(theinsideview.ai)

I applied for a MIRI job in 2020. Here’s what happened next.

ViktoriaMalyasova · Jun 15, 2022, 7:37 PM
78 points (47 votes)
17 comments · 7 min read · LW link

[Question] What are all the AI Alignment and AI Safety Communication Hubs?

Gunnar_Zarncke · Jun 15, 2022, 4:16 PM
25 points (14 votes)
5 comments · 1 min read · LW link

[Question] Has there been any work on attempting to use Pascal’s Mugging to make an AGI behave?

Chris_Leong · Jun 15, 2022, 8:33 AM
7 points (3 votes)
17 comments · 1 min read · LW link

Will vague “AI sentience” concerns do more for AI safety than anything else we might do?

Aryeh Englander · Jun 14, 2022, 11:53 PM
12 points (6 votes)
1 comment · 1 min read · LW link

“Brain enthusiasts” in AI Safety

Jun 18, 2022, 9:59 AM
57 points (27 votes)
5 comments · 10 min read · LW link
(universalprior.substack.com)

FYI: I’m working on a book about the threat of AGI/​ASI for a general audience. I hope it will be of value to the cause and the community

Darren McKee · Jun 15, 2022, 6:08 PM
40 points (26 votes)
17 comments · 2 min read · LW link

A central AI alignment problem: capabilities generalization, and the sharp left turn

So8res · Jun 15, 2022, 1:10 PM
253 points (106 votes)
48 comments · 10 min read · LW link

AI Risk, as Seen on Snapchat

dkirmani · Jun 16, 2022, 7:31 PM
23 points (12 votes)
8 comments · 1 min read · LW link

Humans are very reliable agents

alyssavance · Jun 16, 2022, 10:02 PM
248 points (134 votes)
35 comments · 3 min read · LW link

A possible AI-inoculation due to early “robot uprising”

Shmi · Jun 16, 2022, 9:21 PM
16 points (5 votes)
2 comments · 1 min read · LW link

A transparency and interpretability tech tree

evhub · Jun 16, 2022, 11:44 PM
136 points (48 votes)
10 comments · 19 min read · LW link

Value extrapolation vs Wireheading

Stuart_Armstrong · Jun 17, 2022, 3:02 PM
16 points (5 votes)
1 comment · 1 min read · LW link

#SAT with Tensor Networks

Adam Jermyn · Jun 17, 2022, 1:20 PM
4 points (2 votes)
0 comments · 2 min read · LW link

wrapper-minds are the enemy

nostalgebraist · Jun 17, 2022, 1:58 AM
92 points (43 votes)
36 comments · 8 min read · LW link

[Question] Is there an unified way to make sense of ai failure modes?

walking_mushroom · Jun 17, 2022, 6:00 PM
3 points (3 votes)
1 comment · 1 min read · LW link

Quantifying General Intelligence

JasonBrown · Jun 17, 2022, 9:57 PM
9 points (8 votes)
6 comments · 13 min read · LW link

Pivotal outcomes and pivotal processes

Andrew_Critch · Jun 17, 2022, 11:43 PM
79 points (41 votes)
32 comments · 4 min read · LW link

Scott Aaronson is joining OpenAI to work on AI safety

peterbarnett · Jun 18, 2022, 4:06 AM
117 points (64 votes)
31 comments · 1 min read · LW link
(scottaaronson.blog)

Can DALL-E understand simple geometry?

Isaac King · Jun 18, 2022, 4:37 AM
25 points (9 votes)
2 comments · 1 min read · LW link

Specific problems with specific animal comparisons for AI policy

trevor · Jun 19, 2022, 1:27 AM
3 points (2 votes)
1 comment · 2 min read · LW link

Agent level parallelism

Johannes C. Mayer · Jun 18, 2022, 8:56 PM
6 points (3 votes)
5 comments · 1 min read · LW link

[Link-post] On Deference and Yudkowsky’s AI Risk Estimates

bmg · Jun 19, 2022, 5:25 PM
27 points (18 votes)
7 comments · 1 min read · LW link

Where I agree and disagree with Eliezer

paulfchristiano · Jun 19, 2022, 7:15 PM
777 points (334 votes)
205 comments · 20 min read · LW link

Let’s See You Write That Corrigibility Tag

Eliezer Yudkowsky · Jun 19, 2022, 9:11 PM
109 points (56 votes)
67 comments · 1 min read · LW link

Are we there yet?

theflowerpot · Jun 20, 2022, 11:19 AM
2 points (2 votes)
2 comments · 1 min read · LW link

On corrigibility and its basin

Donald Hobson · Jun 20, 2022, 4:33 PM
16 points (7 votes)
3 comments · 2 min read · LW link

Parable: The Bomb that doesn’t Explode

Lone Pine · Jun 20, 2022, 4:41 PM
14 points (20 votes)
5 comments · 2 min read · LW link

Key Papers in Language Model Safety

aog · Jun 20, 2022, 3:00 PM
37 points (16 votes)
1 comment · 22 min read · LW link

Survey re AIS/​LTism office in NYC

RyanCarey · Jun 20, 2022, 7:21 PM
7 points (2 votes)
0 comments · 1 min read · LW link

An AI defense-offense symmetry thesis

Chris van Merwijk · Jun 20, 2022, 10:01 AM
10 points (5 votes)
9 comments · 3 min read · LW link

[Question] How easy/​fast is it for a AGI to hack computers/​a human brain?

Noosphere89 · Jun 21, 2022, 12:34 AM
0 points (2 votes)
1 comment · 1 min read · LW link

A Toy Model of Gradient Hacking

Oam Patel · Jun 20, 2022, 10:01 PM
25 points (10 votes)
7 comments · 4 min read · LW link

Debating Whether AI is Conscious Is A Distraction from Real Problems

sidhe_they · Jun 21, 2022, 4:56 PM
4 points (6 votes)
10 comments · 1 min read · LW link
(techpolicy.press)

The inordinately slow spread of good AGI conversations in ML

Rob Bensinger · Jun 21, 2022, 4:09 PM
160 points (85 votes)
66 comments · 8 min read · LW link

[Question] What is the difference between AI misalignment and bad programming?

puzzleGuzzle · Jun 21, 2022, 9:52 PM
6 points (6 votes)
2 comments · 1 min read · LW link

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

elspood · Jun 21, 2022, 11:55 PM
331 points (153 votes)
40 comments · 7 min read · LW link

A Quick List of Some Problems in AI Alignment As A Field

Nicholas Kross · Jun 21, 2022, 11:23 PM
74 points (50 votes)
12 comments · 6 min read · LW link
(www.thinkingmuchbetter.com)

Confusion about neuroscience/​cognitive science as a danger for AI Alignment

Samuel Nellessen · Jun 22, 2022, 5:59 PM
2 points (2 votes)
1 comment · 3 min read · LW link
(snellessen.com)

Air Conditioner Test Results & Discussion

johnswentworth · Jun 22, 2022, 10:26 PM
80 points (45 votes)
38 comments · 6 min read · LW link

Loose thoughts on AGI risk

Yitz · Jun 23, 2022, 1:02 AM
7 points (2 votes)
3 comments · 1 min read · LW link

[Question] What’s the contingency plan if we get AGI tomorrow?

Yitz · Jun 23, 2022, 3:10 AM
61 points (32 votes)
24 comments · 1 min read · LW link

[Question] What are the best “policy” approaches in worlds where alignment is difficult?

LHA · Jun 23, 2022, 1:53 AM
1 point (1 vote)
0 comments · 1 min read · LW link

[Question] Is CIRL a promising agenda?

Chris_Leong · Jun 23, 2022, 5:12 PM
25 points (12 votes)
12 comments · 1 min read · LW link

Half-baked AI Safety ideas thread

Aryeh Englander · Jun 23, 2022, 4:11 PM
58 points (22 votes)
60 comments · 1 min read · LW link

20 Critiques of AI Safety That I Found on Twitter

dkirmani · Jun 23, 2022, 7:23 PM
21 points (25 votes)
16 comments · 1 min read · LW link

Linkpost: Robin Hanson—Why Not Wait On AI Risk?

Yair Halberstadt · Jun 24, 2022, 2:23 PM
41 points (24 votes)
14 comments · 1 min read · LW link
(www.overcomingbias.com)

Raphaël Millière on Generalization and Scaling Maximalism

Michaël Trazzi · Jun 24, 2022, 6:18 PM
21 points (7 votes)
2 comments · 4 min read · LW link
(theinsideview.ai)

[Question] Do alignment concerns extend to powerful non-AI agents?

Ozyrus · Jun 24, 2022, 6:26 PM
21 points (10 votes)
13 comments · 1 min read · LW link

Dependencies for AGI pessimism

Yitz · Jun 24, 2022, 10:25 PM
6 points (5 votes)
4 comments · 1 min read · LW link

What if the best path for a person who wants to work on AGI alignment is to join Facebook or Google?

dbasch · Jun 24, 2022, 9:23 PM
2 points (4 votes)
3 comments · 1 min read · LW link

[Link] Adversarially trained neural representations may already be as robust as corresponding biological neural representations

Gunnar_Zarncke · Jun 24, 2022, 8:51 PM
35 points (12 votes)
9 comments · 1 min read · LW link

AI-Written Critiques Help Humans Notice Flaws

paulfchristiano · Jun 25, 2022, 5:22 PM
133 points (66 votes)
5 comments · 3 min read · LW link
(openai.com)

[LQ] Some Thoughts on Messaging Around AI Risk

DragonGod · Jun 25, 2022, 1:53 PM
5 points (4 votes)
3 comments · 6 min read · LW link

[Question] Should any human enslave an AGI system?

AlignmentMirror · Jun 25, 2022, 7:35 PM
−13 points (8 votes)
44 comments · 1 min read · LW link

The Basics of AGI Policy (Flowchart)

trevor · Jun 26, 2022, 2:01 AM
18 points (24 votes)
8 comments · 2 min read · LW link

Slow motion videos as AI risk intuition pumps

Andrew_Critch · Jun 14, 2022, 7:31 PM
209 points (104 votes)
36 comments · 2 min read · LW link

Robin Hanson asks “Why Not Wait On AI Risk?”

Gunnar_Zarncke · Jun 26, 2022, 11:32 PM
22 points (7 votes)
4 comments · 1 min read · LW link
(www.overcomingbias.com)

Epistemic modesty and how I think about AI risk

Aryeh Englander · Jun 27, 2022, 6:47 PM
22 points (7 votes)
4 comments · 4 min read · LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Jun 27, 2022, 3:58 PM
166 points (65 votes)
14 comments · 7 min read · LW link

Scott Aaronson and Steven Pinker Debate AI Scaling

Liron · Jun 28, 2022, 4:04 PM
37 points (16 votes)
10 comments · 1 min read · LW link
(scottaaronson.blog)

Four reasons I find AI safety emotionally compelling

Jun 28, 2022, 2:10 PM
38 points (25 votes)
3 comments · 4 min read · LW link

Some alternative AI safety research projects

Michele Campolo · Jun 28, 2022, 2:09 PM
9 points (4 votes)
0 comments · 3 min read · LW link

Assessing AlephAlphas Multimodal Model

p.b. · Jun 28, 2022, 9:28 AM
30 points (18 votes)
5 comments · 3 min read · LW link

Kurzgesagt – The Last Human (Youtube)

habryka · Jun 29, 2022, 3:28 AM
54 points (31 votes)
7 comments · 1 min read · LW link
(www.youtube.com)

Can We Align AI by Having It Learn Human Preferences? I’m Scared (summary of last third of Human Compatible)

apollonianblues · Jun 29, 2022, 4:09 AM
19 points (9 votes)
3 comments · 6 min read · LW link

Looking back on my alignment PhD

TurnTrout · Jul 1, 2022, 3:19 AM
287 points (120 votes)
60 comments · 11 min read · LW link

Will Capabilities Generalise More?

Ramana Kumar · Jun 29, 2022, 5:12 PM
109 points (48 votes)
38 comments · 4 min read · LW link

Gradient hacking: definitions and examples

Richard_Ngo · Jun 29, 2022, 9:35 PM
24 points (8 votes)
1 comment · 5 min read · LW link

[Question] Correcting human error vs doing exactly what you’re told—is there literature on this in context of general system design?

Jan Czechowski · Jun 29, 2022, 9:30 PM
6 points (3 votes)
0 comments · 1 min read · LW link

Most Functions Have Undesirable Global Extrema

En Kepeig · Jun 30, 2022, 5:10 PM
8 points (8 votes)
5 comments · 3 min read · LW link

$500 bounty for alignment contest ideas

Orpheus16 · Jun 30, 2022, 1:56 AM
29 points (13 votes)
5 comments · 2 min read · LW link

Quick survey on AI alignment resources

frances_lorenz · Jun 30, 2022, 7:09 PM
14 points (6 votes)
0 comments · 1 min read · LW link

[Linkpost] Solving Quantitative Reasoning Problems with Language Models

Yitz · Jun 30, 2022, 6:58 PM
76 points (34 votes)
15 comments · 2 min read · LW link
(storage.googleapis.com)

GPT-3 Catching Fish in Morse Code

Megan Kinniment · Jun 30, 2022, 9:22 PM
110 points (63 votes)
27 comments · 8 min read · LW link

Selection processes for subagents

Ryan Kidd · Jun 30, 2022, 11:57 PM
33 points (14 votes)
2 comments · 9 min read · LW link

AI safety university groups: a promising opportunity to reduce existential risk

mic · Jul 1, 2022, 3:59 AM
13 points (5 votes)
0 comments · 11 min read · LW link

Safetywashing

Adam Scholl · Jul 1, 2022, 11:56 AM
212 points (111 votes)
17 comments · 1 min read · LW link

[Question] AGI alignment with what?

AlignmentMirror · Jul 1, 2022, 10:22 AM
6 points (5 votes)
10 comments · 1 min read · LW link

What Is The True Name of Modularity?

Jul 1, 2022, 2:55 PM
21 points (13 votes)
10 comments · 12 min read · LW link

AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving

DanielFilan · Jul 1, 2022, 10:20 PM
14 points (6 votes)
0 comments · 37 min read · LW link

Agenty AGI – How Tempting?

PeterMcCluskey · Jul 1, 2022, 11:40 PM
21 points (13 votes)
3 comments · 5 min read · LW link
(www.bayesianinvestor.com)

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan H · Jul 2, 2022, 12:09 AM
40 points (17 votes)
0 comments · 1 min read · LW link
(arxiv.org)

Minerva

Algon · Jul 1, 2022, 8:06 PM
35 points (17 votes)
6 comments · 2 min read · LW link
(ai.googleblog.com)

Could an AI Alignment Sandbox be useful?

Michael Soareverix · Jul 2, 2022, 5:06 AM
2 points (4 votes)
1 comment · 1 min read · LW link

Goal-directedness: tackling complexity

Morgan_Rogers · Jul 2, 2022, 1:51 PM
8 points (4 votes)
0 comments · 38 min read · LW link

[Question] Which one of these two academic routes should I take to end up in AI Safety?

Martín Soto · Jul 3, 2022, 1:05 AM
5 points (4 votes)
2 comments · 1 min read · LW link

Wonder and The Golden AI Rule

JeffreyK · Jul 3, 2022, 6:21 PM
0 points (7 votes)
4 comments · 6 min read · LW link

Decision theory and dynamic inconsistency

paulfchristiano · Jul 3, 2022, 10:20 PM
66 points (27 votes)
33 comments · 10 min read · LW link
(sideways-view.com)

AI Forecasting: One Year In

jsteinhardt · Jul 4, 2022, 5:10 AM
131 points (65 votes)
12 comments · 6 min read · LW link
(bounded-regret.ghost.io)

Remaking EfficientZero (as best I can)

Hoagy · Jul 4, 2022, 11:03 AM
34 points (22 votes)
9 comments · 22 min read · LW link

Please help us communicate AI xrisk. It could save the world.

otto.barten · Jul 4, 2022, 9:47 PM
4 points (14 votes)
7 comments · 2 min read · LW link

Benchmark for successful concept extrapolation/​avoiding goal misgeneralization

Stuart_Armstrong · Jul 4, 2022, 8:48 PM
80 points (37 votes)
12 comments · 4 min read · LW link

Anthropic’s SoLU (Softmax Linear Unit)

Joel Burget · Jul 4, 2022, 6:38 PM
15 points (9 votes)
1 comment · 4 min read · LW link
(transformer-circuits.pub)

[AN #172] Sorry for the long hiatus!

Rohin Shah · Jul 5, 2022, 6:20 AM
54 points (21 votes)
0 comments · 3 min read · LW link
(mailchi.mp)

Principles for Alignment/​Agency Projects

johnswentworth · Jul 7, 2022, 2:07 AM
115 points (50 votes)
20 comments · 4 min read · LW link

Race Along Rashomon Ridge

Jul 7, 2022, 3:20 AM
49 points (25 votes)
15 comments · 8 min read · LW link

Confusions in My Model of AI Risk

peterbarnett · Jul 7, 2022, 1:05 AM
21 points (7 votes)
9 comments · 5 min read · LW link

Safety considerations for online generative modeling

Sam Marks · Jul 7, 2022, 6:31 PM
41 points (19 votes)
9 comments · 14 min read · LW link

Reinforcement Learner Wireheading

Nate Showell · Jul 8, 2022, 5:32 AM
8 points (6 votes)
2 comments · 4 min read · LW link

MATS Models

johnswentworth · Jul 9, 2022, 12:14 AM
84 points (34 votes)
5 comments · 16 min read · LW link

Train first VS prune first in neural networks.

Donald Hobson · Jul 9, 2022, 3:53 PM
20 points (7 votes)
5 comments · 2 min read · LW link

Research Notes: What are we aligning for?

Shoshannah Tekofsky · Jul 8, 2022, 10:13 PM
19 points (13 votes)
8 comments · 2 min read · LW link

Report from a civilizational observer on Earth

owencb · Jul 9, 2022, 5:26 PM
49 points (22 votes)
12 comments · 6 min read · LW link

Visualizing Neural networks, how to blame the bias

Donald Hobson · Jul 9, 2022, 3:52 PM
7 points (2 votes)
1 comment · 6 min read · LW link

Com­ment on “Propo­si­tions Con­cern­ing Digi­tal Minds and So­ciety”

Zack_M_DavisJul 10, 2022, 5:48 AM
95 points

31 votes

Overall karma indicates overall quality.

12 comments8 min readLW link

Hessian and Basin volume
Vivek Hebbar · Jul 10, 2022, 6:59 AM · 33 points (14 votes) · 9 comments · 4 min read

Checksum Sensor Alignment
lsusr · Jul 11, 2022, 3:31 AM · 12 points (15 votes) · 2 comments · 1 min read

The Alignment Problem
lsusr · Jul 11, 2022, 3:03 AM · 45 points (32 votes) · 20 comments · 3 min read

[Question] How do AI timelines affect how you live your life?
Quadratic Reciprocity · Jul 11, 2022, 1:54 PM · 77 points (38 votes) · 47 comments · 1 min read

Three Minimum Pivotal Acts Possible by Narrow AI
Michael Soareverix · Jul 12, 2022, 9:51 AM · 0 points (4 votes) · 4 comments · 2 min read

On how various plans miss the hard bits of the alignment challenge
So8res · Jul 12, 2022, 2:49 AM · 258 points (121 votes) · 81 comments · 29 min read

[Question] What is wrong with this approach to corrigibility?
Rafael Cosman · Jul 12, 2022, 10:55 PM · 7 points (7 votes) · 8 comments · 1 min read

MIRI Conversations: Technology Forecasting & Gradualism (Distillation)
CallumMcDougall · Jul 13, 2022, 3:55 PM · 31 points (15 votes) · 1 comment · 20 min read

[Question] Which AI Safety research agendas are the most promising?
Chris_Leong · Jul 13, 2022, 7:54 AM · 27 points (14 votes) · 6 comments · 1 min read

Deep learning curriculum for large language model alignment
Jacob_Hilton · Jul 13, 2022, 9:58 PM · 53 points (22 votes) · 3 comments · 1 min read · (github.com)

Artificial Sandwiching: When can we test scalable alignment protocols without humans?
Sam Bowman · Jul 13, 2022, 9:14 PM · 40 points (19 votes) · 6 comments · 5 min read

[Question] How to impress students with recent advances in ML?
Charbel-Raphaël · Jul 14, 2022, 12:03 AM · 12 points (7 votes) · 2 comments · 1 min read

Circumventing interpretability: How to defeat mind-readers
Lee Sharkey · Jul 14, 2022, 4:59 PM · 94 points (45 votes) · 8 comments · 36 min read

Musings on the Human Objective Function
Michael Soareverix · Jul 15, 2022, 7:13 AM · 3 points (5 votes) · 0 comments · 3 min read

Peter Singer’s first published piece on AI
Fai · Jul 15, 2022, 6:18 AM · 20 points (14 votes) · 5 comments · 1 min read · (link.springer.com)

Notes on Learning the Prior
carboniferous_umbraculum · Jul 15, 2022, 5:28 PM · 21 points (5 votes) · 2 comments · 25 min read

Proposed Orthogonality Theses #2-5
rjbg · Jul 14, 2022, 10:59 PM · 6 points (7 votes) · 0 comments · 2 min read

A story about a duplicitous API
LiLiLi · Jul 15, 2022, 6:26 PM · 2 points (10 votes) · 0 comments · 1 min read

Safety Implications of LeCun’s path to machine intelligence
Ivan Vendrov · Jul 15, 2022, 9:47 PM · 89 points (38 votes) · 16 comments · 6 min read

QNR Prospects
PeterMcCluskey · Jul 16, 2022, 2:03 AM · 38 points (8 votes) · 3 comments · 8 min read · (www.bayesianinvestor.com)

All AGI safety questions welcome (especially basic ones) [July 2022]
Jul 16, 2022, 12:57 PM · 84 points (41 votes) · 130 comments · 3 min read

Alignment as Game Design
Shoshannah Tekofsky · Jul 16, 2022, 10:36 PM · 11 points (10 votes) · 7 comments · 2 min read

Why I Think Abrupt AI Takeoff
lincolnquirk · Jul 17, 2022, 5:04 PM · 14 points (6 votes) · 6 comments · 1 min read

Why you might expect homogeneous take-off: evidence from ML research
Andrei Alexandru · Jul 17, 2022, 8:31 PM · 24 points (10 votes) · 0 comments · 10 min read

What should you change in response to an “emergency”? And AI risk
AnnaSalamon · Jul 18, 2022, 1:11 AM · 303 points (139 votes) · 60 comments · 6 min read

Quantilizers and Generative Models
Adam Jermyn · Jul 18, 2022, 4:32 PM · 24 points (9 votes) · 5 comments · 4 min read

Training goals for large language models
Johannes Treutlein · Jul 18, 2022, 7:09 AM · 26 points (10 votes) · 5 comments · 19 min read

Machine Learning Model Sizes and the Parameter Gap [abridged]
Pablo Villalobos · Jul 18, 2022, 4:51 PM · 20 points (12 votes) · 0 comments · 1 min read · (epochai.org)

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra · Jul 18, 2022, 7:06 PM · 310 points (133 votes) · 89 comments · 84 min read

At what point will we know if Eliezer’s predictions are right or wrong?
anonymous123456 · Jul 18, 2022, 10:06 PM · 5 points (9 votes) · 6 comments · 1 min read

A daily routine I do for my AI safety research work
scasper · Jul 19, 2022, 9:58 PM · 15 points (8 votes) · 7 comments · 1 min read

Pitfalls with Proofs
scasper · Jul 19, 2022, 10:21 PM · 19 points (12 votes) · 21 comments · 8 min read

Which singularity schools plus the no singularity school was right?
Noosphere89 · Jul 23, 2022, 3:16 PM · 9 points (12 votes) · 27 comments · 9 min read

Defining Optimization in a Deeper Way Part 3
J Bostock · Jul 20, 2022, 10:06 PM · 8 points (3 votes) · 0 comments · 2 min read

[AN #173] Recent language model results from DeepMind
Rohin Shah · Jul 21, 2022, 2:30 AM · 37 points (17 votes) · 9 comments · 8 min read · (mailchi.mp)

[Question] How much to optimize for the short-timelines scenario?
SoerenMind · Jul 21, 2022, 10:47 AM · 19 points (11 votes) · 3 comments · 1 min read

Making DALL-E Count
DirectedEvolution · Jul 22, 2022, 9:11 AM · 23 points (14 votes) · 12 comments · 4 min read

Conditioning Generative Models with Restrictions
Adam Jermyn · Jul 21, 2022, 8:33 PM · 16 points (7 votes) · 4 comments · 8 min read

General alignment properties
TurnTrout · Aug 8, 2022, 11:40 PM · 46 points (20 votes) · 2 comments · 1 min read

Which values are stable under ontology shifts?
Richard_Ngo · Jul 23, 2022, 2:40 AM · 68 points (33 votes) · 47 comments · 3 min read · (thinkingcomplete.blogspot.com)

Trying out Prompt Engineering on TruthfulQA
Megan Kinniment · Jul 23, 2022, 2:04 AM · 10 points (5 votes) · 0 comments · 8 min read

Symbolic distillation, Diffusion, Entropy, Replicators, Agents, oh my (a mid-low quality thinking out loud post)
the gears to ascension · Jul 23, 2022, 9:13 PM · 2 points (6 votes) · 2 comments · 6 min read

Eavesdropping on Aliens: A Data Decoding Challenge
anonymousaisafety · Jul 24, 2022, 4:35 AM · 44 points (18 votes) · 9 comments · 4 min read

How much should we worry about mesa-optimization challenges?
sudo · Jul 25, 2022, 3:56 AM · 4 points (2 votes) · 13 comments · 2 min read

[Question] Does agent foundations cover all future ML systems?
Jonas Hallgren · Jul 25, 2022, 1:17 AM · 2 points (1 vote) · 0 comments · 1 min read

[Question] How optimistic should we be about AI figuring out how to interpret itself?
oh54321 · Jul 25, 2022, 10:09 PM · 3 points (4 votes) · 1 comment · 1 min read

Active Inference as a formalisation of instrumental convergence
Roman Leventov · Jul 26, 2022, 5:55 PM · 6 points (4 votes) · 2 comments · 3 min read · (direct.mit.edu)

«Boundaries» Sequence (Index Post)
Andrew_Critch · Jul 26, 2022, 7:12 PM · 23 points (13 votes) · 1 comment · 1 min read

Moral strategies at different capability levels
Richard_Ngo · Jul 27, 2022, 6:50 PM · 95 points (36 votes) · 14 comments · 5 min read · (thinkingcomplete.blogspot.com)

Principles of Privacy for Alignment Research
johnswentworth · Jul 27, 2022, 7:53 PM · 68 points (24 votes) · 30 comments · 7 min read

Seeking beta readers who are ignorant of biology but knowledgeable about AI safety
Holly_Elmore · Jul 27, 2022, 11:02 PM · 10 points (6 votes) · 6 comments · 1 min read

Defining Optimization in a Deeper Way Part 4
J Bostock · Jul 28, 2022, 5:02 PM · 7 points (2 votes) · 0 comments · 5 min read

Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding
Vael Gates · Jul 28, 2022, 9:29 PM · 49 points (22 votes) · 3 comments · 6 min read

Distillation Contest—Results and Recap
Aris · Jul 29, 2022, 5:40 PM · 33 points (19 votes) · 0 comments · 7 min read

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization
adamShimi · Jul 29, 2022, 6:59 PM · 62 points (27 votes) · 3 comments · 16 min read

How transparency changed over time
ViktoriaMalyasova · Jul 30, 2022, 4:36 AM · 21 points (8 votes) · 0 comments · 6 min read

Translating between Latent Spaces
Jul 30, 2022, 3:25 AM · 20 points (12 votes) · 1 comment · 8 min read

AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical
Roman Leventov · Jul 30, 2022, 8:56 PM · 24 points (14 votes) · 10 comments · 1 min read

chinchilla’s wild implications
nostalgebraist · Jul 31, 2022, 1:18 AM · 366 points (205 votes) · 114 comments · 11 min read

Technical AI Alignment Study Group
Eric K · Aug 1, 2022, 6:33 PM · 5 points (7 votes) · 0 comments · 1 min read

[Question] Which intro-to-AI-risk text would you recommend to...
Sherrinford · Aug 1, 2022, 9:36 AM · 12 points (6 votes) · 1 comment · 1 min read

Two-year update on my personal AI timelines
Ajeya Cotra · Aug 2, 2022, 11:07 PM · 287 points (129 votes) · 60 comments · 16 min read

What are the Red Flags for Neural Network Suffering? - Seeds of Science call for reviewers
rogersbacon · Aug 2, 2022, 10:37 PM · 24 points (9 votes) · 5 comments · 1 min read

Precursor checking for deceptive alignment
evhub · Aug 3, 2022, 10:56 PM · 18 points (6 votes) · 0 comments · 14 min read

Survey: What (de)motivates you about AI risk?
Daniel_Friedrich · Aug 3, 2022, 7:17 PM · 1 point (1 vote) · 0 comments · 1 min read · (forms.gle)

High Reliability Orgs, and AI Companies
Raemon · Aug 4, 2022, 5:45 AM · 73 points (34 votes) · 6 comments · 12 min read

Interpretability isn’t Free
Joel Burget · Aug 4, 2022, 3:02 PM · 10 points (8 votes) · 1 comment · 2 min read

[Question] AI alignment: Would a lazy self-preservation instinct be sufficient?
BrainFrog · Aug 4, 2022, 5:53 PM · −1 points (2 votes) · 4 comments · 1 min read

[Question] What drives progress, theory or application?
lberglund · Aug 5, 2022, 1:14 AM · 5 points (3 votes) · 1 comment · 1 min read

The Pragmascope Idea
johnswentworth · Aug 4, 2022, 9:52 PM · 55 points (24 votes) · 19 comments · 3 min read

$20K In Bounties for AI Safety Public Materials
Aug 5, 2022, 2:52 AM · 68 points (32 votes) · 7 comments · 6 min read

Rant on Problem Factorization for Alignment
johnswentworth · Aug 5, 2022, 7:23 PM · 73 points (64 votes) · 48 comments · 6 min read

Announcing the Introduction to ML Safety course
Aug 6, 2022, 2:46 AM · 69 points (36 votes) · 6 comments · 7 min read

Why I Am Skeptical of AI Regulation as an X-Risk Mitigation Strategy
A Ray · Aug 6, 2022, 5:46 AM · 31 points (14 votes) · 14 comments · 2 min read

My advice on finding your own path
A Ray · Aug 6, 2022, 4:57 AM · 34 points (13 votes) · 3 comments · 3 min read

A Deceptively Simple Argument in favor of Problem Factorization
Logan Zoellner · Aug 6, 2022, 5:32 PM · 3 points (3 votes) · 4 comments · 1 min read

[Question] Can we get full audio for Eliezer’s conversation with Sam Harris?
JakubK · Aug 7, 2022, 8:35 PM · 30 points (15 votes) · 8 comments · 1 min read

How Deadly Will Roughly-Human-Level AGI Be?
David Udell · Aug 8, 2022, 1:59 AM · 12 points (5 votes) · 6 comments · 1 min read

Broad Basins and Data Compression
Aug 8, 2022, 8:33 PM · 29 points (9 votes) · 6 comments · 7 min read

Encultured AI Pre-planning, Part 1: Enabling New Benchmarks
Aug 8, 2022, 10:44 PM · 62 points (20 votes) · 2 comments · 6 min read

Encultured AI, Part 1 Appendix: Relevant Research Examples
Aug 8, 2022, 10:44 PM · 11 points (5 votes) · 1 comment · 7 min read

Disagreements about Alignment: Why, and how, we should try to solve them
ojorgensen · Aug 9, 2022, 6:49 PM · 8 points (6 votes) · 1 comment · 16 min read

[Question] Many Gods refutation and Instrumental Goals. (Proper one)
aditya malik · Aug 9, 2022, 11:59 AM · 0 points (4 votes) · 15 comments · 1 min read

[Question] Is it possible to find venture capital for AI research org with strong safety focus?
AnonResearch · Aug 9, 2022, 4:12 PM · 6 points (4 votes) · 1 comment · 1 min read

Using GPT-3 to augment human intelligence
Henrik Karlsson · Aug 10, 2022, 3:54 PM · 48 points (20 votes) · 7 comments · 18 min read · (escapingflatland.substack.com)

Emergent Abilities of Large Language Models [Linkpost]
aog · Aug 10, 2022, 6:02 PM · 25 points (12 votes) · 2 comments · 1 min read · (arxiv.org)

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)
Aug 10, 2022, 6:14 PM · 26 points (22 votes) · 30 comments · 11 min read

The alignment problem from a deep learning perspective
Richard_Ngo · Aug 10, 2022, 10:46 PM · 93 points (35 votes) · 13 comments · 27 min read

How much alignment data will we need in the long run?
Jacob_Hilton · Aug 10, 2022, 9:39 PM · 34 points (18 votes) · 15 comments · 4 min read

Thoughts on the good regulator theorem
JonasMoss · Aug 11, 2022, 12:08 PM · 8 points (5 votes) · 0 comments · 4 min read

Language models seem to be much better than humans at next-token prediction
Aug 11, 2022, 5:45 PM · 164 points (76 votes) · 56 comments · 13 min read

[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?
TurnTrout · Aug 11, 2022, 10:22 PM · 76 points (43 votes) · 41 comments · 2 min read

Dissected boxed AI
Nathan1123 · Aug 12, 2022, 2:37 AM · −8 points (6 votes) · 2 comments · 1 min read

Steelmining via Analogy
Paul Bricman · Aug 13, 2022, 9:59 AM · 24 points (11 votes) · 0 comments · 2 min read · (paulbricman.com)

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Aug 12, 2022, 3:17 PM · 71 points (31 votes) · 3 comments · 3 min read · (vkrakovna.wordpress.com)

Oversight Misses 100% of Thoughts The AI Does Not Think
johnswentworth · Aug 12, 2022, 4:30 PM · 85 points (49 votes) · 49 comments · 1 min read

Timelines explanation post part 1 of ?
Nathan Helm-Burger · Aug 12, 2022, 4:13 PM · 10 points (5 votes) · 1 comment · 2 min read

A little playing around with Blenderbot3
Nathan Helm-Burger · Aug 12, 2022, 4:06 PM · 9 points (4 votes) · 0 comments · 1 min read

DeepMind alignment team opinions on AGI ruin arguments
Vika · Aug 12, 2022, 9:06 PM · 364 points (158 votes) · 34 comments · 14 min read

the Insulated Goal-Program idea
Tamsin Leake · Aug 13, 2022, 9:57 AM · 39 points (13 votes) · 3 comments · 2 min read · (carado.moe)

goal-program bricks
Tamsin Leake · Aug 13, 2022, 10:08 AM · 27 points (10 votes) · 2 comments · 2 min read · (carado.moe)

How I think about alignment
Linda Linsefors · Aug 13, 2022, 10:01 AM · 30 points (18 votes) · 11 comments · 5 min read

Refine’s First Blog Post Day
adamShimi · Aug 13, 2022, 10:23 AM · 55 points (20 votes) · 3 comments · 1 min read

Shapes of Mind and Pluralism in Alignment
adamShimi · Aug 13, 2022, 10:01 AM · 30 points (13 votes) · 1 comment · 2 min read

An extended rocket alignment analogy
remember · Aug 13, 2022, 6:22 PM · 25 points (12 votes) · 3 comments · 4 min read

Cultivating Valiance
Shoshannah Tekofsky · Aug 13, 2022, 6:47 PM · 35 points (19 votes) · 4 comments · 4 min read

Evolution is a bad analogy for AGI: inner alignment
Quintin Pope · Aug 13, 2022, 10:15 PM · 52 points (26 votes) · 6 comments · 8 min read

A brief note on Simplicity Bias
carboniferous_umbraculum · Aug 14, 2022, 2:05 AM · 16 points (9 votes) · 0 comments · 4 min read

Seeking Interns/RAs for Mechanistic Interpretability Projects
Neel Nanda · Aug 15, 2022, 7:11 AM · 61 points (24 votes) · 0 comments · 2 min read

Extreme Security
lc · Aug 15, 2022, 12:11 PM · 39 points (28 votes) · 4 comments · 5 min read

On Preference Manipulation in Reward Learning Processes
Felix Hofstätter · Aug 15, 2022, 7:32 PM · 8 points (4 votes) · 0 comments · 4 min read

Limits of Asking ELK if Models are Deceptive
Oam Patel · Aug 15, 2022, 8:44 PM · 6 points (4 votes) · 2 comments · 4 min read

What Makes an Idea Understandable? On Architecturally and Culturally Natural Ideas.
Aug 16, 2022, 2:09 AM · 17 points (11 votes) · 2 comments · 16 min read

Deception as the optimal: mesa-optimizers and inner alignment
Eleni Angelou · Aug 16, 2022, 4:49 AM · 10 points (5 votes) · 0 comments · 5 min read

Understanding differences between humans and intelligence-in-general to build safe AGI
Florian_Dietz · Aug 16, 2022, 8:27 AM · 7 points (4 votes) · 8 comments · 1 min read

Autonomy as taking responsibility for reference maintenance
Ramana Kumar · Aug 17, 2022, 12:50 PM · 52 points (15 votes) · 3 comments · 5 min read

Thoughts on ‘List of Lethalities’
Alex Lawsen · Aug 17, 2022, 6:33 PM · 25 points (11 votes) · 0 comments · 10 min read

Human Mimicry Mainly Works When We’re Already Close
johnswentworth · Aug 17, 2022, 6:41 PM · 68 points (26 votes) · 16 comments · 5 min read

The Core of the Alignment Problem is...
Aug 17, 2022, 8:07 PM · 58 points (26 votes) · 10 comments · 9 min read

Concrete Advice for Forming Inside Views on AI Safety
Neel Nanda · Aug 17, 2022, 10:02 PM · 18 points (10 votes) · 6 comments · 10 min read

Announcing Encultured AI: Building a Video Game
Aug 18, 2022, 2:16 AM · 103 points (38 votes) · 26 comments · 4 min read

Announcing the Distillation for Alignment Practicum (DAP)
Aug 18, 2022, 7:50 PM · 21 points (16 votes) · 3 comments · 3 min read

Alignment’s phlogiston
Eleni Angelou · Aug 18, 2022, 10:27 PM · 10 points (7 votes) · 2 comments · 2 min read

[Question] Are language models close to the superhuman level in philosophy?
Roman Leventov · Aug 19, 2022, 4:43 AM · 5 points (5 votes) · 2 comments · 2 min read

How to do theoretical research, a personal perspective
Mark Xu · Aug 19, 2022, 7:41 PM · 84 points (38 votes) · 4 comments · 15 min read

Refine’s Second Blog Post Day
adamShimi · Aug 20, 2022, 1:01 PM · 19 points (5 votes) · 0 comments · 1 min read

No One-Size-Fit-All Epistemic Strategy
adamShimi · Aug 20, 2022, 12:56 PM · 23 points (10 votes) · 1 comment · 2 min read

Reducing Goodhart: Announcement, Executive Summary
Charlie Steiner · Aug 20, 2022, 9:49 AM · 14 points (6 votes) · 0 comments · 1 min read

Pivotal acts using an unaligned AGI?
Simon Fischer · Aug 21, 2022, 5:13 PM · 26 points (13 votes) · 3 comments · 8 min read

Beyond Hyperanthropomorphism
PointlessOne · Aug 21, 2022, 5:55 PM · 3 points (13 votes) · 17 comments · 1 min read · (studio.ribbonfarm.com)

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan · Aug 21, 2022, 11:50 PM · 16 points (6 votes) · 0 comments · 34 min read

[Question] What if we solve AI Safety but no one cares
142857 · Aug 22, 2022, 5:38 AM · 18 points (11 votes) · 5 comments · 1 min read

Finding Goals in the World Model
Aug 22, 2022, 6:06 PM · 55 points (21 votes) · 8 comments · 13 min read

[Question] AI Box Experiment: Are people still interested?
Double · Aug 31, 2022, 3:04 AM · 31 points (16 votes) · 13 comments · 1 min read

Stable Diffusion has been released
P. · Aug 22, 2022, 7:42 PM · 15 points (13 votes) · 7 comments · 1 min read · (stability.ai)

Discussion on utilizing AI for alignment
elifland · Aug 23, 2022, 2:36 AM · 16 points (7 votes) · 3 comments · 1 min read · (www.foxy-scout.com)

It Looks Like You’re Trying To Take Over The Narrative
George3d6 · Aug 24, 2022, 1:36 PM · 2 points (15 votes) · 20 comments · 9 min read · (www.epistem.ink)

Thoughts about OOD alignment
Catnee · Aug 24, 2022, 3:31 PM · 11 points (6 votes) · 10 comments · 2 min read

Vingean Agency
abramdemski · Aug 24, 2022, 8:08 PM · 57 points (30 votes) · 13 comments · 3 min read

Interspecies diplomacy as a potentially productive lens on AGI alignment
Shariq Hashme · Aug 24, 2022, 5:59 PM · 5 points (3 votes) · 1 comment · 2 min read

OpenAI’s Alignment Plans
dkirmani · Aug 24, 2022, 7:39 PM · 60 points (35 votes) · 17 comments · 5 min read · (openai.com)

What Makes A Good Measurement Device?
johnswentworth · Aug 24, 2022, 10:45 PM · 35 points (16 votes) · 7 comments · 2 min read

Evaluating OpenAI’s alignment plans using training stories
ojorgensen · Aug 25, 2022, 4:12 PM · 3 points (3 votes) · 0 comments · 5 min read

A Test for Language Model Consciousness
Ethan Perez · Aug 25, 2022, 7:41 PM · 18 points (8 votes) · 14 comments · 10 min read

Seeking Student Submissions: Edit Your Source Code Contest
Aris · Aug 26, 2022, 2:08 AM · 28 points (9 votes) · 5 comments · 2 min read

Basin broadness depends on the size and number of orthogonal features
Aug 27, 2022, 5:29 PM · 34 points (16 votes) · 21 comments · 6 min read

Sufficiently many Godzillas as an alignment strategy
142857 · Aug 28, 2022, 12:08 AM · 8 points (4 votes) · 3 comments · 1 min read

Artificial Moral Advisors: A New Perspective from Moral Psychology
David Gross · Aug 28, 2022, 4:37 PM · 25 points (4 votes) · 1 comment · 1 min read · (dl.acm.org)

First thing AI will do when it takes over is get fission going
visiax · Aug 28, 2022, 5:56 AM · −2 points (4 votes) · 0 comments · 1 min read

Robert Long On Why Artificial Sentience Might Matter
Michaël Trazzi · Aug 28, 2022, 5:30 PM · 26 points (12 votes) · 5 comments · 5 min read · (theinsideview.ai)

How Do AI Timelines Affect Existential Risk?
Stephen McAleese · Aug 29, 2022, 4:57 PM · 7 points (5 votes) · 9 comments · 23 min read

[Question] What is the best critique of AI existential risk arguments?
joshc · Aug 30, 2022, 2:18 AM · 5 points (5 votes) · 10 comments · 1 min read

Can We Align a Self-Improving AGI?
Peter S. Park · Aug 30, 2022, 12:14 AM · 8 points (11 votes) · 5 comments · 11 min read

LessWrong’s prediction on apocalypse due to AGI (Aug 2022)
LetUsTalk · Aug 29, 2022, 6:46 PM · 7 points (8 votes) · 13 comments · 1 min read

[Question] How can I reconcile the two most likely requirements for humanities near-term survival.
Erlja Jkdf. · Aug 29, 2022, 6:46 PM · 1 point (1 vote) · 6 comments · 1 min read

How likely is deceptive alignment?
evhub · Aug 30, 2022, 7:34 PM · 72 points (28 votes) · 21 comments · 60 min read

Inner Alignment via Superpowers
Aug 30, 2022, 8:01 PM · 37 points (16 votes) · 13 comments · 4 min read

Three scenarios of pseudo-alignment
Eleni Angelou · Sep 3, 2022, 12:47 PM · 9 points (8 votes) · 0 comments · 3 min read

New 80,000 Hours problem profile on existential risks from AI
Benjamin Hilton · Aug 31, 2022, 5:36 PM · 28 points (13 votes) · 7 comments · 7 min read · (80000hours.org)

Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible
Sam Bowman · Aug 31, 2022, 1:39 AM · 89 points (50 votes) · 6 comments · 2 min read

Infra-Exercises, Part 1
Sep 1, 2022, 5:06 AM · 49 points (18 votes) · 9 comments · 1 min read

Alignment is hard. Communicating that, might be harder
Eleni Angelou · Sep 1, 2022, 4:57 PM · 7 points (7 votes) · 8 comments · 3 min read

A Survey of Foundational Methods in Inverse Reinforcement Learning
adamk · Sep 1, 2022, 6:21 PM · 16 points (8 votes) · 0 comments · 12 min read

AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022
Sam Bowman · Sep 1, 2022, 7:15 PM · 74 points (37 votes) · 2 comments · 7 min read

A Richly Interactive AGI Alignment Chart
lisperati · Sep 2, 2022, 12:44 AM · 14 points (6 votes) · 6 comments · 1 min read

Replacement for PONR concept
Daniel Kokotajlo · Sep 2, 2022, 12:09 AM · 44 points (15 votes) · 6 comments · 2 min read

AI coordination needs clear wins
evhub · Sep 1, 2022, 11:41 PM · 134 points (63 votes) · 15 comments · 2 min read

Simulators
janus · Sep 2, 2022, 12:45 PM · 472 points (220 votes) · 103 comments · 44 min read · (generative.ink)

Laziness in AI
Richard Henage · Sep 2, 2022, 5:04 PM · 11 points (4 votes) · 5 comments · 1 min read

Agency engineering: is AI-alignment “to human intent” enough?
catubc · Sep 2, 2022, 6:14 PM · 9 points (8 votes) · 10 comments · 6 min read

Sticky goals: a concrete experiment for understanding deceptive alignment
evhub · Sep 2, 2022, 9:57 PM · 35 points (14 votes) · 13 comments · 3 min read

[Question] Request for Alignment Research Project Recommendations
Rauno Arike · Sep 3, 2022, 3:29 PM · 10 points (5 votes) · 2 comments · 1 min read

Bugs or Fea­tures?

qbolecSep 3, 2022, 7:04 AM
69 points

38 votes

Overall karma indicates overall quality.

9 comments2 min readLW link

Pri­vate al­ign­ment re­search shar­ing and coordination

porbySep 4, 2022, 12:01 AM
54 points

15 votes

Overall karma indicates overall quality.

10 comments5 min readLW link

AXRP Epi­sode 18 - Con­cept Ex­trap­o­la­tion with Stu­art Armstrong

DanielFilanSep 3, 2022, 11:12 PM
10 points

3 votes

Overall karma indicates overall quality.

1 comment39 min readLW link

[Question] Help me find a good Hackathon sub­ject

Charbel-RaphaëlSep 4, 2022, 8:40 AM
6 points

3 votes

Overall karma indicates overall quality.

18 comments1 min readLW link

How To Know What the AI Knows—An ELK Distillation

Fabien RogerSep 4, 2022, 12:46 AM
5 points

2 votes

Overall karma indicates overall quality.

0 comments5 min readLW link

AI Governance Needs Technical Work

Mau · Sep 5, 2022, 10:28 PM
39 points

12 votes

1 comment · 9 min read · LW link

Community Building for Graduate Students: A Targeted Approach

Neil Crawford · Sep 6, 2022, 5:17 PM
6 points

5 votes

0 comments · 3 min read · LW link

program searches

Tamsin Leake · Sep 5, 2022, 8:04 PM
21 points

10 votes

2 comments · 2 min read · LW link
(carado.moe)

Alex Lawsen On Forecasting AI Progress

Michaël Trazzi · Sep 6, 2022, 9:32 AM
18 points

9 votes

0 comments · 2 min read · LW link
(theinsideview.ai)

It’s (not) how you use it

Eleni Angelou · Sep 7, 2022, 5:15 PM
8 points

3 votes

1 comment · 2 min read · LW link

AI-assisted list of ten concrete alignment things to do right now

lemonhope · Sep 7, 2022, 8:38 AM
8 points

3 votes

5 comments · 4 min read · LW link

Progress Report 7: making GPT go hurrdurr instead of brrrrrrr

Nathan Helm-Burger · Sep 7, 2022, 3:28 AM
21 points

10 votes

0 comments · 4 min read · LW link

Is there a list of projects to get started with Interpretability?

Franziska Fischer · Sep 7, 2022, 4:27 AM
8 points

3 votes

2 comments · 1 min read · LW link

Understanding and avoiding value drift

TurnTrout · Sep 9, 2022, 4:16 AM
40 points

16 votes

9 comments · 6 min read · LW link

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · Sep 8, 2022, 4:41 AM
88 points

51 votes

4 comments · 1 min read · LW link
(github.blog)

Thoughts on AGI consciousness / sentience

Steven Byrnes · Sep 8, 2022, 4:40 PM
37 points

19 votes

37 comments · 6 min read · LW link

What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

xuan · Sep 8, 2022, 3:04 PM
30 points

15 votes

15 comments · 25 min read · LW link

A rough idea for solving ELK: An approach for training generalist agents like GATO to make plans and describe them to humans clearly and honestly.

Michael Soareverix · Sep 8, 2022, 3:20 PM
2 points

1 vote

2 comments · 2 min read · LW link

Dath Ilan’s Views on Stopgap Corrigibility

David Udell · Sep 22, 2022, 4:16 PM
50 points

17 votes

17 comments · 13 min read · LW link
(www.glowfic.com)

Most People Start With The Same Few Bad Ideas

johnswentworth · Sep 9, 2022, 12:29 AM
161 points

90 votes

30 comments · 3 min read · LW link

Oversight Leagues: The Training Game as a Feature

Paul Bricman · Sep 9, 2022, 10:08 AM
20 points

9 votes

6 comments · 10 min read · LW link

AI alignment with humans… but with which humans?

geoffreymiller · Sep 9, 2022, 6:21 PM
11 points

13 votes

33 comments · 3 min read · LW link

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes · Sep 9, 2022, 10:46 PM
94 points

29 votes

7 comments · 10 min read · LW link

Swap and Scale

Stephen Fowler · Sep 9, 2022, 10:41 PM
17 points

8 votes

3 comments · 1 min read · LW link

AlexaTM − 20 Billion Parameter Model With Impressive Performance

MrThink · Sep 9, 2022, 9:46 PM
5 points

3 votes

0 comments · 1 min read · LW link

[Fun][Link] Alignment SMBC Comic

Gunnar_Zarncke · Sep 9, 2022, 9:38 PM
7 points

4 votes

2 comments · 1 min read · LW link
(www.smbc-comics.com)

Path dependence in ML inductive biases

Sep 10, 2022, 1:38 AM
43 points

16 votes

13 comments · 10 min read · LW link

ethics and anthropics of homomorphically encrypted computations

Tamsin Leake · Sep 9, 2022, 10:49 AM
43 points

24 votes

49 comments · 3 min read · LW link
(carado.moe)

Join ASAP! (AI Safety Accountability Programme) 🚀

CallumMcDougall · Sep 10, 2022, 11:15 AM
19 points

15 votes

0 comments · 3 min read · LW link

AI Safety field-building projects I’d like to see

Orpheus16 · Sep 11, 2022, 11:43 PM
44 points

32 votes

7 comments · 6 min read · LW link

[Question] Why do People Think Intelligence Will be “Easy”?

DragonGod · Sep 12, 2022, 5:32 PM
15 points

9 votes

32 comments · 2 min read · LW link

Black Box Investigation Research Hackathon

Sep 12, 2022, 7:20 AM
9 points

5 votes

4 comments · 2 min read · LW link

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aog · Sep 12, 2022, 4:08 AM
58 points

22 votes

21 comments · 5 min read · LW link
(twitter.com)

Ideological Inference Engines: Making Deontology Differentiable*

Paul Bricman · Sep 12, 2022, 12:00 PM
6 points

6 votes

0 comments · 14 min read · LW link

Deep Q-Networks Explained

Jay Bailey · Sep 13, 2022, 12:01 PM
37 points

19 votes

4 comments · 22 min read · LW link

Git Re-Basin: Merging Models modulo Permutation Symmetries [Linkpost]

aog · Sep 14, 2022, 8:55 AM
21 points

10 votes

0 comments · 2 min read · LW link
(arxiv.org)

Some ideas for epistles to the AI ethicists

Charlie Steiner · Sep 14, 2022, 9:07 AM
19 points

5 votes

0 comments · 4 min read · LW link

The problem with the media presentation of “believing in AI”

Roman Leventov · Sep 14, 2022, 9:05 PM
3 points

5 votes

0 comments · 1 min read · LW link

When is intent alignment sufficient or necessary to reduce AGI conflict?

Sep 14, 2022, 7:39 PM
32 points

16 votes

0 comments · 9 min read · LW link

When would AGIs engage in conflict?

Sep 14, 2022, 7:38 PM
37 points

17 votes

3 comments · 13 min read · LW link

Responding to ‘Beyond Hyperanthropomorphism’

ukc10014 · Sep 14, 2022, 8:37 PM
8 points

5 votes

0 comments · 16 min read · LW link

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo · Sep 15, 2022, 5:54 PM
34 points

18 votes

12 comments · 13 min read · LW link

Rational Animations’ Script Writing Contest

Writer · Sep 15, 2022, 4:56 PM
22 points

9 votes

1 comment · 3 min read · LW link

Representational Tethers: Tying AI Latents To Human Ones

Paul Bricman · Sep 16, 2022, 2:45 PM
30 points

9 votes

0 comments · 16 min read · LW link

[Question] Why are we sure that AI will “want” something?

Shmi · Sep 16, 2022, 8:35 PM
31 points

17 votes

58 comments · 1 min read · LW link

Refine Blogpost Day #3: The shortforms I did write

Alexander Gietelink Oldenziel · Sep 16, 2022, 9:03 PM
23 points

7 votes

0 comments · 1 min read · LW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz · Sep 17, 2022, 3:55 AM
135 points

60 votes

9 comments · 6 min read · LW link

Refine’s Third Blog Post Day/Week

adamShimi · Sep 17, 2022, 5:03 PM
18 points

8 votes

0 comments · 1 min read · LW link

There is no royal road to alignment

Eleni Angelou · Sep 18, 2022, 3:33 AM
4 points

3 votes

2 comments · 3 min read · LW link

Prize and fast track to alignment research at ALTER

Vanessa Kosoy · Sep 17, 2022, 4:58 PM
65 points

25 votes

4 comments · 3 min read · LW link

[Question] Updates on FLI’s Value Alignment Map?

T431 · Sep 17, 2022, 10:27 PM
17 points

5 votes

4 comments · 1 min read · LW link

Apply for mentorship in AI Safety field-building

Orpheus16 · Sep 17, 2022, 7:06 PM
9 points

8 votes

0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Sparse trinary weighted RNNs as a path to better language model interpretability

Am8ryllis · Sep 17, 2022, 7:48 PM
19 points

10 votes

13 comments · 3 min read · LW link

Podcasts on surveys, slower AI, AI arguments, etc

KatjaGrace · Sep 18, 2022, 7:30 AM
13 points

4 votes

0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Inner alignment: what are we pointing at?

lemonhope · Sep 18, 2022, 11:09 AM
7 points

4 votes

2 comments · 1 min read · LW link

The Inter-Agent Facet of AI Alignment

Michael Oesterle · Sep 18, 2022, 8:39 PM
12 points

4 votes

1 comment · 5 min read · LW link

Quintin’s alignment papers roundup—week 2

Quintin Pope · Sep 19, 2022, 1:41 PM
60 points

22 votes

2 comments · 10 min read · LW link

Safety timelines: How long will it take to solve alignment?

Sep 19, 2022, 12:53 PM
35 points

18 votes

7 comments · 6 min read · LW link
(forum.effectivealtruism.org)

Prize idea: Transmit MIRI and Eliezer’s worldviews

elifland · Sep 19, 2022, 9:21 PM
45 points

27 votes

18 comments · 2 min read · LW link

A noob goes to the SERI MATS presentations

Lowell Dennings · Sep 19, 2022, 5:35 PM
26 points

17 votes

0 comments · 5 min read · LW link

How to make your CPU as fast as a GPU—Advances in Sparsity w/ Nir Shavit

the gears to ascension · Sep 20, 2022, 3:48 AM
0 points

2 votes

0 comments · 27 min read · LW link
(www.youtube.com)

Towards deconfusing wireheading and reward maximization

leogao · Sep 21, 2022, 12:36 AM
69 points

22 votes

7 comments · 4 min read · LW link

Here Be AGI Dragons

Eris Discordia · Sep 21, 2022, 10:28 PM
−2 points

12 votes

0 comments · 5 min read · LW link

Announcing AISIC 2022 - the AI Safety Israel Conference, October 19-20

Davidmanheim · Sep 21, 2022, 7:32 PM
13 points

5 votes

0 comments · 1 min read · LW link

AI Risk Intro 2: Solving The Problem

Sep 22, 2022, 1:55 PM
13 points

5 votes

0 comments · 27 min read · LW link

[Question] AI career

ondragon · Sep 22, 2022, 3:48 AM
2 points

3 votes

0 comments · 1 min read · LW link

Shahar Avin On How To Regulate Advanced AI Systems

Michaël Trazzi · Sep 23, 2022, 3:46 PM
31 points

9 votes

0 comments · 4 min read · LW link
(theinsideview.ai)

The heterogeneity of human value types: Implications for AI alignment

geoffreymiller · Sep 23, 2022, 5:03 PM
10 points

10 votes

2 comments · 10 min read · LW link

Intelligence as a Platform

Robert Kennedy · Sep 23, 2022, 5:51 AM
10 points

4 votes

5 comments · 3 min read · LW link

Interpreting Neural Networks through the Polytope Lens

Sep 23, 2022, 5:58 PM
123 points

73 votes

26 comments · 33 min read · LW link

Under what circumstances have governments cancelled AI-type systems?

David Gross · Sep 23, 2022, 9:11 PM
7 points

2 votes

1 comment · 1 min read · LW link
(www.carnegieuktrust.org.uk)

[Question] I’m planning to start creating more write-ups summarizing my thoughts on various issues, mostly related to AI existential safety. What do you want to hear my nuanced takes on?

David Scott Krueger (formerly: capybaralet) · Sep 24, 2022, 12:38 PM
9 points

2 votes

10 comments · 1 min read · LW link

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou · Sep 24, 2022, 2:33 AM
7 points

9 votes

6 comments · 3 min read · LW link

AI coöperation is more possible than you think

423175 · Sep 24, 2022, 9:26 PM
6 points

11 votes

0 comments · 2 min read · LW link

An Unexpected GPT-3 Decision in a Simple Gamble

casualphysicsenjoyer · Sep 25, 2022, 4:46 PM
8 points

3 votes

4 comments · 1 min read · LW link

Prioritizing the Arts in response to AI automation

Casey · Sep 25, 2022, 2:25 AM
18 points

14 votes

11 comments · 2 min read · LW link

Planning capacity and daemons

lemonhope · Sep 26, 2022, 12:15 AM
2 points

1 vote

0 comments · 5 min read · LW link

Recall and Regurgitation in GPT2

Megan Kinniment · Oct 3, 2022, 7:35 PM
33 points

13 votes

1 comment · 26 min read · LW link

[MLSN #5]: Prize Compilation

Dan H · Sep 26, 2022, 9:55 PM
14 points

4 votes

1 comment · 2 min read · LW link

Loss of Alignment is not the High-Order Bit for AI Risk

yieldthought · Sep 26, 2022, 9:16 PM
14 points

18 votes

20 comments · 2 min read · LW link

Inverse Scaling Prize: Round 1 Winners

Sep 26, 2022, 7:57 PM
88 points

49 votes

16 comments · 4 min read · LW link
(irmckenzie.co.uk)

[Question] Does the existence of shared human values imply alignment is “easy”?

Morpheus · Sep 26, 2022, 6:01 PM
7 points

4 votes

14 comments · 1 min read · LW link

Why we’re not founding a human-data-for-alignment org

Sep 27, 2022, 8:14 PM
80 points

33 votes

5 comments · 29 min read · LW link
(forum.effectivealtruism.org)

Be Not Afraid

Alex Beyman · Sep 27, 2022, 10:04 PM
8 points

11 votes

0 comments · 6 min read · LW link

Strange Loops—Self-Reference from Number Theory to AI

ojorgensen · Sep 28, 2022, 2:10 PM
9 points

6 votes

5 comments · 18 min read · LW link

AI Safety Endgame Stories

Ivan Vendrov · Sep 28, 2022, 4:58 PM
27 points

9 votes

11 comments · 11 min read · LW link

Estimating the Current and Future Number of AI Safety Researchers

Stephen McAleese · Sep 28, 2022, 9:11 PM
24 points

13 votes

11 comments · 9 min read · LW link
(forum.effectivealtruism.org)

Clarifying the Agent-Like Structure Problem

johnswentworth · Sep 29, 2022, 9:28 PM
53 points

24 votes

14 comments · 6 min read · LW link

Emergency learning

Stuart_Armstrong · Jan 28, 2017, 10:05 AM
13 points

11 votes

10 comments · 4 min read · LW link

EAG DC: Meta-Bottlenecks in Preventing AI Doom

Joseph Bloom · Sep 30, 2022, 5:53 PM
5 points

5 votes

0 comments · 1 min read · LW link

Interesting papers: formally verifying DNNs

the gears to ascension · Sep 30, 2022, 8:49 AM
13 points

8 votes

0 comments · 3 min read · LW link

linkpost: loss basin visualization

Nathan Helm-Burger · Sep 30, 2022, 3:42 AM
14 points

4 votes

1 comment · 1 min read · LW link

Four usages of “loss” in AI

TurnTrout · Oct 2, 2022, 12:52 AM
42 points

16 votes

18 comments · 5 min read · LW link

Announcing the AI Safety Nudge Competition to Help Beat Procrastination

Marc Carauleanu · Oct 1, 2022, 1:49 AM
10 points

6 votes

0 comments · 1 min read · LW link

Google could build a conscious AI in three months

derek shiller · Oct 1, 2022, 1:24 PM
9 points

13 votes

18 comments · 1 min read · LW link

AGI by 2050 probability less than 1%

fumin · Oct 1, 2022, 7:45 PM
−10 points

13 votes

4 comments · 9 min read · LW link
(docs.google.com)

[Question] Do anthropic considerations undercut the evolution anchor from the Bio Anchors report?

Ege Erdil · Oct 1, 2022, 8:02 PM
20 points

10 votes

13 comments · 2 min read · LW link

A review of the Bio-Anchors report

jylin04 · Oct 3, 2022, 10:27 AM
45 points

19 votes

4 comments · 1 min read · LW link
(docs.google.com)

Data for IRL: What is needed to learn human values?

j_we · Oct 3, 2022, 9:23 AM
18 points

8 votes

6 comments · 12 min read · LW link

my current outlook on AI risk mitigation

Tamsin Leake · Oct 3, 2022, 8:06 PM
58 points

25 votes

4 comments · 11 min read · LW link
(carado.moe)

No free lunch theorem is irrelevant

Catnee · Oct 4, 2022, 12:21 AM
12 points

10 votes

7 comments · 1 min read · LW link

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn · Oct 4, 2022, 7:22 AM
44 points

30 votes

11 comments · 1 min read · LW link
(arxiv.org)

How are you dealing with ontology identification?

Erik Jenner · Oct 4, 2022, 11:28 PM
33 points

14 votes

10 comments · 3 min read · LW link

Reflection Mechanisms as an Alignment target: A follow-up survey

Oct 5, 2022, 2:03 PM
13 points

4 votes

2 comments · 7 min read · LW link

Tracking Compute Stocks and Flows: Case Studies?

Cullen · Oct 5, 2022, 5:57 PM
11 points

8 votes

5 comments · 1 min read · LW link

Charitable Reads of Anti-AGI-X-Risk Arguments, Part 1

sstich · Oct 5, 2022, 5:03 AM
3 points

8 votes

4 comments · 3 min read · LW link

Neural Tangent Kernel Distillation

Oct 5, 2022, 6:11 PM
68 points

26 votes

20 comments · 8 min read · LW link

More Recent Progress in the Theory of Neural Networks

jylin04 · Oct 6, 2022, 4:57 PM
78 points

36 votes

6 comments · 4 min read · LW link

Analysing a 2036 Takeover Scenario

ukc10014 · Oct 6, 2022, 8:48 PM
8 points

3 votes

2 comments · 27 min read · LW link

Warning Shots Probably Wouldn’t Change The Picture Much

So8res · Oct 6, 2022, 5:15 AM
111 points

69 votes

40 comments · 2 min read · LW link

Alignment Might Never Be Solved, By Humans or AI

interstice · Oct 7, 2022, 4:14 PM
30 points

18 votes

6 comments · 3 min read · LW link

linkpost: neuro-symbolic hybrid ai

Nathan Helm-Burger · Oct 6, 2022, 9:52 PM
16 points

3 votes

0 comments · 1 min read · LW link
(youtu.be)

Polysemanticity and Capacity in Neural Networks

Oct 7, 2022, 5:51 PM
78 points

44 votes

9 comments · 3 min read · LW link

[Question] Deliberate practice for research?

Alex_Altair · Oct 8, 2022, 3:45 AM
16 points

5 votes

2 comments · 1 min read · LW link

[Question] How many GPUs does NVIDIA make?

leogao · Oct 8, 2022, 5:54 PM
27 points

10 votes

2 comments · 1 min read · LW link

SERI MATS Program—Winter 2022 Cohort

Oct 8, 2022, 7:09 PM
71 points

20 votes

12 comments · 4 min read · LW link

[Question] Toy alignment problem: Social Network KPI design

qbolec · Oct 8, 2022, 10:14 PM
7 points

4 votes

1 comment · 1 min read · LW link

My tentative interpretability research agenda—topology matching.

Maxwell Clarke · Oct 8, 2022, 10:14 PM
10 points

3 votes

2 comments · 4 min read · LW link

[Question] AI Risk Microdynamics Survey

Froolow · Oct 9, 2022, 8:04 PM
3 points

2 votes

0 comments · 1 min read · LW link

Possible miracles

Oct 9, 2022, 6:17 PM
60 points

39 votes

33 comments · 8 min read · LW link

The Lebowski Theorem — Charitable Reads of Anti-AGI-X-Risk Arguments, Part 2

sstich · Oct 8, 2022, 10:39 PM
1 point

5 votes

10 comments · 7 min read · LW link

Embedding AI into AR goggles

aixar · Oct 9, 2022, 8:08 PM
−12 points

5 votes

0 comments · 1 min read · LW link

Cataloguing Priors in Theory and Practice

Paul Bricman · Oct 13, 2022, 12:36 PM
13 points

5 votes

8 comments · 7 min read · LW link

Results from the language model hackathon

Esben Kran · Oct 10, 2022, 8:29 AM
21 points

13 votes

1 comment · 4 min read · LW link

Don’t expect AGI anytime soon

cveres · Oct 10, 2022, 10:38 PM
−14 points

7 votes

6 comments · 1 min read · LW link

Disentangling inner alignment failures

Erik Jenner · Oct 10, 2022, 6:50 PM
14 points

8 votes

5 comments · 4 min read · LW link

Anonymous advice: If you want to reduce AI risk, should you take roles that advance AI capabilities?

Benjamin Hilton · Oct 11, 2022, 2:16 PM
54 points

21 votes

10 comments · 1 min read · LW link

Prettified AI Safety Game Cards

abramdemski · Oct 11, 2022, 7:35 PM
46 points

15 votes

6 comments · 1 min read · LW link

Power-Seeking AI and Existential Risk

Antonio Franca · Oct 11, 2022, 10:50 PM
5 points

6 votes

0 comments · 9 min read · LW link

Alignment 201 curriculum

Richard_Ngo · Oct 12, 2022, 6:03 PM
102 points

40 votes

3 comments · 1 min read · LW link
(www.agisafetyfundamentals.com)

Article Review: Google’s AlphaTensor

Robert_AIZI · Oct 12, 2022, 6:04 PM
8 points

5 votes

2 comments · 10 min read · LW link

[Question] Previous Work on Recreating Neural Network Input from Intermediate Layer Activations

bglass · Oct 12, 2022, 7:28 PM
1 point

2 votes

3 comments · 1 min read · LW link

You are better at math (and alignment) than you think

trevor · Oct 13, 2022, 3:07 AM
37 points

16 votes

7 comments · 22 min read · LW link
(www.lesswrong.com)

Counterarguments to the basic AI x-risk case

KatjaGrace · Oct 14, 2022, 1:00 PM
336 points

161 votes

122 comments · 34 min read · LW link
(aiimpacts.org)

Another problem with AI confinement: ordinary CPUs can work as radio transmitters

RomanS · Oct 14, 2022, 8:28 AM
34 points

21 votes

1 comment · 1 min read · LW link
(news.softpedia.com)

“AGI soon, but Narrow works Better”

AnthonyRepetto · Oct 14, 2022, 9:35 PM
1 point

6 votes

9 comments · 2 min read · LW link

[Question] Best resource to go from “typical smart tech-savvy person” to “person who gets AGI risk urgency”?

Liron · Oct 15, 2022, 10:26 PM
14 points

6 votes

8 comments · 1 min read · LW link

[Question] Questions about the alignment problem

GG10 · Oct 17, 2022, 1:42 AM
−5 points

10 votes

13 comments · 3 min read · LW link

[Question] Creating superintelligence without AGI

Antb · Oct 17, 2022, 7:01 PM
7 points

5 votes

3 comments · 1 min read · LW link

AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran · Oct 17, 2022, 5:01 PM
24 points

10 votes

0 comments · 1 min read · LW link

Is GPT-N bounded by human capacities? No.

Cleo Nardo · Oct 17, 2022, 11:26 PM
5 points

4 votes

4 comments · 2 min read · LW link

A pragmatic metric for Artificial General Intelligence

lorepieri · Oct 17, 2022, 10:07 PM
6 points

5 votes

0 comments · 1 min read · LW link
(lorenzopieri.com)

Is GitHub Copilot in legal trouble?

tcelferact · Oct 18, 2022, 4:19 PM
34 points

16 votes

2 comments · 1 min read · LW link

Metaculus is building a team dedicated to AI forecasting

ChristianWilliams · Oct 18, 2022, 4:08 PM
3 points

2 votes

0 comments · 1 min read · LW link

[Question] Where can I find solution to the exercises of AGISF?

Charbel-Raphaël · Oct 18, 2022, 2:11 PM
7 points

3 votes

0 comments · 1 min read · LW link

A conversation about Katja’s counterarguments to AI risk

Oct 18, 2022, 6:40 PM
43 points

20 votes

9 comments · 33 min read · LW link

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers

Neel Nanda · Oct 18, 2022, 9:08 PM
66 points

25 votes

5 comments · 12 min read · LW link
(www.neelnanda.io)

Distilled Representations Research Agenda

Oct 18, 2022, 8:59 PM
15 points

7 votes

2 comments · 8 min read · LW link

[Question] Should we push for requiring AI training data to be licensed?

ChristianKl · Oct 19, 2022, 5:49 PM
38 points

15 votes

32 comments · 1 min read · LW link

Hacker-AI and Digital Ghosts – Pre-AGI

Erland Wittkotter · Oct 19, 2022, 3:33 PM
9 points

8 votes

7 comments · 8 min read · LW link

Scaling Laws for Reward Model Overoptimization

Oct 20, 2022, 12:20 AM
86 points

33 votes

11 comments · 1 min read · LW link
(arxiv.org)

The heritability of human values: A behavior genetic critique of Shard Theory

geoffreymiller · Oct 20, 2022, 3:51 PM
63 points

40 votes

58 comments · 21 min read · LW link

aisafety.community—A living document of AI safety communities

Oct 28, 2022, 5:50 PM
52 points

27 votes

22 comments · 1 min read · LW link

Trajectories to 2036

ukc10014 · Oct 20, 2022, 8:23 PM
1 point

2 votes

1 comment · 14 min read · LW link

Intelligent behaviour across systems, scales and substrates

Nora_Ammann · Oct 21, 2022, 5:09 PM
11 points

7 votes

0 comments · 10 min read · LW link

A framework and open questions for game theoretic shard modeling

Garrett Baker · Oct 21, 2022, 9:40 PM
11 points

4 votes

4 comments · 4 min read · LW link

[Question] The Last Year - is there an existing novel about the last year before AI doom?

Luca Petrolati · Oct 22, 2022, 8:44 PM
4 points

3 votes

4 comments · 1 min read · LW link

Empowerment is (almost) All We Need

jacob_cannell · Oct 23, 2022, 9:48 PM
36 points

23 votes

43 comments · 17 min read · LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook · Oct 24, 2022, 5:42 PM
62 points

37 votes

0 comments · 1 min read · LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda · Oct 24, 2022, 8:45 PM
62 points

34 votes

8 comments · 3 min read · LW link
(neelnanda.io)

Consider trying Vivek Hebbar’s alignment exercises

Orpheus16 · Oct 24, 2022, 7:46 PM
36 points

21 votes

1 comment · 4 min read · LW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · Oct 24, 2022, 8:03 PM
22 points

13 votes

0 comments · 1 min read · LW link
(github.com)

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes · Oct 25, 2022, 2:47 PM
141 points

62 votes

31 comments · 30 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · Oct 25, 2022, 3:54 AM
7 points

4 votes

1 comment · 1 min read · LW link

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann · Oct 25, 2022, 4:29 PM
21 points

9 votes

1 comment · 5 min read · LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda · Oct 25, 2022, 8:24 PM
49 points

23 votes

5 comments · 1 min read · LW link
(www.youtube.com)

Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

LawrenceC · Oct 26, 2022, 6:45 PM
28 points

16 votes

5 comments · 1 min read · LW link
(arxiv.org)

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

Oct 27, 2022, 1:32 AM
134 points

51 votes

14 comments · 12 min read · LW link

You won’t solve alignment without agent foundations

Mikhail Samin · Nov 6, 2022, 8:07 AM
21 points

14 votes

3 comments · 8 min read · LW link

AI & ML Safety Updates W43

Oct 28, 2022, 1:18 PM
9 points

7 votes

3 comments · 3 min read · LW link

Prizes for ML Safety Benchmark Ideas

joshc · Oct 28, 2022, 2:51 AM
36 points

10 votes

3 comments · 1 min read · LW link

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes · Oct 30, 2022, 7:15 PM
26 points

12 votes

1 comment · 1 min read · LW link
(braininspired.co)

Join the interpretability research hackathon

Esben Kran · Oct 28, 2022, 4:26 PM
15 points

10 votes

0 comments · 1 min read · LW link

Instrumental ignoring AI, Dumb but not useless.

Donald Hobson · Oct 30, 2022, 4:55 PM
7 points

5 votes

6 comments · 2 min read · LW link

«Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch · Oct 30, 2022, 6:31 AM
58 points

17 votes

13 comments · 15 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · Oct 31, 2022, 11:38 AM
19 points

7 votes

1 comment · 1 min read · LW link
(christophm.github.io)

“Cars and Elephants”: a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet) · Oct 31, 2022, 9:26 PM
47 points

22 votes

25 comments · 2 min read · LW link

ML Safety Scholars Summer 2022 Retrospective

TW123 · Nov 1, 2022, 3:09 AM
29 points

14 votes

0 comments · 1 min read · LW link

What sorts of systems can be deceptive?

Andrei Alexandru · Oct 31, 2022, 10:00 PM
14 points

6 votes

0 comments · 7 min read · LW link

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Robert Miles · Nov 1, 2022, 11:23 PM
67 points

30 votes

100 comments · 2 min read · LW link

Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda · Nov 1, 2022, 11:56 PM
68 points

29 votes

14 comments · 1 min read · LW link
(youtu.be)

On the correspondence between AI-misalignment and cognitive dissonance using a behavioral economics model

Stijn Bruers · Nov 1, 2022, 5:39 PM
4 points

3 votes

0 comments · 6 min read · LW link

WFW?: Opportunity and Theory of Impact

DavidCorfield · Nov 2, 2022, 1:24 AM
1 point

1 vote

0 comments · 1 min read · LW link

AI Safety Needs Great Product Builders

goodgravy · Nov 2, 2022, 11:33 AM
14 points

7 votes

2 comments · 1 min read · LW link

A Mystery About High Dimensional Concept Encoding

Fabien Roger · Nov 3, 2022, 5:05 PM
46 points

25 votes

13 comments · 7 min read · LW link

Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement

Nov 4, 2022, 6:09 PM
14 points

13 votes

11 comments · 5 min read · LW link
(theinsideview.ai)

Can we predict the abilities of future AI? MLAISU W44

Nov 4, 2022, 3:19 PM
10 points

6 votes

0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

My summary of “Pragmatic AI Safety”

Eleni Angelou · Nov 5, 2022, 12:54 PM
2 points

4 votes

0 comments · 5 min read · LW link

Review of the Challenge

SD Marlow · Nov 5, 2022, 6:38 AM
−14 points

12 votes

5 comments · 2 min read · LW link

How to store human values on a computer

Oliver Siegel · Nov 5, 2022, 7:17 PM
−12 points

7 votes

17 comments · 1 min read · LW link

Should AI focus on problem-solving or strategic planning? Why not both?

Oliver Siegel · Nov 5, 2022, 7:17 PM
−12 points

5 votes

3 comments · 1 min read · LW link

Instead of technical research, more people should focus on buying time

Nov 5, 2022, 8:43 PM
80 points

79 votes

51 comments · 14 min read · LW link

[Question] Is there some kind of backlog or delay for data center AI?

trevor · Nov 7, 2022, 8:18 AM
5 points

1 vote

2 comments · 1 min read · LW link

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

Neel Nanda · Nov 7, 2022, 10:39 PM
29 points

13 votes

15 comments · 3 min read · LW link
(youtu.be)

How could we know that an AGI system will have good consequences?

So8res · Nov 7, 2022, 10:42 PM
86 points

44 votes

24 comments · 5 min read · LW link

People care about each other even though they have imperfect motivational pointers?

TurnTrout · Nov 8, 2022, 6:15 PM
32 points

16 votes

25 comments · 7 min read · LW link

[ASoT] Thoughts on GPT-N

Ulisse Mini · Nov 8, 2022, 7:14 AM
8 points

5 votes

0 comments · 1 min read · LW link

Inverse scaling can become U-shaped

Edouard Harris · Nov 8, 2022, 7:04 PM
27 points

15 votes

15 comments · 1 min read · LW link
(arxiv.org)

Counterfactability

Scott Garrabrant · Nov 7, 2022, 5:39 AM
36 points

11 votes

4 comments · 11 min read · LW link

Takeaways from a survey on AI alignment resources

DanielFilan · Nov 5, 2022, 11:40 PM
73 points

40 votes

9 comments · 6 min read · LW link
(danielfilan.com)

[ASoT] Instrumental convergence is useful

Ulisse Mini · Nov 9, 2022, 8:20 PM
5 points

3 votes

9 comments · 1 min read · LW link

Mesatranslation and Metatranslation

jdp · Nov 9, 2022, 6:46 PM
23 points

7 votes

4 comments · 11 min read · LW link

The Interpretability Playground

Esben Kran · Nov 10, 2022, 5:15 PM
8 points

4 votes

0 comments · 1 min read · LW link
(alignmentjam.com)

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · Nov 29, 2022, 6:23 AM
55 points

15 votes

27 comments · 15 min read · LW link

[Question] What are some low-cost outside-the-box ways to do/fund alignment research?

trevor · Nov 11, 2022, 5:25 AM
10 points

2 votes

0 comments · 1 min read · LW link

Instrumental convergence is what makes general intelligence possible

tailcalled · Nov 11, 2022, 4:38 PM
72 points

33 votes

11 comments · 4 min read · LW link

A short critique of Vanessa Kosoy’s PreDCA

Martín Soto · Nov 13, 2022, 4:00 PM
25 points

8 votes

8 comments · 4 min read · LW link

[Question] Why don’t we have self driving cars yet?

Linda Linsefors · Nov 14, 2022, 12:19 PM
21 points

11 votes

16 comments · 1 min read · LW link

Winners of the AI Safety Nudge Competition

Marc Carauleanu · Nov 15, 2022, 1:06 AM
4 points

3 votes

0 comments · 1 min read · LW link

[Question] Will nanotech/biotech be what leads to AI doom?

tailcalled · Nov 15, 2022, 5:38 PM
4 points

8 votes

8 comments · 2 min read · LW link

[Question] What is our current best infohazard policy for AGI (safety) research?

Roman Leventov · Nov 15, 2022, 10:33 PM
12 points

7 votes

2 comments · 1 min read · LW link

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn · Nov 16, 2022, 2:40 PM
72 points

36 votes

16 comments · 7 min read · LW link

Current themes in mechanistic interpretability research

Nov 16, 2022, 2:14 PM
82 points

41 votes

3 comments · 12 min read · LW link

[Question] Is there some reason LLMs haven’t seen broader use?

tailcalled · Nov 16, 2022, 8:04 PM
25 points

8 votes

27 comments · 1 min read · LW link

AI Forecasting Research Ideas

Jsevillamol · Nov 17, 2022, 5:37 PM
21 points

8 votes

2 comments · 1 min read · LW link

Results from the interpretability hackathon

Nov 17, 2022, 2:51 PM
80 points

39 votes

0 comments · 6 min read · LW link

Don’t design agents which exploit adversarial inputs

Nov 18, 2022, 1:48 AM
60 points

24 votes

61 comments · 12 min read · LW link

AI Ethics != Ai Safety

Dentin · Nov 18, 2022, 3:02 AM
2 points

9 votes

0 comments · 1 min read · LW link

Update to Mysteries of mode collapse: text-davinci-002 not RLHF

janus · Nov 19, 2022, 11:51 PM
69 points

34 votes

8 comments · 2 min read · LW link

Limits to the Controllability of AGI

Nov 20, 2022, 7:18 PM
10 points

12 votes

2 comments · 9 min read · LW link

[ASoT] Reflectivity in Narrow AI

Ulisse Mini · Nov 21, 2022, 12:51 AM
6 points

2 votes

1 comment · 1 min read · LW link

Here’s the exit.

Valentine · Nov 21, 2022, 6:07 PM
85 points

200 votes

138 comments · 10 min read · LW link

Clarifying wireheading terminology

leogao · Nov 24, 2022, 4:53 AM
53 points

25 votes

6 comments · 1 min read · LW link

A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2

Neel Nanda · Nov 22, 2022, 5:12 PM
20 points

7 votes

0 comments · 1 min read · LW link
(www.youtube.com)

Announcing AI safety Mentors and Mentees

Marius Hobbhahn · Nov 23, 2022, 3:21 PM
54 points

25 votes

7 comments · 10 min read · LW link

My take on Jacob Cannell’s take on AGI safety

Steven Byrnes · Nov 28, 2022, 2:01 PM
61 points

17 votes

13 comments · 30 min read · LW link

Don’t align agents to evaluations of plans

TurnTrout · Nov 26, 2022, 9:16 PM
37 points

15 votes

46 comments · 18 min read · LW link

[Question] Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility “real”

joraine · Nov 24, 2022, 5:08 AM
25 points

14 votes

11 comments · 1 min read · LW link

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Nov 25, 2022, 2:36 PM
36 points

13 votes

4 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas

Orpheus16 · Nov 25, 2022, 8:47 PM
37 points

16 votes

2 comments · 9 min read · LW link

Mechanistic anomaly detection and ELK

paulfchristiano · Nov 25, 2022, 6:50 PM
121 points

47 votes

17 comments · 21 min read · LW link
(ai-alignment.com)

The First Filter

Nov 26, 2022, 7:37 PM
55 points

35 votes

5 comments · 1 min read · LW link

Discussing how to align Transformative AI if it’s developed very soon

Nov 28, 2022, 4:17 PM
36 points

16 votes

2 comments · 30 min read · LW link

On the Diplomacy AI

Zvi · Nov 28, 2022, 1:20 PM
119 points

60 votes

29 comments · 11 min read · LW link
(thezvi.wordpress.com)

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky · Nov 29, 2022, 7:30 PM
68 points

26 votes

9 comments · 33 min read · LW link
(www.cold-takes.com)

Distinguishing test from training

So8res · Nov 29, 2022, 9:41 PM
65 points

41 votes

10 comments · 6 min read · LW link

[Question] Do any of the AI Risk evaluations focus on humans as the risk?

jmh · Nov 30, 2022, 3:09 AM
10 points

3 votes

8 comments · 1 min read · LW link

Apply to attend winter AI alignment workshops (Dec 28-30 & Jan 3-5) near Berkeley

Dec 1, 2022, 8:46 PM
25 points

12 votes

1 comment · 1 min read · LW link

Theories of impact for Science of Deep Learning

Marius Hobbhahn · Dec 1, 2022, 2:39 PM
16 points

8 votes

0 comments · 11 min read · LW link

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout · Dec 2, 2022, 2:43 AM
96 points

37 votes

18 comments · 53 min read · LW link

The Plan - 2022 Update

johnswentworth · Dec 1, 2022, 8:43 PM
211 points

123 votes

33 comments · 8 min read · LW link

Finding gliders in the game of life

paulfchristiano · Dec 1, 2022, 8:40 PM
91 points

45 votes

7 comments · 16 min read · LW link
(ai-alignment.com)

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner · Dec 1, 2022, 10:41 PM
38 points

15 votes

4 comments · 4 min read · LW link

Understanding goals in complex systems

Johannes C. Mayer · Dec 1, 2022, 11:49 PM
9 points

2 votes

0 comments · 1 min read · LW link
(www.youtube.com)

Mastering Stratego (Deepmind)

svemirski · Dec 2, 2022, 2:21 AM
6 points

3 votes

0 comments · 1 min read · LW link
(www.deepmind.com)

Jailbreaking ChatGPT on Release Day

Zvi · Dec 2, 2022, 1:10 PM
237 points

134 votes

74 comments · 6 min read · LW link
(thezvi.wordpress.com)

[Question] Did I just catch GPTchat doing something unexpectedly insightful?

trevor · Dec 2, 2022, 7:48 AM
9 points

7 votes

0 comments · 1 min read · LW link

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner · Dec 3, 2022, 12:54 AM
16 points

6 votes

1 comment · 2 min read · LW link

Causal scrubbing: results on induction heads

Dec 3, 2022, 12:59 AM
32 points

10 votes

0 comments · 17 min read · LW link

Logical induction for software engineers

Alex Flint · Dec 3, 2022, 7:55 PM
124 points

46 votes

2 comments · 27 min read · LW link

ChatGPT is surprisingly and uncannily good at pretending to be sentient

Victor Novikov · Dec 3, 2022, 2:47 PM
17 points

7 votes

11 comments · 18 min read · LW link

Monthly Shorts 11/22

Celer · Dec 5, 2022, 7:30 AM
8 points

2 votes

0 comments · 3 min read · LW link
(keller.substack.com)

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner · Dec 5, 2022, 10:39 AM
34 points

14 votes

4 comments · 1 min read · LW link

The No Free Lunch theorem for dummies

Steven Byrnes · Dec 5, 2022, 9:46 PM
28 points

11 votes

16 comments · 3 min read · LW link

[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike · Dec 5, 2022, 10:51 PM
93 points

46 votes

13 comments · 1 min read · LW link
(aligned.substack.com)

Updating my AI timelines

Matthew Barnett · Dec 5, 2022, 8:46 PM
134 points

73 votes

40 comments · 2 min read · LW link

ChatGPT and Ideological Turing Test

Viliam · Dec 5, 2022, 9:45 PM
41 points

16 votes

1 comment · 1 min read · LW link

Verification Is Not Easier Than Generation In General

johnswentworth · Dec 6, 2022, 5:20 AM
56 points

32 votes

23 comments · 1 min read · LW link

[Question] What are the major underlying divisions in AI safety?

Chris Leong · Dec 6, 2022, 3:28 AM
5 points

3 votes

2 comments · 1 min read · LW link

Take 5: Another problem for natural abstractions is laziness.

Charlie Steiner · Dec 6, 2022, 7:00 AM
30 points

8 votes

4 comments · 3 min read · LW link

Mesa-Optimizers via Grokking

orthonormal · Dec 6, 2022, 8:05 PM
35 points

17 votes

4 comments · 6 min read · LW link

[Question] How do finite factored sets compare with phase space?

Alex_Altair · Dec 6, 2022, 8:05 PM
14 points

4 votes

1 comment · 1 min read · LW link

Using GPT-Eliezer against ChatGPT Jailbreaking

Dec 6, 2022, 7:54 PM
159 points

125 votes

77 comments · 9 min read · LW link

Take 6: CAIS is actually Orwellian.

Charlie Steiner · Dec 7, 2022, 1:50 PM
14 points

6 votes

5 comments · 2 min read · LW link

[Question] Looking for ideas of public assets (stocks, funds, ETFs) that I can invest in to have a chance at profiting from the mass adoption and commercialization of AI technology

Annapurna · Dec 7, 2022, 10:35 PM
15 points

13 votes

9 comments · 1 min read · LW link

You should consider launching an AI startup

joshc · Dec 8, 2022, 12:28 AM
5 points

19 votes

16 comments · 4 min read · LW link

Machine Learning Consent

jefftk · Dec 8, 2022, 3:50 AM
38 points

14 votes

14 comments · 3 min read · LW link
(www.jefftk.com)

Relevant to natural abstractions: Euclidean Symmetry Equivariant Machine Learning—Overview, Applications, and Open Questions

the gears to ascension · Dec 8, 2022, 6:01 PM
7 points

4 votes

0 comments · 1 min read · LW link
(youtu.be)

AI Safety Seems Hard to Measure

HoldenKarnofsky · Dec 8, 2022, 7:50 PM
68 points

32 votes

5 comments · 14 min read · LW link
(www.cold-takes.com)

[Question] How is the “sharp left turn defined”?

Chris_Leong · Dec 9, 2022, 12:04 AM
13 points

7 votes

3 comments · 1 min read · LW link

Linkpost for a generalist algorithmic learner: capable of carrying out sorting, shortest paths, string matching, convex hull finding in one network

lovetheusers · Dec 9, 2022, 12:02 AM
7 points

3 votes

1 comment · 1 min read · LW link
(twitter.com)

Timelines ARE relevant to alignment research (timelines 2 of ?)

Nathan Helm-Burger · Aug 24, 2022, 12:19 AM
11 points

7 votes

5 comments · 6 min read · LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo · Dec 9, 2022, 5:53 PM
11 points

8 votes

0 comments · 5 min read · LW link

[Question] Does a LLM have a utility function?

Dagon · Dec 9, 2022, 5:19 PM
16 points

5 votes

6 comments · 1 min read · LW link

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

Dec 9, 2022, 10:38 AM
14 points

6 votes

0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Take 8: Queer the inner/outer alignment dichotomy.

Charlie Steiner · Dec 9, 2022, 5:46 PM
26 points

14 votes

2 comments · 2 min read · LW link

My thoughts on OpenAI’s Alignment plan

Donald Hobson · Dec 10, 2022, 10:35 AM
20 points

10 votes

0 comments · 6 min read · LW link

[ASoT] Natural abstractions and AlphaZero

Ulisse Mini · Dec 10, 2022, 5:53 PM
31 points

17 votes

1 comment · 1 min read · LW link
(arxiv.org)

[Question] How promising are legal avenues to restrict AI training data?

thehalliard · Dec 10, 2022, 4:31 PM
9 points

7 votes

2 comments · 1 min read · LW link

Consider using reversible automata for alignment research

Alex_Altair · Dec 11, 2022, 1:00 AM
81 points

38 votes

29 comments · 2 min read · LW link

[fiction] Our Final Hour

Mati_Roy · Dec 11, 2022, 5:49 AM
16 points

22 votes

5 comments · 3 min read · LW link

A crisis for online communication: bots and bot users will overrun the Internet?

Mitchell_Porter · Dec 11, 2022, 9:11 PM
23 points

12 votes

11 comments · 1 min read · LW link

Reframing inner alignment

davidad · Dec 11, 2022, 1:53 PM
47 points

14 votes

13 comments · 4 min read · LW link

Side-channels: input versus output

davidad · Dec 12, 2022, 12:32 PM
35 points

14 votes

9 comments · 2 min read · LW link

Psychological Disorders and Problems

Dec 12, 2022, 6:15 PM
35 points

17 votes

5 comments · 1 min read · LW link

Prodding ChatGPT to solve a basic algebra problem

Shmi · Dec 12, 2022, 4:09 AM
14 points

3 votes

6 comments · 1 min read · LW link
(twitter.com)

A brainteaser for language models

Adam Scherlis · Dec 12, 2022, 2:43 AM
46 points

28 votes

3 comments · 2 min read · LW link

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Charlie Steiner · Dec 12, 2022, 11:51 AM
36 points

18 votes

14 comments · 2 min read · LW link

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus16 · Dec 12, 2022, 10:36 PM
18 points

12 votes

0 comments · 2 min read · LW link

Finite Factored Sets in Pictures

Magdalena Wache · Dec 11, 2022, 6:49 PM
149 points

72 votes

31 comments · 12 min read · LW link

Concept extrapolation for hypothesis generation

Dec 12, 2022, 10:09 PM
20 points

12 votes

2 comments · 3 min read · LW link

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · Dec 13, 2022, 7:04 AM
30 points

10 votes

3 comments · 2 min read · LW link

AI alignment is distinct from its near-term applications

paulfchristiano · Dec 13, 2022, 7:10 AM
233 points

98 votes

5 comments · 2 min read · LW link
(ai-alignment.com)

Okay, I feel it now

g1 · Dec 13, 2022, 11:01 AM
84 points

64 votes

14 comments · 1 min read · LW link

What Does It Mean to Align AI With Human Values?

Algon · Dec 13, 2022, 4:56 PM
8 points

4 votes

3 comments · 1 min read · LW link
(www.quantamagazine.org)

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi · Dec 13, 2022, 3:41 PM
18 points

10 votes

7 comments · 1 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
80 points

31 votes

10 comments · 22 min read · LW link

Existential AI Safety is NOT separate from near-term applications

scasper · Dec 13, 2022, 2:47 PM
37 points

25 votes

16 comments · 3 min read · LW link

My AGI safety research—2022 review, ’23 plans

Steven Byrnes · Dec 14, 2022, 3:15 PM
34 points

16 votes

6 comments · 6 min read · LW link

Trying to disambiguate different questions about whether RLHF is “good”

Buck · Dec 14, 2022, 4:03 AM
92 points

35 votes

40 comments · 7 min read · LW link

Predicting GPU performance

Dec 14, 2022, 4:27 PM
59 points

27 votes

24 comments · 1 min read · LW link
(epochai.org)

[Question] Is the AI timeline too short to have children?

Yoreth · Dec 14, 2022, 6:32 PM
33 points

18 votes

20 comments · 1 min read · LW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch · Dec 14, 2022, 10:34 PM
49 points

14 votes

2 comments · 13 min read · LW link

[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018?

Chris_Leong · Dec 14, 2022, 11:28 PM
8 points

3 votes

0 comments · 1 min read · LW link

Aligning alignment with performance

Marv K · Dec 14, 2022, 10:19 PM
2 points

1 vote

0 comments · 2 min read · LW link

AI Neorealism: a threat model & success criterion for existential safety

davidad · Dec 15, 2022, 1:42 PM
39 points

16 votes

0 comments · 3 min read · LW link

The next decades might be wild

Marius Hobbhahn · Dec 15, 2022, 4:10 PM
157 points

82 votes

27 comments · 41 min read · LW link

High-level hopes for AI alignment

HoldenKarnofsky · Dec 15, 2022, 6:00 PM
42 points

12 votes

3 comments · 19 min read · LW link
(www.cold-takes.com)

[Question] How is ARC planning to use ELK?

jacquesthibs · Dec 15, 2022, 8:11 PM
23 points

11 votes

5 comments · 1 min read · LW link

AI overhangs depend on whether algorithms, compute and data are substitutes or complements

NathanBarnard · Dec 16, 2022, 2:23 AM
2 points

1 vote

0 comments · 3 min read · LW link

Paper: Transformers learn in-context by gradient descent

LawrenceC · Dec 16, 2022, 11:10 AM
26 points

12 votes

11 comments · 2 min read · LW link
(arxiv.org)

How important are accurate AI timelines for the optimal spending schedule on AI risk interventions?

Tristan Cook · Dec 16, 2022, 4:05 PM
27 points

10 votes

2 comments · 1 min read · LW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran · Dec 16, 2022, 11:03 AM
12 points

2 votes

7 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Can we efficiently explain model behaviors?

paulfchristiano · Dec 16, 2022, 7:40 PM
63 points

23 votes

0 comments · 9 min read · LW link
(ai-alignment.com)

[Question] College Selection Advice for Technical Alignment

TempCollegeAsk · Dec 16, 2022, 5:11 PM
11 points

7 votes

8 comments · 1 min read · LW link

Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC · Dec 16, 2022, 10:12 PM
60 points

22 votes

10 comments · 1 min read · LW link
(www.anthropic.com)

Positive values seem more robust and lasting than prohibitions

TurnTrout · Dec 17, 2022, 9:43 PM
42 points

17 votes

12 comments · 2 min read · LW link

Take 11: “Aligning language models” should be weirder.

Charlie Steiner · Dec 18, 2022, 2:14 PM
29 points

14 votes

0 comments · 2 min read · LW link

Why I think that teaching philosophy is high impact

Eleni Angelou · Dec 19, 2022, 3:11 AM
5 points

5 votes

0 comments · 2 min read · LW link

Event [Berkeley]: Alignment Collaborator Speed-Meeting

Dec 19, 2022, 2:24 AM
18 points

5 votes

2 comments · 1 min read · LW link

The ‘Old AI’: Lessons for AI governance from early electricity regulation

Dec 19, 2022, 2:42 AM
7 points

3 votes

0 comments · 13 min read · LW link

Note on algorithms with multiple trained components

Steven Byrnes · Dec 20, 2022, 5:08 PM
19 points

6 votes

4 comments · 2 min read · LW link

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt · Dec 19, 2022, 12:02 PM
8 points

7 votes

6 comments · 31 min read · LW link

Next Level Seinfeld

Zvi · Dec 19, 2022, 1:30 PM
45 points

23 votes

6 comments · 1 min read · LW link
(thezvi.wordpress.com)

Solution to The Alignment Problem

Algon · Dec 19, 2022, 8:12 PM
10 points

12 votes

0 comments · 2 min read · LW link

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC · Dec 19, 2022, 10:52 PM
80 points

29 votes

14 comments · 17 min read · LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth · Dec 20, 2022, 1:22 AM
41 points

19 votes

14 comments · 12 min read · LW link

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner · Dec 20, 2022, 5:01 AM
23 points

5 votes

0 comments · 3 min read · LW link

[link, 2019] AI paradigm: interactive learning from unlabeled instructions

the gears to ascension · Dec 20, 2022, 6:45 AM
2 points

1 vote

0 comments · 2 min read · LW link
(jgrizou.github.io)

Discovering Language Model Behaviors with Model-Written Evaluations

Dec 20, 2022, 8:08 PM
45 points

22 votes

6 comments · 1 min read · LW link
(www.anthropic.com)

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Orpheus16 · Dec 20, 2022, 9:39 PM
14 points

4 votes

2 comments · 11 min read · LW link

Google Search loses to ChatGPT fair and square

Shmi · Dec 21, 2022, 8:11 AM
12 points

4 votes

6 comments · 1 min read · LW link
(www.surgehq.ai)

A Comprehensive Mechanistic Interpretability Explainer & Glossary

Neel Nanda · Dec 21, 2022, 12:35 PM
40 points

17 votes

0 comments · 2 min read · LW link
(neelnanda.io)

Price’s equation for neural networks

tailcalled · Dec 21, 2022, 1:09 PM
22 points

4 votes

3 comments · 2 min read · LW link

[Question] [DISC] Are Values Robust?

DragonGod · Dec 21, 2022, 1:00 AM
12 points

3 votes

5 comments · 2 min read · LW link

Metaphor.systems

the gears to ascension · Dec 21, 2022, 9:31 PM
9 points

5 votes

2 comments · 1 min read · LW link
(metaphor.systems)

The Human’s Hidden Utility Function (Maybe)

lukeprog · Jan 23, 2012, 7:39 PM
64 points

59 votes

90 comments · 3 min read · LW link

Using vector fields to visualise preferences and make them consistent

Jan 28, 2020, 7:44 PM
41 points

21 votes

32 comments · 11 min read · LW link

[Article review] Artificial Intelligence, Values, and Alignment

MichaelA · Mar 9, 2020, 12:42 PM
13 points

6 votes

5 comments · 10 min read · LW link

Clarifying some key hypotheses in AI alignment

Aug 15, 2019, 9:29 PM
78 points

28 votes

12 comments · 9 min read · LW link

Failures in technology forecasting? A reply to Ord and Yudkowsky

MichaelA · May 8, 2020, 12:41 PM
44 points

24 votes

19 comments · 11 min read · LW link

[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

MichaelA · Feb 16, 2020, 7:56 PM
24 points

7 votes

4 comments · 3 min read · LW link

How can Interpretability help Alignment?

May 23, 2020, 4:16 PM
37 points

18 votes

3 comments · 9 min read · LW link

A Problem With Patternism

B Jacobs · May 19, 2020, 8:16 PM
5 points

4 votes

52 comments · 1 min read · LW link

Goal-directedness is behavioral, not structural

adamShimi · Jun 8, 2020, 11:05 PM
6 points

4 votes

12 comments · 3 min read · LW link

Learning Deep Learning: Joining data science research as a mathematician

magfrump · Oct 19, 2017, 7:14 PM
10 points

9 votes

4 comments · 3 min read · LW link

Will AI undergo discontinuous progress?

Sammy Martin · Feb 21, 2020, 10:16 PM
26 points

19 votes

21 comments · 20 min read · LW link

The Value Defi­ni­tion Problem

Sammy MartinNov 18, 2019, 7:56 PM
14 points

9 votes

Overall karma indicates overall quality.

6 comments11 min readLW link

Life at Three Tails of the Bell Curve

lsusrJun 27, 2020, 8:49 AM
63 points

37 votes

Overall karma indicates overall quality.

10 comments4 min readLW link

How do take­off speeds af­fect the prob­a­bil­ity of bad out­comes from AGI?

KRJun 29, 2020, 10:06 PM
15 points

7 votes

Overall karma indicates overall quality.

2 comments8 min readLW link

AI Benefits Post 2: How AI Benefits Differs from AI Align­ment & AI for Good

CullenJun 29, 2020, 5:00 PM
8 points

6 votes

Overall karma indicates overall quality.

7 comments2 min readLW link

Null-box­ing New­comb’s Problem

YitzJul 13, 2020, 4:32 PM
33 points

26 votes

Overall karma indicates overall quality.

10 comments4 min readLW link

No non­sense ver­sion of the “racial al­gorithm bias”

Yuxi_LiuJul 13, 2019, 3:39 PM
115 points

56 votes

Overall karma indicates overall quality.

20 comments2 min readLW link

Ed­u­ca­tion 2.0 — A brand new ed­u­ca­tion system

aryanJul 15, 2020, 10:09 AM
−8 points

8 votes

Overall karma indicates overall quality.

3 comments6 min readLW link

What it means to optimise

Neel NandaJul 25, 2020, 9:40 AM
5 points

7 votes

Overall karma indicates overall quality.

0 comments8 min readLW link
(www.neelnanda.io)

[Question] Where are people thinking and talking about global coordination for AI safety?
Wei Dai · May 22, 2019, 6:24 AM · 103 points (37 votes) · 22 comments · 1 min read · LW link

The strategy-stealing assumption
paulfchristiano · Sep 16, 2019, 3:23 PM · 72 points (24 votes) · 46 comments · 12 min read · LW link · 3 reviews

Conversation with Paul Christiano
abergal · Sep 11, 2019, 11:20 PM · 44 points (18 votes) · 6 comments · 30 min read · LW link (aiimpacts.org)

Transcription of Eliezer's January 2010 video Q&A
curiousepic · Nov 14, 2011, 5:02 PM · 112 points (81 votes) · 9 comments · 56 min read · LW link

Resources for AI Alignment Cartography
Gyrodiot · Apr 4, 2020, 2:20 PM · 45 points (22 votes) · 8 comments · 9 min read · LW link

Thoughts on Ben Garfinkel's "How sure are we about this AI stuff?"
David Scott Krueger (formerly: capybaralet) · Feb 6, 2019, 7:09 PM · 25 points (12 votes) · 17 comments · 1 min read · LW link

Announcement: AI alignment prize round 2 winners and next round
cousin_it · Apr 16, 2018, 3:08 AM · 64 points (46 votes) · 29 comments · 2 min read · LW link

Announcement: AI alignment prize round 3 winners and next round
cousin_it · Jul 15, 2018, 7:40 AM · 93 points (29 votes) · 7 comments · 1 min read · LW link

Security Mindset and the Logistic Success Curve
Eliezer Yudkowsky · Nov 26, 2017, 3:58 PM · 76 points (47 votes) · 48 comments · 20 min read · LW link

Arbital scrape
emmab · Jun 6, 2019, 11:11 PM · 89 points (31 votes) · 23 comments · 1 min read · LW link

The Strangest Thing An AI Could Tell You
Eliezer Yudkowsky · Jul 15, 2009, 2:27 AM · 116 points (108 votes) · 605 comments · 2 min read · LW link

Self-fulfilling correlations
PhilGoetz · Aug 26, 2010, 9:07 PM · 144 points (118 votes) · 50 comments · 3 min read · LW link

Zoom In: An Introduction to Circuits
evhub · Mar 10, 2020, 7:36 PM · 84 points (28 votes) · 11 comments · 2 min read · LW link (distill.pub)

Should ethicists be inside or outside a profession?
Eliezer Yudkowsky · Dec 12, 2018, 1:40 AM · 87 points (30 votes) · 6 comments · 9 min read · LW link

Implicit extortion
paulfchristiano · Apr 13, 2018, 4:33 PM · 29 points (22 votes) · 16 comments · 6 min read · LW link (ai-alignment.com)

Bayesian Judo
Eliezer Yudkowsky · Jul 31, 2007, 5:53 AM · 87 points (119 votes) · 108 comments · 1 min read · LW link

Announcing AlignmentForum.org Beta
Raemon · Jul 10, 2018, 8:19 PM · 67 points (34 votes) · 35 comments · 2 min read · LW link

Announcing the Alignment Newsletter
Rohin Shah · Apr 9, 2018, 9:16 PM · 29 points (20 votes) · 3 comments · 1 min read · LW link

Helen Toner on China, CSET, and AI
Rob Bensinger · Apr 21, 2019, 4:10 AM · 68 points (26 votes) · 3 comments · 7 min read · LW link (rationallyspeakingpodcast.org)

A simple environment for showing mesa misalignment
Matthew Barnett · Sep 26, 2019, 4:44 AM · 70 points (34 votes) · 9 comments · 2 min read · LW link

The E-Coli Test for AI Alignment
johnswentworth · Dec 16, 2018, 8:10 AM · 69 points (26 votes) · 24 comments · 1 min read · LW link

Recent Progress in the Theory of Neural Networks
interstice · Dec 4, 2019, 11:11 PM · 76 points (27 votes) · 9 comments · 9 min read · LW link

The Art of the Artificial: Insights from 'Artificial Intelligence: A Modern Approach'
TurnTrout · Mar 25, 2018, 6:55 AM · 31 points (20 votes) · 8 comments · 15 min read · LW link

Heading off a near-term AGI arms race
lincolnquirk · Aug 22, 2012, 2:23 PM · 10 points (19 votes) · 70 comments · 1 min read · LW link

Outperforming the human Atari benchmark
Vaniver · Mar 31, 2020, 7:33 PM · 58 points (23 votes) · 5 comments · 1 min read · LW link (deepmind.com)

Conversational Presentation of Why Automation is Different This Time
ryan_b · Jan 17, 2018, 10:11 PM · 33 points (30 votes) · 26 comments · 1 min read · LW link

A rant against robots
Lê Nguyên Hoang · Jan 14, 2020, 10:03 PM · 64 points (30 votes) · 7 comments · 5 min read · LW link

Clarifying "AI Alignment"
paulfchristiano · Nov 15, 2018, 2:41 PM · 64 points (23 votes) · 82 comments · 3 min read · LW link · 2 reviews

Tiling Agents for Self-Modifying AI (OPFAI #2)
Eliezer Yudkowsky · Jun 6, 2013, 8:24 PM · 84 points (57 votes) · 259 comments · 3 min read · LW link

EDT solves 5 and 10 with conditional oracles
jessicata · Sep 30, 2018, 7:57 AM · 59 points (19 votes) · 8 comments · 13 min read · LW link

AGI and Friendly AI in the dominant AI textbook
lukeprog · Mar 11, 2011, 4:12 AM · 73 points (59 votes) · 27 comments · 3 min read · LW link

Tabooing 'Agent' for Prosaic Alignment
Hjalmar_Wijk · Aug 23, 2019, 2:55 AM · 54 points (24 votes) · 10 comments · 6 min read · LW link

Is this what FAI outreach success looks like?
Charlie Steiner · Mar 9, 2018, 1:12 PM · 17 points (13 votes) · 3 comments · 1 min read · LW link (www.youtube.com)

Aligning a toy model of optimization
paulfchristiano · Jun 28, 2019, 8:23 PM · 52 points (18 votes) · 26 comments · 3 min read · LW link

DeepMind article: AI Safety Gridworlds
Commander Zander · Nov 30, 2017, 4:13 PM · 24 points (19 votes) · 5 comments · 1 min read · LW link (deepmind.com)

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment
So8res · Apr 12, 2014, 12:56 AM · 78 points (54 votes) · 55 comments · 7 min read · LW link

"UDT2" and "against UD+ASSA"
Wei Dai · May 12, 2019, 4:18 AM · 50 points (20 votes) · 7 comments · 7 min read · LW link

Using lying to detect human values
Stuart_Armstrong · Mar 15, 2018, 11:37 AM · 19 points (21 votes) · 6 comments · 1 min read · LW link

Another AI Winter?
PeterMcCluskey · Dec 25, 2019, 12:58 AM · 47 points (17 votes) · 14 comments · 4 min read · LW link (www.bayesianinvestor.com)

Modeling AGI Safety Frameworks with Causal Influence Diagrams
Ramana Kumar · Jun 21, 2019, 12:50 PM · 43 points (13 votes) · 6 comments · 1 min read · LW link (arxiv.org)

The Urgent Meta-Ethics of Friendly Artificial Intelligence
lukeprog · Feb 1, 2011, 2:15 PM · 76 points (57 votes) · 252 comments · 1 min read · LW link

Henry Kissinger: AI Could Mean the End of Human History
ESRogs · May 15, 2018, 8:11 PM · 17 points (10 votes) · 12 comments · 1 min read · LW link (www.theatlantic.com)

Self-confirming predictions can be arbitrarily bad
Stuart_Armstrong · May 3, 2019, 11:34 AM · 46 points (19 votes) · 11 comments · 5 min read · LW link

A Visualization of Nick Bostrom's Superintelligence
[deleted] · Jul 23, 2014, 12:24 AM · 62 points (46 votes) · 28 comments · 3 min read · LW link

[Question] What are the most plausible "AI Safety warning shot" scenarios?
Daniel Kokotajlo · Mar 26, 2020, 8:59 PM · 35 points (13 votes) · 16 comments · 1 min read · LW link

AGI in a vulnerable world
Mar 26, 2020, 12:10 AM · 42 points (14 votes) · 21 comments · 1 min read · LW link (aiimpacts.org)

Three Kinds of Competitiveness
Daniel Kokotajlo · Mar 31, 2020, 1:00 AM · 36 points (11 votes) · 18 comments · 5 min read · LW link

Biological humans and the rising tide of AI
cousin_it · Jan 29, 2018, 4:04 PM · 22 points (19 votes) · 23 comments · 1 min read · LW link

HLAI 2018 Field Report
Gordon Seidoh Worley · Aug 29, 2018, 12:11 AM · 48 points (21 votes) · 12 comments · 5 min read · LW link

Magical Categories
Eliezer Yudkowsky · Aug 24, 2008, 7:51 PM · 65 points (50 votes) · 133 comments · 9 min read · LW link

Alignment as Translation
johnswentworth · Mar 19, 2020, 9:40 PM · 62 points (20 votes) · 39 comments · 4 min read · LW link

Resolving human values, completely and adequately
Stuart_Armstrong · Mar 30, 2018, 3:35 AM · 32 points (14 votes) · 30 comments · 12 min read · LW link

Will transparency help catch deception? Perhaps not
Matthew Barnett · Nov 4, 2019, 8:52 PM · 43 points (13 votes) · 5 comments · 7 min read · LW link

A dilemma for prosaic AI alignment
Daniel Kokotajlo · Dec 17, 2019, 10:11 PM · 40 points (12 votes) · 30 comments · 3 min read · LW link

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv
DragonGod · Nov 21, 2019, 1:18 AM · 52 points (15 votes) · 4 comments · 1 min read · LW link (arxiv.org)

Glenn Beck discusses the Singularity, cites SI researchers
Brihaspati · Jun 12, 2012, 4:45 PM · 73 points (55 votes) · 183 comments · 10 min read · LW link

Siren worlds and the perils of over-optimised search
Stuart_Armstrong · Apr 7, 2014, 11:00 AM · 73 points (51 votes) · 417 comments · 7 min read · LW link

Human-Aligned AI Summer School: A Summary
Michaël Trazzi · Aug 11, 2018, 8:11 AM · 39 points (13 votes) · 5 comments · 4 min read · LW link

Top 9+2 myths about AI risk
Stuart_Armstrong · Jun 29, 2015, 8:41 PM · 68 points (48 votes) · 45 comments · 2 min read · LW link

Learning biases and rewards simultaneously
Rohin Shah · Jul 6, 2019, 1:45 AM · 41 points (13 votes) · 3 comments · 4 min read · LW link

Looking for AI Safety Experts to Provide High Level Guidance for RAISE
Ofer · May 6, 2018, 2:06 AM · 17 points (14 votes) · 5 comments · 1 min read · LW link

[Question] How much funding and researchers were in AI, and AI Safety, in 2018?
Raemon · Mar 3, 2019, 9:46 PM · 41 points (9 votes) · 11 comments · 1 min read · LW link

Deep learning—deeper flaws?
Richard_Ngo · Sep 24, 2018, 6:40 PM · 39 points (18 votes) · 17 comments · 4 min read · LW link (thinkingcomplete.blogspot.com)

A model of UDT with a concrete prior over logical statements
Benya · Aug 28, 2012, 9:45 PM · 62 points (50 votes) · 24 comments · 4 min read · LW link

Malign generalization without internal search
Matthew Barnett · Jan 12, 2020, 6:03 PM · 43 points (14 votes) · 12 comments · 4 min read · LW link

Announcing the second AI Safety Camp
Lachouette · Jun 11, 2018, 6:59 PM · 34 points (14 votes) · 0 comments · 1 min read · LW link

Vaniver's View on Factored Cognition
Vaniver · Aug 23, 2019, 2:54 AM · 48 points (10 votes) · 4 comments · 8 min read · LW link

Detached Lever Fallacy
Eliezer Yudkowsky · Jul 31, 2008, 6:57 PM · 70 points (55 votes) · 41 comments · 7 min read · LW link

When to use quantilization
RyanCarey · Feb 5, 2019, 5:17 PM · 65 points (19 votes) · 5 comments · 4 min read · LW link

The first AI Safety Camp & onwards
Remmelt · Jun 7, 2018, 8:13 PM · 45 points (23 votes) · 0 comments · 8 min read · LW link

Learning preferences by looking at the world
Rohin Shah · Feb 12, 2019, 10:25 PM · 43 points (13 votes) · 10 comments · 7 min read · LW link (bair.berkeley.edu)

Selling Nonapples
Eliezer Yudkowsky · Nov 13, 2008, 8:10 PM · 71 points (50 votes) · 78 comments · 7 min read · LW link

The AI Alignment Problem Has Already Been Solved(?) Once
SquirrelInHell · Apr 22, 2017, 1:24 PM · 50 points (36 votes) · 45 comments · 4 min read · LW link (squirrelinhell.blogspot.com)

Trace README
johnswentworth · Mar 11, 2020, 9:08 PM · 35 points (12 votes) · 1 comment · 8 min read · LW link

[Link] Computer improves its Civilization II gameplay by reading the manual
Kaj_Sotala · Jul 13, 2011, 12:00 PM · 49 points (37 votes) · 5 comments · 4 min read · LW link

Idea: Open Access AI Safety Journal
Gordon Seidoh Worley · Mar 23, 2018, 6:27 PM · 28 points (21 votes) · 11 comments · 1 min read · LW link

Another take on agent foundations: formalizing zero-shot reasoning
zhukeepa · Jul 1, 2018, 6:12 AM · 59 points (27 votes) · 20 comments · 12 min read · LW link

Logical Updatelessness as a Robust Delegation Problem
Scott Garrabrant · Oct 27, 2017, 9:16 PM · 30 points (14 votes) · 2 comments · 2 min read · LW link

Some thoughts after reading Artificial Intelligence: A Modern Approach
swift_spiral · Mar 19, 2019, 11:39 PM · 38 points (16 votes) · 4 comments · 2 min read · LW link

AI safety without goal-directed behavior
Rohin Shah · Jan 7, 2019, 7:48 AM · 65 points (25 votes) · 15 comments · 4 min read · LW link

No Universally Compelling Arguments
Eliezer Yudkowsky · Jun 26, 2008, 8:29 AM · 62 points (54 votes) · 57 comments · 5 min read · LW link

What AI Safety Researchers Have Written About the Nature of Human Values
avturchin · Jan 16, 2019, 1:59 PM · 50 points (16 votes) · 3 comments · 15 min read · LW link

Disambiguating "alignment" and related notions
David Scott Krueger (formerly: capybaralet) · Jun 5, 2018, 3:35 PM · 22 points (13 votes) · 21 comments · 2 min read · LW link

Inductive biases stick around
evhub · Dec 18, 2019, 7:52 PM · 63 points (22 votes) · 14 comments · 3 min read · LW link

Bill Gates: problem of strong AI with conflicting goals "very worthy of study and time"
Paul Crowley · Jan 22, 2015, 8:21 PM · 73 points (52 votes) · 18 comments · 1 min read · LW link

So You Want to Save the World
lukeprog · Jan 1, 2012, 7:39 AM · 54 points (51 votes) · 149 comments · 12 min read · LW link

Metaphilosophical competence can't be disentangled from alignment
zhukeepa · Apr 1, 2018, 12:38 AM · 32 points (24 votes) · 39 comments · 3 min read · LW link

Some Thoughts on Metaphilosophy
Wei Dai · Feb 10, 2019, 12:28 AM · 62 points (22 votes) · 27 comments · 4 min read · LW link

Reasons compute may not drive AI capabilities growth
Tristan H · Dec 19, 2018, 10:13 PM · 42 points (17 votes) · 10 comments · 8 min read · LW link

Distance Functions are Hard
Grue_Slinky · Aug 13, 2019, 5:33 PM · 31 points (26 votes) · 19 comments · 6 min read · LW link

Takeaways from safety by default interviews
Apr 3, 2020, 5:20 PM · 28 points (11 votes) · 2 comments · 13 min read · LW link (aiimpacts.org)

Bridge Collapse: Reductionism as Engineering Problem
Rob Bensinger · Feb 18, 2014, 10:03 PM · 78 points (52 votes) · 62 comments · 15 min read · LW link

Probability as Minimal Map
johnswentworth · Sep 1, 2019, 7:19 PM · 49 points (20 votes) · 10 comments · 5 min read · LW link

Policy Alignment
abramdemski · Jun 30, 2018, 12:24 AM · 50 points (21 votes) · 25 comments · 8 min read · LW link

Stable Pointers to Value: An Agent Embedded in Its Own Utility Function
abramdemski · Aug 17, 2017, 12:22 AM · 15 points (11 votes) · 9 comments · 5 min read · LW link

Stable Pointers to Value II: Environmental Goals
abramdemski · Feb 9, 2018, 6:03 AM · 18 points (13 votes) · 2 comments · 4 min read · LW link

The Argument from Philosophical Difficulty
Wei Dai · Feb 10, 2019, 12:28 AM · 54 points (20 votes) · 31 comments · 1 min read · LW link

human psycholinguists: a critical appraisal
nostalgebraist · Dec 31, 2019, 12:20 AM · 174 points (85 votes) · 59 comments · 16 min read · LW link · 2 reviews (nostalgebraist.tumblr.com)

My take on agent foundations: formalizing metaphilosophical competence
zhukeepa · Apr 1, 2018, 6:33 AM · 20 points (15 votes) · 6 comments · 1 min read · LW link

Critique my Model: The EV of AGI to Selfish Individuals
ozziegooen · Apr 8, 2018, 8:04 PM · 19 points (14 votes) · 9 comments · 4 min read · LW link

AI Safety Debate and Its Applications
VojtaKovarik · Jul 23, 2019, 10:31 PM · 36 points (18 votes) · 5 comments · 12 min read · LW link

TAISU 2019 Field Report
Gordon Seidoh Worley · Oct 15, 2019, 1:09 AM · 36 points (20 votes) · 5 comments · 5 min read · LW link

Human-AI Collaboration
Rohin Shah · Oct 22, 2019, 6:32 AM · 42 points (14 votes) · 7 comments · 2 min read · LW link (bair.berkeley.edu)

Analyzing the Problem GPT-3 is Trying to Solve
adamShimi · Aug 6, 2020, 9:58 PM · 16 points (7 votes) · 2 comments · 4 min read · LW link

[LINK] Speed superintelligence?
Stuart_Armstrong · Aug 14, 2014, 3:57 PM · 53 points (37 votes) · 20 comments · 1 min read · LW link

A big Singularity-themed Hollywood movie out in April offers many opportunities to talk about AI risk
chaosmage · Jan 7, 2014, 5:48 PM · 49 points (39 votes) · 85 comments · 1 min read · LW link

New paper: (When) is Truth-telling Favored in AI debate?
VojtaKovarik · Dec 26, 2019, 7:59 PM · 32 points (12 votes) · 7 comments · 5 min read · LW link (medium.com)

Artificial Addition
Eliezer Yudkowsky · Nov 20, 2007, 7:58 AM · 68 points (61 votes) · 129 comments · 6 min read · LW link

Exploring safe exploration
evhub · Jan 6, 2020, 9:07 PM · 37 points (11 votes) · 8 comments · 3 min read · LW link

'Dumb' AI observes and manipulates controllers
Stuart_Armstrong · Jan 13, 2015, 1:35 PM · 52 points (34 votes) · 19 comments · 2 min read · LW link

AI Reading Group Thoughts (1/?): The Mandate of Heaven
Alicorn · Aug 10, 2018, 12:24 AM · 45 points (17 votes) · 18 comments · 4 min read · LW link

AI Reading Group Thoughts (2/?): Reconstructive Psychosurgery
Alicorn · Sep 25, 2018, 4:25 AM · 27 points (9 votes) · 6 comments · 3 min read · LW link

(notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach
Ben Pace · Feb 4, 2019, 10:08 PM · 43 points (16 votes) · 5 comments · 7 min read · LW link

AI Governance: A Research Agenda
habryka · Sep 5, 2018, 6:00 PM · 25 points (6 votes) · 3 comments · 1 min read · LW link (www.fhi.ox.ac.uk)

Global online debate on the governance of AI
CarolineJ · Jan 5, 2018, 3:31 PM · 8 points (6 votes) · 5 comments · 1 min read · LW link

[AN #61] AI policy and governance, from two people in the field
Rohin Shah · Aug 5, 2019, 5:00 PM · 12 points (6 votes) · 2 comments · 9 min read · LW link (mailchi.mp)

2019 AI Alignment Literature Review and Charity Comparison
Larks · Dec 19, 2019, 3:00 AM · 130 points (37 votes) · 18 comments · 62 min read · LW link

[Question] What's wrong with these analogies for understanding Informed Oversight and IDA?
Wei Dai · Mar 20, 2019, 9:11 AM · 35 points (9 votes) · 3 comments · 1 min read · LW link

The Alignment Newsletter #1: 04/09/18
Rohin Shah · Apr 9, 2018, 4:00 PM · 12 points (5 votes) · 3 comments · 4 min read · LW link

The Alignment Newsletter #2: 04/16/18
Rohin Shah · Apr 16, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 5 min read · LW link

The Alignment Newsletter #3: 04/23/18
Rohin Shah · Apr 23, 2018, 4:00 PM · 9 points (2 votes) · 0 comments · 6 min read · LW link

The Alignment Newsletter #4: 04/30/18
Rohin Shah · Apr 30, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 3 min read · LW link

The Alignment Newsletter #5: 05/07/18
Rohin Shah · May 7, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 7 min read · LW link

The Alignment Newsletter #6: 05/14/18
Rohin Shah · May 14, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 2 min read · LW link

The Alignment Newsletter #7: 05/21/18
Rohin Shah · May 21, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 5 min read · LW link

The Alignment Newsletter #8: 05/28/18
Rohin Shah · May 28, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 6 min read · LW link

The Alignment Newsletter #9: 06/04/18
Rohin Shah · Jun 4, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 2 min read · LW link

The Alignment Newsletter #10: 06/11/18
Rohin Shah · Jun 11, 2018, 4:00 PM · 16 points (4 votes) · 0 comments · 9 min read · LW link

The Alignment Newsletter #11: 06/18/18
Rohin Shah · Jun 18, 2018, 4:00 PM · 8 points (1 vote) · 0 comments · 10 min read · LW link

The Alignment Newsletter #12: 06/25/18
Rohin Shah · Jun 25, 2018, 4:00 PM · 15 points (5 votes) · 0 comments · 3 min read · LW link

Alignment Newsletter #13: 07/02/18
Rohin Shah · Jul 2, 2018, 4:10 PM · 70 points (26 votes) · 12 comments · 8 min read · LW link (mailchi.mp)

Alignment Newsletter #14
Rohin Shah · Jul 9, 2018, 4:20 PM · 14 points (8 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #15: 07/16/18
Rohin Shah · Jul 16, 2018, 4:10 PM · 42 points (13 votes) · 0 comments · 15 min read · LW link (mailchi.mp)

Alignment Newsletter #17
Rohin Shah · Jul 30, 2018, 4:10 PM · 32 points (6 votes) · 0 comments · 13 min read · LW link (mailchi.mp)

Alignment Newsletter #18
Rohin Shah · Aug 6, 2018, 4:00 PM · 17 points (5 votes) · 0 comments · 10 min read · LW link (mailchi.mp)

Alignment Newsletter #19
Rohin Shah · Aug 14, 2018, 2:10 AM · 18 points (5 votes) · 0 comments · 13 min read · LW link (mailchi.mp)

Alignment Newsletter #20
Rohin Shah · Aug 20, 2018, 4:00 PM · 12 points (6 votes) · 2 comments · 6 min read · LW link (mailchi.mp)

Alignment Newsletter #21
Rohin Shah · Aug 27, 2018, 4:20 PM · 25 points (6 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

Alignment Newsletter #22
Rohin Shah · Sep 3, 2018, 4:10 PM · 18 points (6 votes) · 0 comments · 6 min read · LW link (mailchi.mp)

Alignment Newsletter #23
Rohin Shah · Sep 10, 2018, 5:10 PM · 16 points (5 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

Alignment Newsletter #24
Rohin Shah · Sep 17, 2018, 4:20 PM · 10 points (5 votes) · 6 comments · 12 min read · LW link (mailchi.mp)

Alignment Newsletter #25
Rohin Shah · Sep 24, 2018, 4:10 PM · 18 points (6 votes) · 3 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #26
Rohin Shah · Oct 2, 2018, 4:10 PM · 13 points (3 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

Alignment Newsletter #27
Rohin Shah · Oct 9, 2018, 1:10 AM · 16 points (3 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #28
Rohin Shah · Oct 15, 2018, 9:20 PM · 11 points (5 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

Alignment Newsletter #29
Rohin Shah · Oct 22, 2018, 4:20 PM · 15 points (5 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #30
Rohin Shah · Oct 29, 2018, 4:10 PM · 29 points (13 votes) · 2 comments · 6 min read · LW link (mailchi.mp)

Alignment Newsletter #31
Rohin Shah · Nov 5, 2018, 11:50 PM · 17 points (3 votes) · 0 comments · 12 min read · LW link (mailchi.mp)

Alignment Newsletter #32
Rohin Shah · Nov 12, 2018, 5:20 PM · 18 points (4 votes) · 0 comments · 12 min read · LW link (mailchi.mp)

Alignment Newsletter #33
Rohin Shah · Nov 19, 2018, 5:20 PM · 23 points (7 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #34
Rohin Shah · Nov 26, 2018, 11:10 PM · 24 points (5 votes) · 0 comments · 10 min read · LW link (mailchi.mp)

Alignment Newsletter #35
Rohin Shah · Dec 4, 2018, 1:10 AM · 15 points (3 votes) · 0 comments · 6 min read · LW link (mailchi.mp)

Alignment Newsletter #37
Rohin Shah · Dec 17, 2018, 7:10 PM · 25 points (7 votes) · 4 comments · 10 min read · LW link (mailchi.mp)

Alignment Newsletter #38
Rohin Shah · Dec 25, 2018, 4:10 PM · 9 points (4 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

Alignment Newsletter #39
Rohin Shah · Jan 1, 2019, 8:10 AM · 32 points (10 votes) · 2 comments · 5 min read · LW link (mailchi.mp)

Alignment Newsletter #40
Rohin Shah · Jan 8, 2019, 8:10 PM · 21 points (4 votes) · 2 comments · 5 min read · LW link (mailchi.mp)

Alignment Newsletter #41
Rohin Shah · Jan 17, 2019, 8:10 AM · 22 points (4 votes) · 6 comments · 10 min read · LW link (mailchi.mp)

Alignment Newsletter #42
Rohin Shah · Jan 22, 2019, 2:00 AM · 20 points (7 votes) · 1 comment · 10 min read · LW link (mailchi.mp)

Alignment Newsletter #43
Rohin Shah · Jan 29, 2019, 9:10 PM · 14 points (5 votes) · 2 comments · 13 min read · LW link (mailchi.mp)

Alignment Newsletter #44
Rohin Shah · Feb 6, 2019, 8:30 AM · 18 points (6 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #45
Rohin Shah · Feb 14, 2019, 2:10 AM · 25 points (9 votes) · 2 comments · 8 min read · LW link (mailchi.mp)

Alignment Newsletter #46
Rohin Shah · Feb 22, 2019, 12:10 AM · 12 points (9 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #48
Rohin Shah · Mar 11, 2019, 9:10 PM · 29 points (13 votes) · 14 comments · 9 min read · LW link (mailchi.mp)

Alignment Newsletter #49
Rohin Shah · Mar 20, 2019, 4:20 AM · 23 points (8 votes) · 1 comment · 11 min read · LW link (mailchi.mp)

Alignment Newsletter #50
Rohin Shah · Mar 28, 2019, 6:10 PM · 15 points (4 votes) · 2 comments · 10 min read · LW link (mailchi.mp)

Alignment Newsletter #51
Rohin Shah · Apr 3, 2019, 4:10 AM · 25 points (5 votes) · 2 comments · 15 min read · LW link (mailchi.mp)

Alignment Newsletter #52
Rohin Shah · Apr 6, 2019, 1:20 AM · 19 points (5 votes) · 1 comment · 8 min read · LW link (mailchi.mp)

Alignment Newsletter One Year Retrospective
Rohin Shah · Apr 10, 2019, 6:58 AM · 93 points (30 votes) · 31 comments · 21 min read · LW link

Alignment Newsletter #53
Rohin Shah · Apr 18, 2019, 5:20 PM · 20 points (6 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #54] Boxing a finite-horizon AI system to keep it unambitious
Rohin Shah · Apr 28, 2019, 5:20 AM · 20 points (6 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI
Rohin Shah · May 5, 2019, 2:20 AM · 17 points (6 votes) · 2 comments · 8 min read · LW link (mailchi.mp)

[AN #56] Should ML researchers stop running experiments before making hypotheses?
Rohin Shah · May 21, 2019, 2:20 AM · 21 points (6 votes) · 8 comments · 9 min read · LW link (mailchi.mp)

[AN #57] Why we should fo­cus on ro­bust­ness in AI safety, and the analo­gous prob­lems in programming

Rohin ShahJun 5, 2019, 11:20 PM
26 points

9 votes

Overall karma indicates overall quality.

15 comments7 min readLW link
(mailchi.mp)

[AN #58] Mesa optimization: what it is, and why we should care
Rohin Shah · Jun 24, 2019, 4:10 PM
54 points (19 votes) · 9 comments · 8 min read · LW link (mailchi.mp)

[AN #59] How arguments for AI risk have changed over time
Rohin Shah · Jul 8, 2019, 5:20 PM
43 points (10 votes) · 4 comments · 7 min read · LW link (mailchi.mp)

[AN #60] A new AI challenge: Minecraft agents that assist human players in creative mode
Rohin Shah · Jul 22, 2019, 5:00 PM
23 points (10 votes) · 6 comments · 9 min read · LW link (mailchi.mp)

[AN #62] Are adversarial examples caused by real but imperceptible features?
Rohin Shah · Aug 22, 2019, 5:10 PM
27 points (11 votes) · 10 comments · 9 min read · LW link (mailchi.mp)

[AN #63] How architecture search, meta learning, and environment design could lead to general intelligence
Rohin Shah · Sep 10, 2019, 7:10 PM
21 points (8 votes) · 12 comments · 8 min read · LW link (mailchi.mp)

[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning
Rohin Shah · Sep 16, 2019, 5:10 PM
11 points (5 votes) · 8 comments · 7 min read · LW link (mailchi.mp)

[AN #65]: Learning useful skills by watching humans “play”
Rohin Shah · Sep 23, 2019, 5:30 PM
11 points (4 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #66]: Decomposing robustness into capability robustness and alignment robustness
Rohin Shah · Sep 30, 2019, 6:00 PM
12 points (6 votes) · 1 comment · 7 min read · LW link (mailchi.mp)

[AN #67]: Creating environments in which to study inner alignment failures
Rohin Shah · Oct 7, 2019, 5:10 PM
17 points (6 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #68]: The attainable utility theory of impact
Rohin Shah · Oct 14, 2019, 5:00 PM
17 points (5 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI
Rohin Shah · Oct 19, 2019, 12:30 AM
60 points (24 votes) · 12 comments · 15 min read · LW link (mailchi.mp)

[AN #70]: Agents that help humans who are still learning about their own preferences
Rohin Shah · Oct 23, 2019, 5:10 PM
16 points (6 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #71]: Avoiding reward tampering through current-RF optimization
Rohin Shah · Oct 30, 2019, 5:10 PM
12 points (5 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety
Rohin Shah · Nov 6, 2019, 6:10 PM
26 points (7 votes) · 4 comments · 10 min read · LW link (mailchi.mp)

[AN #73]: Detecting catastrophic failures by learning how agents tend to break
Rohin Shah · Nov 13, 2019, 6:10 PM
11 points (4 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts
Rohin Shah · Nov 20, 2019, 6:20 PM
19 points (7 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee
Rohin Shah · Nov 27, 2019, 6:10 PM
38 points (11 votes) · 1 comment · 10 min read · LW link (mailchi.mp)

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations
Rohin Shah · Dec 4, 2019, 6:10 PM
14 points (6 votes) · 6 comments · 9 min read · LW link (mailchi.mp)

[AN #77]: Double descent: a unification of statistical theory and modern ML practice
Rohin Shah · Dec 18, 2019, 6:30 PM
21 points (8 votes) · 4 comments · 14 min read · LW link (mailchi.mp)

[AN #78] Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison
Rohin Shah · Dec 26, 2019, 1:10 AM
26 points (7 votes) · 10 comments · 9 min read · LW link (mailchi.mp)

[AN #79]: Recursive reward modeling as an alignment technique integrated with deep RL
Rohin Shah · Jan 1, 2020, 6:00 PM
13 points (6 votes) · 0 comments · 12 min read · LW link (mailchi.mp)

[AN #81]: Universality as a potential solution to conceptual difficulties in intent alignment
Rohin Shah · Jan 8, 2020, 6:00 PM
31 points (9 votes) · 4 comments · 11 min read · LW link (mailchi.mp)

[AN #82]: How OpenAI Five distributed their training computation
Rohin Shah · Jan 15, 2020, 6:20 PM
19 points (6 votes) · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #83]: Sample-efficient deep learning with ReMixMatch
Rohin Shah · Jan 22, 2020, 6:10 PM
15 points (7 votes) · 4 comments · 11 min read · LW link (mailchi.mp)

[AN #84] Reviewing AI alignment work in 2018-19
Rohin Shah · Jan 29, 2020, 6:30 PM
23 points (10 votes) · 0 comments · 6 min read · LW link (mailchi.mp)

[AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot
Rohin Shah · Feb 5, 2020, 6:20 PM
14 points (6 votes) · 2 comments · 7 min read · LW link (mailchi.mp)

[AN #86]: Improving debate and factored cognition through human experiments
Rohin Shah · Feb 12, 2020, 6:10 PM
14 points (6 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #87]: What might happen as deep learning scales even further?
Rohin Shah · Feb 19, 2020, 6:20 PM
28 points (11 votes) · 0 comments · 4 min read · LW link (mailchi.mp)

[AN #88]: How the principal-agent literature relates to AI risk
Rohin Shah · Feb 27, 2020, 9:10 AM
18 points (6 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #89]: A unifying formalism for preference learning algorithms
Rohin Shah · Mar 4, 2020, 6:20 PM
16 points (5 votes) · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #90]: How search landscapes can contain self-reinforcing feedback loops
Rohin Shah · Mar 11, 2020, 5:30 PM
11 points (4 votes) · 6 comments · 8 min read · LW link (mailchi.mp)

[AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement
Rohin Shah · Mar 18, 2020, 5:10 PM
15 points (5 votes) · 10 comments · 13 min read · LW link (mailchi.mp)

[AN #92]: Learning good representations with contrastive predictive coding
Rohin Shah · Mar 25, 2020, 5:20 PM
18 points (7 votes) · 1 comment · 10 min read · LW link (mailchi.mp)

[AN #93]: The Precipice we’re standing at, and how we can back away from it
Rohin Shah · Apr 1, 2020, 5:10 PM
24 points (6 votes) · 0 comments · 7 min read · LW link (mailchi.mp)

Forecasting AI Progress: A Research Agenda
Aug 10, 2020, 1:04 AM
39 points (15 votes) · 4 comments · 1 min read · LW link

The Steering Problem
paulfchristiano · Nov 13, 2018, 5:14 PM
43 points (14 votes) · 12 comments · 7 min read · LW link

Will humans build goal-directed agents?
Rohin Shah · Jan 5, 2019, 1:33 AM
51 points (19 votes) · 43 comments · 5 min read · LW link

Prosaic AI alignment
paulfchristiano · Nov 20, 2018, 1:56 PM
40 points (17 votes) · 10 comments · 8 min read · LW link

David Chalmers’ “The Singularity: A Philosophical Analysis”
lukeprog · Jan 29, 2011, 2:52 AM
55 points (38 votes) · 203 comments · 4 min read · LW link

[Talk] Paul Christiano on his alignment taxonomy
jp · Sep 27, 2019, 6:37 PM
31 points (10 votes) · 1 comment · 1 min read · LW link (www.youtube.com)

Dreams of AI Design
Eliezer Yudkowsky · Aug 27, 2008, 4:04 AM
26 points (19 votes) · 61 comments · 5 min read · LW link

Qualitative Strategies of Friendliness
Eliezer Yudkowsky · Aug 30, 2008, 2:12 AM
30 points (20 votes) · 56 comments · 12 min read · LW link

Oracles, sequence predictors, and self-confirming predictions
Stuart_Armstrong · May 3, 2019, 2:09 PM
22 points (9 votes) · 0 comments · 3 min read · LW link

Self-confirming prophecies, and simplified Oracle designs
Stuart_Armstrong · Jun 28, 2019, 9:57 AM
6 points (3 votes) · 1 comment · 5 min read · LW link

Investment idea: basket of tech stocks weighted towards AI
ioannes · Aug 12, 2020, 9:30 PM
14 points (10 votes) · 7 comments · 3 min read · LW link

Conceptual issues in AI safety: the paradigmatic gap
vedevazz · Jun 24, 2018, 3:09 PM
33 points (10 votes) · 0 comments · 1 min read · LW link (www.foldl.me)

Disagreement with Paul: alignment induction
Stuart_Armstrong · Sep 10, 2018, 1:54 PM
31 points (12 votes) · 6 comments · 1 min read · LW link

Largest open collection quotes about AI
teradimich · Jul 12, 2019, 5:18 PM
35 points (15 votes) · 2 comments · 3 min read · LW link (drive.google.com)

S.E.A.R.L.E’s COBOL room
Stuart_Armstrong · Feb 1, 2013, 8:29 PM
52 points (34 votes) · 36 comments · 2 min read · LW link

Introducing Corrigibility (an FAI research subfield)
So8res · Oct 20, 2014, 9:09 PM
52 points (31 votes) · 28 comments · 3 min read · LW link

NES-game playing AI [video link and AI-boxing-related comment]
Dr_Manhattan · Apr 12, 2013, 1:11 PM
42 points (35 votes) · 22 comments · 1 min read · LW link

On unfixably unsafe AGI architectures
Steven Byrnes · Feb 19, 2020, 9:16 PM
33 points (14 votes) · 8 comments · 5 min read · LW link

To contribute to AI safety, consider doing AI research
Vika · Jan 16, 2016, 8:42 PM
39 points (28 votes) · 39 comments · 2 min read · LW link

Ghosts in the Machine
Eliezer Yudkowsky · Jun 17, 2008, 11:29 PM
54 points (40 votes) · 30 comments · 4 min read · LW link

Technical AGI safety research outside AI
Richard_Ngo · Oct 18, 2019, 3:00 PM
43 points (16 votes) · 3 comments · 3 min read · LW link

Deciphering China’s AI Dream
Qiaochu_Yuan · Mar 18, 2018, 3:26 AM
12 points (8 votes) · 2 comments · 1 min read · LW link (www.fhi.ox.ac.uk)

Above-Average AI Scientists
Eliezer Yudkowsky · Sep 28, 2008, 11:04 AM
57 points (50 votes) · 97 comments · 8 min read · LW link

The Nature of Logic
Eliezer Yudkowsky · Nov 15, 2008, 6:20 AM
37 points (30 votes) · 12 comments · 10 min read · LW link

Oracle paper
Stuart_Armstrong · Dec 13, 2017, 2:59 PM
12 points (10 votes) · 7 comments · 1 min read · LW link

AI Alignment Writing Day Roundup #1
Ben Pace · Aug 30, 2019, 1:26 AM
32 points (14 votes) · 12 comments · 1 min read · LW link

Notes on the Safety in Artificial Intelligence conference
UmamiSalami · Jul 1, 2016, 12:36 AM
40 points (26 votes) · 15 comments · 13 min read · LW link

Reinterpreting “AI and Compute”
habryka · Dec 25, 2018, 9:12 PM
30 points (9 votes) · 10 comments · 1 min read · LW link (aiimpacts.org)

AI Safety Prerequisites Course: Revamp and New Lessons
philip_b · Feb 3, 2019, 9:04 PM
24 points (10 votes) · 5 comments · 1 min read · LW link

An angle of attack on Open Problem #1
Benya · Aug 18, 2012, 12:08 PM
47 points (33 votes) · 85 comments · 7 min read · LW link

Evaluating the feasibility of SI’s plan
JoshuaFox · Jan 10, 2013, 8:17 AM
38 points (47 votes) · 188 comments · 4 min read · LW link

Only humans can have human values
PhilGoetz · Apr 26, 2010, 6:57 PM
51 points (47 votes) · 161 comments · 17 min read · LW link

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
DragonGod · Dec 6, 2017, 6:01 AM
13 points (10 votes) · 4 comments · 1 min read · LW link (arxiv.org)

Cake, or death!
Stuart_Armstrong · Oct 25, 2012, 10:33 AM
46 points (31 votes) · 13 comments · 4 min read · LW link

Self-regulation of safety in AI research
Gordon Seidoh Worley · Feb 25, 2018, 11:17 PM
12 points (10 votes) · 6 comments · 2 min read · LW link

How safe “safe” AI development?
Gordon Seidoh Worley · Feb 28, 2018, 11:21 PM
9 points (10 votes) · 1 comment · 1 min read · LW link

Stanford Intro to AI course to be taught for free online
Psy-Kosh · Jul 30, 2011, 4:22 PM
38 points (28 votes) · 39 comments · 1 min read · LW link

Bayesian Utility: Representing Preference by Probability Measures
Vladimir_Nesov · Jul 27, 2009, 2:28 PM
45 points (21 votes) · 37 comments · 2 min read · LW link

Gains from trade: Slug versus Galaxy—how much would I give up to control you?
Stuart_Armstrong · Jul 23, 2013, 7:06 PM
55 points (40 votes) · 67 comments · 7 min read · LW link

Defeating Mundane Holocausts With Robots
lsparrish · May 30, 2011, 10:34 PM
34 points (31 votes) · 28 comments · 2 min read · LW link

Assuming we’ve solved X, could we do Y...
Stuart_Armstrong · Dec 11, 2018, 6:13 PM
31 points (14 votes) · 16 comments · 2 min read · LW link

The Stamp Collector
So8res · May 1, 2015, 11:11 PM
45 points (36 votes) · 14 comments · 6 min read · LW link

Saving the world in 80 days: Prologue
Logan Riggs · May 9, 2018, 9:16 PM
12 points (10 votes) · 16 comments · 2 min read · LW link

Project Proposal: Considerations for trading off capabilities and safety impacts of AI research
David Scott Krueger (formerly: capybaralet) · Aug 6, 2019, 10:22 PM
25 points (17 votes) · 11 comments · 2 min read · LW link

AI Safety Prerequisites Course: Basic abstract representations of computation
RAISE · Mar 13, 2019, 7:38 PM
28 points (10 votes) · 2 comments · 1 min read · LW link

What I Think, If Not Why
Eliezer Yudkowsky · Dec 11, 2008, 5:41 PM
41 points (35 votes) · 103 comments · 4 min read · LW link

RFC: Philosophical Conservatism in AI Alignment Research
Gordon Seidoh Worley · May 15, 2018, 3:29 AM
17 points (10 votes) · 13 comments · 1 min read · LW link

Predicted AI alignment event/meeting calendar
rmoehn · Aug 14, 2019, 7:14 AM
29 points (13 votes) · 14 comments · 1 min read · LW link

Simplified preferences needed; simplified preferences sufficient
Stuart_Armstrong · Mar 5, 2019, 7:39 PM
29 points (12 votes) · 6 comments · 3 min read · LW link

Reward function learning: the value function
Stuart_Armstrong · Apr 24, 2018, 4:29 PM
9 points (7 votes) · 0 comments · 11 min read · LW link

Reward function learning: the learning process
Stuart_Armstrong · Apr 24, 2018, 12:56 PM
6 points (3 votes) · 11 comments · 8 min read · LW link

Utility versus Reward function: partial equivalence
Stuart_Armstrong · Apr 13, 2018, 2:58 PM
17 points (8 votes) · 5 comments · 5 min read · LW link

Full toy model for preference learning
Stuart_Armstrong · Oct 16, 2019, 11:06 AM
20 points (6 votes) · 2 comments · 12 min read · LW link

New(ish) AI control ideas
Stuart_Armstrong · Oct 31, 2017, 12:52 PM
0 points (0 votes) · 0 comments · 4 min read · LW link

Rigging is a form of wireheading
Stuart_Armstrong · May 3, 2018, 12:50 PM
11 points (9 votes) · 2 comments · 1 min read · LW link

The reward engineering problem
paulfchristiano · Jan 16, 2019, 6:47 PM
26 points (6 votes) · 3 comments · 7 min read · LW link

AI cooperation in practice
cousin_it · Jul 30, 2010, 4:21 PM
37 points (31 votes) · 166 comments · 1 min read · LW link

Examples of AI’s behaving badly
Stuart_Armstrong · Jul 16, 2015, 10:01 AM
41 points (27 votes) · 37 comments · 1 min read · LW link

Controlling Constant Programs
Vladimir_Nesov · Sep 5, 2010, 1:45 PM
34 points (38 votes) · 33 comments · 5 min read · LW link

Recommended Reading for Friendly AI Research
Vladimir_Nesov · Oct 9, 2010, 1:46 PM
36 points (33 votes) · 30 comments · 2 min read · LW link

Autism, Watson, the Turing test, and General Intelligence
Stuart_Armstrong · Sep 24, 2013, 11:00 AM
11 points (10 votes) · 22 comments · 1 min read · LW link

Pessimism About Unknown Unknowns Inspires Conservatism
michaelcohen · Feb 3, 2020, 2:48 PM
31 points (11 votes) · 2 comments · 5 min read · LW link

The National Security Commission on Artificial Intelligence Wants You (to submit essays and articles on the future of government AI policy)
quanticle · Jul 18, 2019, 5:21 PM
30 points (9 votes) · 0 comments · 1 min read · LW link (warontherocks.com)

Systems Engineering and the META Program
ryan_b · Dec 20, 2018, 8:19 PM
30 points (13 votes) · 3 comments · 1 min read · LW link

Human errors, human values
PhilGoetz · Apr 9, 2011, 2:50 AM
45 points (44 votes) · 138 comments · 1 min read · LW link

ISO: Name of Problem
johnswentworth · Jul 24, 2018, 5:15 PM
28 points (13 votes) · 15 comments · 1 min read · LW link

Muehlhauser-Goertzel Dialogue, Part 1
lukeprog · Mar 16, 2012, 5:12 PM
42 points (30 votes) · 161 comments · 33 min read · LW link

Specification gaming examples in AI
Samuel Rødal · Nov 10, 2018, 12:00 PM
24 points (9 votes) · 6 comments · 1 min read · LW link (docs.google.com)

Superintelligence Reading Group—Section 1: Past Developments and Present Capabilities
KatjaGrace · Sep 16, 2014, 1:00 AM
43 points (28 votes) · 233 comments · 7 min read · LW link

[Question] What are the differences between all the iterative/recursive approaches to AI alignment?
riceissa · Sep 21, 2019, 2:09 AM
30 points (9 votes) · 14 comments · 2 min read · LW link

Algorithmic Similarity
LukasM · Aug 23, 2019, 4:39 PM
27 points (17 votes) · 10 comments · 11 min read · LW link

Directions and desiderata for AI alignment
paulfchristiano · Jan 13, 2019, 7:47 AM
47 points (10 votes) · 1 comment · 14 min read · LW link

Friendly AI Research and Taskification
multifoliaterose · Dec 14, 2010, 6:30 AM
30 points (31 votes) · 47 comments · 5 min read · LW link

Against easy superintelligence: the unforeseen friction argument
Stuart_Armstrong · Jul 10, 2013, 1:47 PM
39 points (25 votes) · 48 comments · 5 min read · LW link

[Question] Why are the people who could be doing safety research, but aren’t, doing something else?
Adam Scholl · Aug 29, 2019, 8:51 AM
27 points (6 votes) · 19 comments · 1 min read · LW link

TV’s “Elementary” Tackles Friendly AI and X-Risk—“Bella” (Possible Spoilers)
pjeby · Nov 22, 2014, 7:51 PM
48 points (31 votes) · 18 comments · 2 min read · LW link

Universality Unwrapped
adamShimi · Aug 21, 2020, 6:53 PM
28 points (10 votes) · 2 comments · 18 min read · LW link

AI Risk and Opportunity: Humanity’s Efforts So Far
lukeprog · Mar 21, 2012, 2:49 AM
53 points (38 votes) · 49 comments · 23 min read · LW link

Learning with catastrophes
paulfchristiano · Jan 23, 2019, 3:01 AM
27 points (9 votes) · 9 comments · 4 min read · LW link

[Question] Degree of duplication and coordination in projects that examine computing prices, AI progress, and related topics?
riceissa · Apr 23, 2019, 12:27 PM
26 points (10 votes) · 1 comment · 2 min read · LW link

Solving the AI Race Finalists
Gordon Seidoh Worley · Jul 19, 2018, 9:04 PM
24 points (10 votes) · 0 comments · 1 min read · LW link (medium.com)

An Agent is a Worldline in Tegmark V
komponisto · Jul 12, 2018, 5:12 AM
24 points (13 votes) · 12 comments · 2 min read · LW link

Towards formalizing universality
paulfchristiano · Jan 13, 2019, 8:39 PM
27 points (6 votes) · 19 comments · 18 min read · LW link

Conceptual Analysis for AI Alignment
David Scott Krueger (formerly: capybaralet) · Dec 30, 2018, 12:46 AM
26 points (9 votes) · 3 comments · 2 min read · LW link

Gwern’s “Why Tool AIs Want to Be Agent AIs: The Power of Agency”
habryka · May 5, 2019, 5:11 AM
26 points (9 votes) · 3 comments · 1 min read · LW link (www.gwern.net)

[Question] Why not tool AI?
smithee · Jan 19, 2019, 10:18 PM
19 points (8 votes) · 10 comments · 1 min read · LW link

Superintelligence 16: Tool AIs
KatjaGrace · Dec 30, 2014, 2:00 AM
12 points (8 votes) · 37 comments · 7 min read · LW link

Thinking of tool AIs
Michele Campolo · Nov 20, 2019, 9:47 PM
6 points (5 votes) · 2 comments · 4 min read · LW link

Reply to Holden on ‘Tool AI’
Eliezer Yudkowsky · Jun 12, 2012, 6:00 PM
152 points (124 votes) · 357 comments · 17 min read · LW link

Reply to Holden on The Singularity Institute
lukeprog · Jul 10, 2012, 11:20 PM
69 points (54 votes) · 215 comments · 26 min read · LW link

Levels of AI Self-Improvement
avturchin · Apr 29, 2018, 11:45 AM
11 points (9 votes) · 0 comments · 39 min read · LW link

AI: requirements for pernicious policies
Stuart_Armstrong · Jul 17, 2015, 2:18 PM
11 points (8 votes) · 3 comments · 3 min read · LW link

Tools want to become agents
Stuart_Armstrong · Jul 4, 2014, 10:12 AM
24 points (19 votes) · 81 comments · 1 min read · LW link

Superintelligence reading group
KatjaGrace · Aug 31, 2014, 2:59 PM
31 points (22 votes) · 2 comments · 2 min read · LW link

Superintelligence Reading Group 2: Forecasting AI
KatjaGrace · Sep 23, 2014, 1:00 AM
17 points (13 votes) · 109 comments · 11 min read · LW link

Superintelligence Reading Group 3: AI and Uploads
KatjaGrace · Sep 30, 2014, 1:00 AM
17 points (10 votes) · 139 comments · 6 min read · LW link

SRG 4: Biological Cognition, BCIs, Organizations
KatjaGrace · Oct 7, 2014, 1:00 AM
14 points (10 votes) · 139 comments · 5 min read · LW link

Superintelligence 5: Forms of Superintelligence
KatjaGrace · Oct 14, 2014, 1:00 AM
22 points (14 votes) · 114 comments · 5 min read · LW link

Superintelligence 6: Intelligence explosion kinetics
KatjaGrace · Oct 21, 2014, 1:00 AM
15 points (10 votes) · 68 comments · 8 min read · LW link

Superintelligence 7: Decisive strategic advantage
KatjaGrace · Oct 28, 2014, 1:01 AM
18 points (12 votes) · 60 comments · 6 min read · LW link

Superintelligence 8: Cognitive superpowers
KatjaGrace · Nov 4, 2014, 2:01 AM
14 points (11 votes) · 96 comments · 6 min read · LW link

Superintelligence 9: The orthogonality of intelligence and goals
KatjaGrace · Nov 11, 2014, 2:00 AM
13 points (11 votes) · 80 comments · 7 min read · LW link

Superintelligence 10: Instrumentally convergent goals
KatjaGrace · Nov 18, 2014, 2:00 AM
13 points (11 votes) · 33 comments · 5 min read · LW link

Superintelligence 11: The treacherous turn
KatjaGrace · Nov 25, 2014, 2:00 AM
16 points (14 votes) · 50 comments · 6 min read · LW link

Superintelligence 12: Malignant failure modes
KatjaGrace · Dec 2, 2014, 2:02 AM
15 points (11 votes) · 51 comments · 5 min read · LW link

Superintelligence 13: Capability control methods
KatjaGrace · Dec 9, 2014, 2:00 AM
14 points (9 votes) · 48 comments · 6 min read · LW link

Superintelligence 14: Motivation selection methods
KatjaGrace · Dec 16, 2014, 2:00 AM
9 points (6 votes) · 28 comments · 5 min read · LW link

Superintelligence 15: Oracles, genies and sovereigns
KatjaGrace · Dec 23, 2014, 2:01 AM
11 points (9 votes) · 30 comments · 7 min read · LW link

Superintelligence 17: Multipolar scenarios
KatjaGrace · Jan 6, 2015, 6:44 AM
9 points (6 votes) · 38 comments · 6 min read · LW link

Superintelligence 18: Life in an algorithmic economy
KatjaGrace · Jan 13, 2015, 2:00 AM
10 points (5 votes) · 52 comments · 6 min read · LW link

Superintelligence 19: Post-transition formation of a singleton
KatjaGrace · Jan 20, 2015, 2:00 AM
12 points (8 votes) · 35 comments · 7 min read · LW link

Superintelligence 20: The value-loading problem
KatjaGrace · Jan 27, 2015, 2:00 AM
8 points (5 votes) · 21 comments · 6 min read · LW link

Superintelligence 21: Value learning
KatjaGrace · Feb 3, 2015, 2:01 AM
12 points (8 votes) · 33 comments · 4 min read · LW link

Superintelligence 22: Emulation modulation and institutional design
KatjaGrace · Feb 10, 2015, 2:06 AM
13 points (9 votes) · 11 comments · 6 min read · LW link

Superintelligence 23: Coherent extrapolated volition
KatjaGrace · Feb 17, 2015, 2:00 AM
11 points (6 votes) · 97 comments · 7 min read · LW link

Superintelligence 24: Morality models and “do what I mean”
KatjaGrace · Feb 24, 2015, 2:00 AM
13 points (8 votes) · 47 comments · 6 min read · LW link

Objections to Coherent Extrapolated Volition
XiXiDu · Nov 22, 2011, 10:32 AM
12 points (45 votes) · 56 comments · 3 min read · LW link

CEV: coherence versus extrapolation
Stuart_Armstrong · Sep 22, 2014, 11:24 AM
21 points (17 votes) · 17 comments · 2 min read · LW link

What if AI doesn’t quite go FOOM?
Mass_Driver · Jun 20, 2010, 12:03 AM
16 points (22 votes) · 191 comments · 5 min read · LW link

Superintelligence 25: Components list for acquiring values
KatjaGrace · Mar 3, 2015, 2:01 AM
11 points (7 votes) · 12 comments · 8 min read · LW link

Superintelligence 26: Science and technology strategy
KatjaGrace · Mar 10, 2015, 1:43 AM
14 points (9 votes) · 21 comments · 6 min read · LW link

Superintelligence 27: Pathways and enablers
KatjaGrace · Mar 17, 2015, 1:00 AM
15 points (11 votes) · 21 comments · 8 min read · LW link

Superintelligence 28: Collaboration
KatjaGrace · Mar 24, 2015, 1:29 AM
13 points (8 votes) · 21 comments · 6 min read · LW link

Superintelligence 29: Crunch time
KatjaGrace · Mar 31, 2015, 4:24 AM
14 points (9 votes) · 27 comments · 6 min read · LW link

Universal agents and utility functions
Anja · Nov 14, 2012, 4:05 AM
43 points (32 votes) · 38 comments · 6 min read · LW link

Looking for remote writing partners (for AI alignment research)
rmoehn · Oct 1, 2019, 2:16 AM
23 points (8 votes) · 4 comments · 2 min read · LW link

Self-Supervised Learning and AGI Safety
Steven Byrnes · Aug 7, 2019, 2:21 PM
29 points (11 votes) · 9 comments · 12 min read · LW link

Which of these five AI alignment research projects ideas are no good?
rmoehn · Aug 8, 2019, 7:17 AM
25 points (9 votes) · 13 comments · 1 min read · LW link

Understanding understanding
mthq · Aug 23, 2019, 6:10 PM
24 points (18 votes) · 1 comment · 2 min read · LW link

Evaluating Existing Approaches to AGI Alignment
Gordon Seidoh Worley · Mar 27, 2018, 7:57 PM
12 points (5 votes) · 0 comments · 4 min read · LW link (mapandterritory.org)

CEV: a utilitarian critique
Pablo · Jan 26, 2013, 4:12 PM
32 points (48 votes) · 94 comments · 5 min read · LW link

Vingean Reflection: Reliable Reasoning for Self-Improving Agents
So8res · Jan 15, 2015, 10:47 PM
37 points (28 votes) · 5 comments · 9 min read · LW link

Slide deck: Introduction to AI Safety
Aryeh Englander · Jan 29, 2020, 3:57 PM
22 points (10 votes) · 0 comments · 1 min read · LW link (drive.google.com)

The Self-Unaware AI Oracle
Steven Byrnes · Jul 22, 2019, 7:04 PM
21 points (10 votes) · 38 comments · 8 min read · LW link

May Gwern.net newsletter (w/GPT-3 commentary)
gwern · Jun 2, 2020, 3:40 PM
32 points (12 votes) · 7 comments · 1 min read · LW link (www.gwern.net)

Build a Causal Decision Theorist
michaelcohen · Sep 23, 2019, 8:43 PM
1 point (3 votes) · 14 comments · 4 min read · LW link

A trick for Safer GPT-N
Razied · Aug 23, 2020, 12:39 AM
7 points (4 votes) · 1 comment · 2 min read · LW link

In­tro­duc­tion To The In­fra-Bayesi­anism Sequence

Aug 26, 2020, 8:31 PM
104 points

42 votes

Overall karma indicates overall quality.

64 comments14 min readLW link2 reviews

Model splin­ter­ing: mov­ing from one im­perfect model to another

Stuart_ArmstrongAug 27, 2020, 11:53 AM
74 points

26 votes

Overall karma indicates overall quality.

10 comments33 min readLW link

Al­gorith­mic Progress in Six Domains

lukeprogAug 3, 2013, 2:29 AM
38 points

29 votes

Overall karma indicates overall quality.

32 comments1 min readLW link

[Question] What are some good ex­am­ples of in­cor­rigi­bil­ity?

RyanCareyApr 28, 2019, 12:22 AM
23 points

6 votes

Overall karma indicates overall quality.

17 comments1 min readLW link

Safely and use­fully spec­tat­ing on AIs op­ti­miz­ing over toy worlds

AlexMennenJul 31, 2018, 6:30 PM
24 points

9 votes

Overall karma indicates overall quality.

16 comments2 min readLW link

Up­dates and ad­di­tions to “Embed­ded Agency”

Aug 29, 2020, 4:22 AM
73 points

23 votes

Overall karma indicates overall quality.

1 comment3 min readLW link

[LINK] Ter­ror­ists tar­get AI researchers

RobertLumleySep 15, 2011, 2:22 PM
32 points

27 votes

Overall karma indicates overall quality.

35 comments1 min readLW link

Analysing: Danger­ous mes­sages from fu­ture UFAI via Oracles

Stuart_ArmstrongNov 22, 2019, 2:17 PM
22 points

11 votes

Overall karma indicates overall quality.

16 comments4 min readLW link

Ex­plor­ing Botworld

So8resApr 30, 2014, 10:29 PM
34 points

22 votes

Overall karma indicates overall quality.

2 comments6 min readLW link

in­ter­pret­ing GPT: the logit lens

nostalgebraistAug 31, 2020, 2:47 AM
158 points

67 votes

Overall karma indicates overall quality.

32 comments11 min readLW link

From GPT to AGI

ChristianKlAug 31, 2020, 1:28 PM
6 points

6 votes

Overall karma indicates overall quality.

7 comments1 min readLW link

Log­i­cal or Con­nec­tion­ist AI?

Eliezer YudkowskyNov 17, 2008, 8:03 AM
39 points

28 votes

Overall karma indicates overall quality.

26 comments9 min readLW link

Ar­tifi­cial In­tel­li­gence and Life Sciences (Why Big Data is not enough to cap­ture biolog­i­cal sys­tems?)

HansNaujJan 15, 2020, 1:59 AM
6 points

8 votes

Overall karma indicates overall quality.

3 comments6 min readLW link

The Case against Killer Robots (link)

D_AlexNov 20, 2012, 7:47 AM
12 points

12 votes

Overall karma indicates overall quality.

25 comments1 min readLW link

Near-Term Risk: Killer Robots a Threat to Free­dom and Democracy

EpiphanyJun 14, 2013, 6:28 AM
15 points

33 votes

Overall karma indicates overall quality.

105 comments2 min readLW link

Muehlhauser-Wang Dialogue

lukeprogApr 22, 2012, 10:40 PM
34 points

27 votes

Overall karma indicates overall quality.

288 comments12 min readLW link

Google may be try­ing to take over the world

[deleted]Jan 27, 2014, 9:33 AM
33 points

30 votes

Overall karma indicates overall quality.

133 comments1 min readLW link

Gw­ern about cen­taurs: there is no chance that any use­ful man+ma­chine com­bi­na­tion will work to­gether for more than 10 years, as hu­mans soon will be only a liability

avturchinDec 15, 2018, 9:32 PM
31 points

12 votes

Overall karma indicates overall quality.

4 comments1 min readLW link
(www.reddit.com)

Q&A with Abram Dem­ski on risks from AI

XiXiDuJan 17, 2012, 9:43 AM
33 points

27 votes

Overall karma indicates overall quality.

71 comments9 min readLW link

Q&A with ex­perts on risks from AI #2

XiXiDuJan 9, 2012, 7:40 PM
22 points

22 votes

Overall karma indicates overall quality.

29 comments7 min readLW link

Let the AI teach you how to flirt

DirectedEvolutionSep 17, 2020, 7:04 PM
47 points

26 votes

Overall karma indicates overall quality.

11 comments2 min readLW link

On­line AI Safety Dis­cus­sion Day

Linda LinseforsOct 8, 2020, 12:11 PM
5 points

1 vote

Overall karma indicates overall quality.

0 comments1 min readLW link

New(ish) AI con­trol ideas

Stuart_ArmstrongMar 5, 2015, 5:03 PM
34 points

25 votes

Overall karma indicates overall quality.

14 comments3 min readLW link

Not Tak­ing Over the World

Eliezer YudkowskyDec 15, 2008, 10:18 PM
35 points

29 votes

Overall karma indicates overall quality.

97 comments4 min readLW link

Nat­u­ral­is­tic trust among AIs: The parable of the the­sis ad­vi­sor’s theorem

BenyaDec 15, 2013, 8:32 AM
36 points

25 votes

Overall karma indicates overall quality.

20 comments6 min readLW link

The Solomonoff Prior is Malign
Mark Xu · Oct 14, 2020, 1:33 AM · 148 points (60 votes) · 52 comments · 16 min read · LW link · 3 reviews

Twenty-three AI alignment research project definitions
rmoehn · Feb 3, 2020, 10:21 PM · 23 points (10 votes) · 0 comments · 6 min read · LW link

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors
Stuart_Armstrong · Dec 19, 2019, 1:55 PM · 24 points (8 votes) · 18 comments · 7 min read · LW link

[Question] As a Washed Up Former Data Scientist and Machine Learning Researcher What Direction Should I Go In Now?
Darklight · Oct 19, 2020, 8:13 PM · 13 points (7 votes) · 7 comments · 3 min read · LW link

Artificial Mysterious Intelligence
Eliezer Yudkowsky · Dec 7, 2008, 8:05 PM · 29 points (22 votes) · 24 comments · 5 min read · LW link

A Premature Word on AI
Eliezer Yudkowsky · May 31, 2008, 5:48 PM · 26 points (20 votes) · 69 comments · 8 min read · LW link

Let's reimplement EURISKO!
cousin_it · Jun 11, 2009, 4:28 PM · 23 points (42 votes) · 162 comments · 1 min read · LW link

Corrigibility thoughts III: manipulating versus deceiving
Stuart_Armstrong · Jan 18, 2017, 3:57 PM · 3 points (2 votes) · 0 comments · 1 min read · LW link

[Question] [Meta] Do you want AIS Webinars?
Linda Linsefors · Mar 21, 2020, 4:01 PM · 18 points (10 votes) · 7 comments · 1 min read · LW link

New article from Oren Etzioni
Aryeh Englander · Feb 25, 2020, 3:25 PM · 19 points (8 votes) · 19 comments · 2 min read · LW link

Singletons Rule OK
Eliezer Yudkowsky · Nov 30, 2008, 4:45 PM · 20 points (18 votes) · 47 comments · 5 min read · LW link

"On the Impossibility of Supersized Machines"
crmflynn · Mar 31, 2017, 11:32 PM · 24 points (15 votes) · 4 comments · 1 min read · LW link (philpapers.org)

Nonsentient Optimizers
Eliezer Yudkowsky · Dec 27, 2008, 2:32 AM · 34 points (27 votes) · 48 comments · 6 min read · LW link

Building Something Smarter
Eliezer Yudkowsky · Nov 2, 2008, 5:00 PM · 22 points (16 votes) · 57 comments · 4 min read · LW link

Let's Read: an essay on AI Theology
Yuxi_Liu · Jul 4, 2019, 7:50 AM · 22 points (10 votes) · 9 comments · 7 min read · LW link

Wanted: Python open source volunteers
Eliezer Yudkowsky · Mar 11, 2009, 4:59 AM · 16 points (16 votes) · 13 comments · 1 min read · LW link

Equilibrium and prior selection problems in multipolar deployment
JesseClifton · Apr 2, 2020, 8:06 PM · 20 points (10 votes) · 11 comments · 11 min read · LW link

[Question] The Simulation Epiphany Problem
Koen.Holtman · Oct 31, 2019, 10:12 PM · 15 points (10 votes) · 13 comments · 4 min read · LW link

Changing accepted public opinion and Skynet
Roko · May 22, 2009, 11:05 AM · 17 points (20 votes) · 71 comments · 2 min read · LW link

Introducing CADIE
MBlume · Apr 1, 2009, 7:32 AM · 0 points (15 votes) · 8 comments · 1 min read · LW link

Deepmind Plans for Rat-Level AI
moridinamael · Aug 18, 2016, 4:26 PM · 34 points (23 votes) · 9 comments · 1 min read · LW link

"Robot scientists can think for themselves"
CronoDAS · Apr 2, 2009, 9:16 PM · −1 points (4 votes) · 11 comments · 1 min read · LW link

Automating reasoning about the future at Ought
jungofthewon · Nov 9, 2020, 9:51 PM · 17 points (5 votes) · 0 comments · 1 min read · LW link (ought.org)

Neural program synthesis is a dangerous technology
syllogism · Jan 12, 2018, 4:19 PM · 10 points (9 votes) · 6 comments · 2 min read · LW link

New, Brief Popular-Level Introduction to AI Risks and Superintelligence
LyleN · Jan 23, 2015, 3:43 PM · 33 points (23 votes) · 3 comments · 1 min read · LW link

In the beginning, Dartmouth created the AI and the hype
Stuart_Armstrong · Jan 24, 2013, 4:49 PM · 33 points (23 votes) · 22 comments · 1 min read · LW link

Fundamental Philosophical Problems Inherent in AI discourse
AlexSadler · Sep 16, 2018, 9:03 PM · 23 points (14 votes) · 1 comment · 17 min read · LW link

Research Priorities for Artificial Intelligence: An Open Letter
jimrandomh · Jan 11, 2015, 7:52 PM · 38 points (24 votes) · 11 comments · 1 min read · LW link

[Question] How can I help research Friendly AI?
avichapman · Jul 9, 2019, 12:15 AM · 22 points (8 votes) · 3 comments · 1 min read · LW link

FAI Research Constraints and AGI Side Effects
JustinShovelain · Jun 3, 2015, 7:25 PM · 26 points (23 votes) · 59 comments · 7 min read · LW link

[Question] How to deal with a misleading conference talk about AI risk?
rmoehn · Jun 27, 2019, 9:04 PM · 21 points (9 votes) · 13 comments · 4 min read · LW link

Implications of Quantum Computing for Artificial Intelligence Alignment Research
Aug 22, 2019, 10:33 AM · 24 points (12 votes) · 3 comments · 13 min read · LW link

[Question] How can labour productivity growth be an indicator of automation?
Polytopos · Nov 16, 2020, 9:16 PM · 2 points (1 vote) · 5 comments · 1 min read · LW link

[Question] Should I do it?
MrLight · Nov 19, 2020, 1:08 AM · −3 points (3 votes) · 16 comments · 2 min read · LW link

My intellectual influences
Richard_Ngo · Nov 22, 2020, 6:00 PM · 92 points (39 votes) · 1 comment · 5 min read · LW link (thinkingcomplete.blogspot.com)

Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research
Remmelt · Nov 26, 2020, 11:17 AM · 7 points (4 votes) · 5 comments · 4 min read · LW link

SETI Predictions
hippke · Nov 30, 2020, 8:09 PM · 23 points (11 votes) · 8 comments · 1 min read · LW link

What happens when your beliefs fully propagate
Alexei · Feb 14, 2012, 7:53 AM · 29 points (51 votes) · 79 comments · 7 min read · LW link

Interactive exploration of LessWrong and other large collections of documents
Dec 20, 2020, 7:06 PM · 49 points (22 votes) · 9 comments · 10 min read · LW link

[Question] Will AGI have "human" flaws?
Agustinus Theodorus · Dec 23, 2020, 3:43 AM · 1 point (1 vote) · 2 comments · 1 min read · LW link

Optimum number of single points of failure
Douglas_Reay · Mar 14, 2018, 1:30 PM · 7 points (6 votes) · 4 comments · 4 min read · LW link

Don't put all your eggs in one basket
Douglas_Reay · Mar 15, 2018, 8:07 AM · 5 points (6 votes) · 0 comments · 7 min read · LW link

Defect or Cooperate
Douglas_Reay · Mar 16, 2018, 2:12 PM · 4 points (4 votes) · 5 comments · 6 min read · LW link

Environments for killing AIs
Douglas_Reay · Mar 17, 2018, 3:23 PM · 3 points (4 votes) · 1 comment · 9 min read · LW link

The advantage of not being open-ended
Douglas_Reay · Mar 18, 2018, 1:50 PM · 7 points (5 votes) · 2 comments · 6 min read · LW link

Metamorphosis
Douglas_Reay · Apr 12, 2018, 9:53 PM · 2 points (3 votes) · 0 comments · 4 min read · LW link

Believable Promises
Douglas_Reay · Apr 16, 2018, 4:17 PM · 5 points (4 votes) · 0 comments · 5 min read · LW link

Trustworthy Computing
Douglas_Reay · Apr 10, 2018, 7:55 AM · 9 points (4 votes) · 1 comment · 6 min read · LW link

Edge of the Cliff
akaTrickster · Jan 5, 2021, 5:21 PM · 1 point (1 vote) · 0 comments · 5 min read · LW link

[Question] How is reinforcement learning possible in non-sentient agents?
SomeoneKind · Jan 5, 2021, 8:57 PM · 3 points (2 votes) · 5 comments · 1 min read · LW link

AI Alignment Using Reverse Simulation
Sven Nilsen · Jan 12, 2021, 8:48 PM · 1 point (2 votes) · 0 comments · 1 min read · LW link

A toy model of the control problem
Stuart_Armstrong · Sep 16, 2015, 2:59 PM · 36 points (23 votes) · 24 comments · 3 min read · LW link

On the nature of purpose
Nora_Ammann · Jan 22, 2021, 8:30 AM · 28 points (13 votes) · 15 comments · 9 min read · LW link

Learning Normativity: Language
Bunthut · Feb 5, 2021, 10:26 PM · 14 points (3 votes) · 4 comments · 8 min read · LW link

Singularity & phase transition-2. A priori probability and ways to check.
Valentin2026 · Feb 8, 2021, 2:21 AM · 1 point (1 vote) · 0 comments · 3 min read · LW link

Nonperson Predicates
Eliezer Yudkowsky · Dec 27, 2008, 1:47 AM · 52 points (50 votes) · 176 comments · 6 min read · LW link

Mapping the Conceptual Territory in AI Existential Safety and Alignment
jbkjr · Feb 12, 2021, 7:55 AM · 15 points (6 votes) · 0 comments · 26 min read · LW link

2021-03-01 National Library of Medicine Presentation: "Atlas of AI: Mapping the social and economic forces behind AI"
IrenicTruth · Feb 17, 2021, 6:23 PM · 1 point (1 vote) · 0 comments · 2 min read · LW link

Chaotic era: avoid or survive?
Valentin2026 · Feb 22, 2021, 1:34 AM · 3 points (2 votes) · 3 comments · 2 min read · LW link

Suffering-Focused Ethics in the Infinite Universe. How can we redeem ourselves if Multiverse Immortality is real and subjective death is impossible.
Szymon Kucharski · Feb 24, 2021, 9:02 PM · −3 points (6 votes) · 4 comments · 70 min read · LW link

AIDungeon 3.1
Yair Halberstadt · Mar 1, 2021, 5:56 AM · 2 points (2 votes) · 0 comments · 2 min read · LW link

Physicalism implies experience never dies. So what am I going to experience after it does?
Szymon Kucharski · Mar 14, 2021, 2:45 PM · −2 points (10 votes) · 1 comment · 30 min read · LW link

An Antropic Argument for Post-singularity Antinatalism
monkaap · Mar 16, 2021, 5:40 PM · 3 points (9 votes) · 4 comments · 3 min read · LW link

[Question] Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans?
sxae · Mar 25, 2021, 2:46 PM · 15 points (5 votes) · 7 comments · 3 min read · LW link

AI oracles on blockchain
Caravaggio · Apr 6, 2021, 8:13 PM · 5 points (6 votes) · 0 comments · 3 min read · LW link

What if AGI is near?
Wulky Wilkinsen · Apr 14, 2021, 12:05 AM · 11 points (12 votes) · 5 comments · 1 min read · LW link

Review of "Why AI is Harder Than We Think"
electroswing · Apr 30, 2021, 6:14 PM · 40 points (16 votes) · 10 comments · 8 min read · LW link

Thoughts on the Alignment Implications of Scaling Language Models
leogao · Jun 2, 2021, 9:32 PM · 79 points (35 votes) · 11 comments · 17 min read · LW link

[Question] Suppose $1 billion is given to AI Safety. How should it be spent?
hunterglenn · May 15, 2021, 11:24 PM · 23 points (11 votes) · 2 comments · 1 min read · LW link

Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS)
Justin Bullock · May 24, 2021, 12:53 PM · 11 points (11 votes) · 11 comments · 6 min read · LW link

Curated conversations with brilliant rationalists
spencerg · May 28, 2021, 2:23 PM · 153 points (77 votes) · 18 comments · 6 min read · LW link

Security Mindset and Ordinary Paranoia
Eliezer Yudkowsky · Nov 25, 2017, 5:53 PM · 98 points (66 votes) · 24 comments · 29 min read · LW link

The Anti-Carter Basilisk
Jon Gilbert · May 26, 2021, 10:56 PM · 0 points (2 votes) · 0 comments · 2 min read · LW link

Parameter counts in Machine Learning
Jun 19, 2021, 4:04 PM · 47 points (26 votes) · 16 comments · 7 min read · LW link

Irrational Modesty
Tomás B. · Jun 20, 2021, 7:38 PM · 132 points (61 votes) · 7 comments · 1 min read · LW link

[Question] Thoughts on a "Sequences Inspired" PhD Topic
goose000 · Jun 17, 2021, 8:36 PM · 7 points (7 votes) · 2 comments · 2 min read · LW link

Some alternatives to "Friendly AI"
lukeprog · Jun 15, 2014, 7:53 PM · 30 points (24 votes) · 44 comments · 2 min read · LW link

Intelligence without Consciousness
Andrew Vlahos · Jul 7, 2021, 5:27 AM · 13 points (9 votes) · 5 comments · 1 min read · LW link

[Question] What would it look like if it looked like AGI was very near?
Tomás B. · Jul 12, 2021, 3:22 PM · 52 points (29 votes) · 25 comments · 1 min read · LW link

Is the argument that AI is an xrisk valid?
MACannon · Jul 19, 2021, 1:20 PM · 5 points (13 votes) · 62 comments · 1 min read · LW link (onlinelibrary.wiley.com)

[Question] Jaynesian interpretation—How does "estimating probabilities" make sense?
Haziq Muhammad · Jul 21, 2021, 9:36 PM · 4 points (3 votes) · 40 comments · 1 min read · LW link

The biological intelligence explosion
Rob Lucas · Jul 25, 2021, 1:08 PM · 8 points (4 votes) · 6 comments · 4 min read · LW link

[Question] Do Bayesians like Bayesian model Averaging?
Haziq Muhammad · Aug 2, 2021, 12:24 PM · 4 points (3 votes) · 13 comments · 1 min read · LW link

[Question] Question about Test-sets and Bayesian machine learning
Haziq Muhammad · Aug 9, 2021, 5:16 PM · 2 points (1 vote) · 8 comments · 1 min read · LW link

[Question] Halpern's paper—A refutation of Cox's theorem?
Haziq Muhammad · Aug 11, 2021, 9:25 AM · 11 points (6 votes) · 7 comments · 1 min read · LW link

New GPT-3 competitor
Quintin Pope · Aug 12, 2021, 7:05 AM · 32 points (22 votes) · 10 comments · 1 min read · LW link

[Question] Jaynes-Cox Probability: Are plausibilities objective?
Haziq Muhammad · Aug 12, 2021, 2:23 PM · 9 points (4 votes) · 17 comments · 1 min read · LW link

A gentle apocalypse
pchvykov · Aug 16, 2021, 5:03 AM · 3 points (4 votes) · 5 comments · 3 min read · LW link

[Question] Is it worth making a database for moral predictions?
Jonas Hallgren · Aug 16, 2021, 2:51 PM · 1 point (1 vote) · 0 comments · 2 min read · LW link

Cynical explanations of FAI critics (including myself)
Wei Dai · Aug 13, 2012, 9:19 PM · 31 points (32 votes) · 49 comments · 1 min read · LW link

[Question] Has Van Horn fixed Cox's theorem?
Haziq Muhammad · Aug 29, 2021, 6:36 PM · 9 points (5 votes) · 1 comment · 1 min read · LW link

The Governance Problem and the "Pretty Good" X-Risk
Zach Stein-Perlman · Aug 29, 2021, 6:00 PM · 5 points (4 votes) · 2 comments · 11 min read · LW link

Limits of and to (artificial) Intelligence
MoritzG · Aug 25, 2019, 10:16 PM · 1 point (1 vote) · 3 comments · 7 min read · LW link

Grokking the Intentional Stance
jbkjr · Aug 31, 2021, 3:49 PM · 41 points (20 votes) · 20 comments · 20 min read · LW link

Intelligence, Fast and Slow
Mateusz Mazurkiewicz · Sep 1, 2021, 7:52 PM · −3 points (4 votes) · 2 comments · 2 min read · LW link

[Question] Is LessWrong dead without Cox's theorem?
Haziq Muhammad · Sep 4, 2021, 5:45 AM · −2 points (4 votes) · 88 comments · 1 min read · LW link

Alignment via manually implementing the utility function
Chantiel · Sep 7, 2021, 8:20 PM · 1 point (2 votes) · 6 comments · 2 min read · LW link

Pivot!
Carlos Ramirez · Sep 12, 2021, 8:39 PM · −19 points (10 votes) · 5 comments · 1 min read · LW link

The Metaethics and Normative Ethics of AGI Value Alignment: Many Questions, Some Implications
Eleos Arete Citrini · Sep 16, 2021, 4:13 PM · 6 points (4 votes) · 0 comments · 8 min read · LW link

Why will AI be dangerous?
Legionnaire · Feb 4, 2022, 11:41 PM · 37 points (24 votes) · 14 comments · 1 min read · LW link

Occam's Razor and the Universal Prior
Peter Chatain · Oct 3, 2021, 3:23 AM · 22 points (15 votes) · 5 comments · 21 min read · LW link

We're Redwood Research, we do applied alignment research, AMA
Nate Thomas · Oct 6, 2021, 5:51 AM · 56 points (17 votes) · 3 comments · 2 min read · LW link (forum.effectivealtruism.org)

[LINK] Wait But Why—The AI Revolution Part 2
Adam Zerner · Feb 4, 2015, 4:02 PM · 27 points (18 votes) · 88 comments · 1 min read · LW link

Slate Star Codex Notes on the Asilomar Conference on Beneficial AI
Gunnar_Zarncke · Feb 7, 2017, 12:14 PM · 24 points (14 votes) · 8 comments · 1 min read · LW link (slatestarcodex.com)

Three Approaches to "Friendliness"
Wei Dai · Jul 17, 2013, 7:46 AM · 32 points (23 votes) · 86 comments · 3 min read · LW link

P₂B: Plan to P₂B Better
Oct 24, 2021, 3:21 PM · 33 points (17 votes) · 14 comments · 6 min read · LW link

A Roadmap to a Post-Scarcity Economy
lorepieri · Oct 30, 2021, 9:04 AM · 3 points (2 votes) · 3 comments · 1 min read · LW link

What is the link between altruism and intelligence?
Ruralvisitor83 · Nov 3, 2021, 11:59 PM · 3 points (2 votes) · 13 comments · 1 min read · LW link

Modeling the impact of safety agendas
Ben Cottier · Nov 5, 2021, 7:46 PM · 51 points (13 votes) · 6 comments · 10 min read · LW link

[Question] Does anyone know what Marvin Minsky is talking about here?
delton137 · Nov 19, 2021, 12:56 AM · 1 point (1 vote) · 6 comments · 3 min read · LW link

Integrating Three Models of (Human) Cognition
jbkjr · Nov 23, 2021, 1:06 AM · 29 points (14 votes) · 4 comments · 32 min read · LW link

[Question] I currently translate AGI-related texts to Russian. Is that useful?
Tapatakt · Nov 27, 2021, 5:51 PM · 29 points (11 votes) · 7 comments · 1 min read · LW link

Question/Issue with the 5/10 Problem
acgt · Nov 29, 2021, 10:45 AM · 6 points (2 votes) · 3 comments · 3 min read · LW link

Can solipsism be disproven?
nx2059 · Dec 4, 2021, 8:24 AM · −2 points (9 votes) · 5 comments · 2 min read · LW link

[Question] Misc. questions about EfficientZero
Daniel Kokotajlo · Dec 4, 2021, 7:45 PM · 51 points (19 votes) · 17 comments · 1 min read · LW link

Framing approaches to alignment and the hard problem of AI cognition
ryan_greenblatt · Dec 15, 2021, 7:06 PM · 8 points (4 votes) · 15 comments · 27 min read · LW link

HIRING: Inform and shape a new project on AI safety at Partnership on AI
madhu_lika · Dec 7, 2021, 7:37 PM · 1 point (1 vote) · 0 comments · 1 min read · LW link

What role should evolutionary analogies play in understanding AI takeoff speeds?
anson.ho · Dec 11, 2021, 1:19 AM · 14 points (8 votes) · 0 comments · 42 min read · LW link

Motivations, Natural Selection, and Curriculum Engineering
Oliver Sourbut · Dec 16, 2021, 1:07 AM · 16 points (5 votes) · 0 comments · 42 min read · LW link

Emer­gent mod­u­lar­ity and safety

Richard_NgoOct 21, 2021, 1:54 AM
31 points

13 votes

Overall karma indicates overall quality.

15 comments3 min readLW link

Ev­i­dence Sets: Towards In­duc­tive-Bi­ases based Anal­y­sis of Pro­saic AGI

bayesian_kittenDec 16, 2021, 10:41 PM
22 points

10 votes

Overall karma indicates overall quality.

10 comments21 min readLW link

Univer­sal­ity and the “Filter”

maggiehayesDec 16, 2021, 12:47 AM
10 points

9 votes

Overall karma indicates overall quality.

3 comments11 min readLW link

[Question] Can you prove that 0 = 1?

purplelightFeb 4, 2022, 9:31 PM
−10 points

12 votes

Overall karma indicates overall quality.

4 comments1 min readLW link

Ex­pec­ta­tions In­fluence Real­ity (and AI)

purplelightFeb 4, 2022, 9:31 PM
0 points

8 votes

Overall karma indicates overall quality.

3 comments7 min readLW link

[Question] What ques­tions do you have about do­ing work on AI safety?

peterbarnettDec 21, 2021, 4:36 PM
13 points

7 votes

Overall karma indicates overall quality.

8 comments1 min readLW link

Re­views of “Is power-seek­ing AI an ex­is­ten­tial risk?”

Joe CarlsmithDec 16, 2021, 8:48 PM
76 points

21 votes

Overall karma indicates overall quality.

20 comments1 min readLW link

Elic­it­ing La­tent Knowl­edge Via Hy­po­thet­i­cal Sensors

John_MaxwellDec 30, 2021, 3:53 PM
38 points

19 votes

Overall karma indicates overall quality.

2 comments6 min readLW link

Lat­eral Think­ing (AI safety HPMOR fan­fic)

SlytherinsMonsterJan 2, 2022, 11:50 PM
75 points

46 votes

Overall karma indicates overall quality.

9 comments5 min readLW link

SONN : What’s Next ?

D𝜋Jan 9, 2022, 8:15 AM
−17 points

13 votes

Overall karma indicates overall quality.

3 comments1 min readLW link

An Open Philan­thropy grant pro­posal: Causal rep­re­sen­ta­tion learn­ing of hu­man preferences

PabloAMCJan 11, 2022, 11:28 AM
19 points

11 votes

Overall karma indicates overall quality.

6 comments8 min readLW link

Ac­tion: Help ex­pand fund­ing for AI Safety by co­or­di­nat­ing on NSF response

Evan R. MurphyJan 19, 2022, 10:47 PM
23 points

15 votes

Overall karma indicates overall quality.

8 comments3 min readLW link

Emo­tions = Re­ward Functions

jpyykkoJan 20, 2022, 6:46 PM
16 points

6 votes

Overall karma indicates overall quality.

10 comments5 min readLW link

[Question] Is AI Align­ment a pseu­do­science?

mocny-chlapikJan 23, 2022, 10:32 AM
21 points

41 votes

Overall karma indicates overall quality.

41 comments1 min readLW link

De­con­fus­ing Deception

J BostockJan 29, 2022, 4:43 PM
26 points

11 votes

Overall karma indicates overall quality.

6 comments2 min readLW link

Re­vis­it­ing Brave New World Re­vis­ited (Chap­ter 3)

Justin BullockFeb 1, 2022, 5:17 PM
5 points

3 votes

Overall karma indicates overall quality.

0 comments10 min readLW link

[Question] Do mesa-op­ti­miza­tion prob­lems cor­re­late with low-slack?

sudoFeb 4, 2022, 9:11 PM
1 point

1 vote

Overall karma indicates overall quality.

1 comment1 min readLW link

Can the laws of physics/​na­ture pre­vent hell?

superads91Feb 6, 2022, 8:39 PM
−7 points

9 votes

Overall karma indicates overall quality.

10 comments2 min readLW link

Ngo and Yud­kowsky on sci­en­tific rea­son­ing and pivotal acts

Feb 21, 2022, 8:54 PM
51 points

28 votes

Overall karma indicates overall quality.

13 comments35 min readLW link

Bet­ter a Brave New World than a dead one

YitzFeb 25, 2022, 11:11 PM
8 points

8 votes

Overall karma indicates overall quality.

5 comments4 min readLW link

Be­ing an in­di­vi­d­ual al­ign­ment grantmaker

A_donorFeb 28, 2022, 8:02 PM
64 points

24 votes

Overall karma indicates overall quality.

5 comments2 min readLW link

How to de­velop safe superintelligence

martillopartMar 1, 2022, 9:57 PM
−5 points

6 votes

Overall karma indicates overall quality.

3 comments13 min readLW link

Deep Dives: My Ad­vice for Pur­su­ing Work in Re­search

scasperMar 11, 2022, 5:56 PM
21 points

19 votes

Overall karma indicates overall quality.

2 comments3 min readLW link

One pos­si­ble ap­proach to de­velop the best pos­si­ble gen­eral learn­ing algorithm

martillopartMar 14, 2022, 7:24 PM
3 points

3 votes

Overall karma indicates overall quality.

0 comments7 min readLW link

[Question] Our time in his­tory as ev­i­dence for simu­la­tion the­ory?

Garrett GarzonieMar 18, 2022, 3:35 AM
3 points

4 votes

Overall karma indicates overall quality.

2 comments1 min readLW link

The weak­est ar­gu­ments for and against hu­man level AI

Stuart_ArmstrongAug 15, 2012, 11:04 AM
22 points

17 votes

Overall karma indicates overall quality.

34 comments1 min readLW link

Chris­ti­ano and Yud­kowsky on AI pre­dic­tions and hu­man intelligence

Eliezer YudkowskyFeb 23, 2022, 9:34 PM
69 points

33 votes

Overall karma indicates overall quality.

35 comments42 min readLW link

Even more cu­rated con­ver­sa­tions with brilli­ant rationalists

spencergMar 21, 2022, 11:49 PM
57 points

32 votes

Overall karma indicates overall quality.

0 comments15 min readLW link

Man­hat­tan pro­ject for al­igned AI

Chris van MerwijkMar 27, 2022, 11:41 AM
34 points

18 votes

Overall karma indicates overall quality.

6 comments2 min readLW link

Gears-Level Men­tal Models of Trans­former Interpretability

RowanWangMar 29, 2022, 8:09 PM
56 points

27 votes

Overall karma indicates overall quality.

4 comments6 min readLW link

Meta wants to use AI to write Wikipe­dia ar­ti­cles; I am Ner­vous™

YitzMar 30, 2022, 7:05 PM
14 points

8 votes

Overall karma indicates overall quality.

12 comments1 min readLW link

[Question] If AGI were com­ing in a year, what should we do?

MichaelStJulesApr 1, 2022, 12:41 AM
20 points

8 votes

Overall karma indicates overall quality.

16 comments1 min readLW link

On Agent In­cen­tives to Ma­nipu­late Hu­man Feed­back in Multi-Agent Re­ward Learn­ing Scenarios

Francis Rhys WardApr 3, 2022, 6:20 PM
27 points

10 votes

Overall karma indicates overall quality.

11 comments8 min readLW link

[Question] How to write a LW sequence to learn a topic?

PabloAMC · Apr 3, 2022, 8:09 PM · 3 points (2 votes) · 2 comments · 1 min read · LW link

Save Humanity! Breed Sapient Octopuses!

Yair Halberstadt · Apr 5, 2022, 6:39 PM · 54 points (31 votes) · 17 comments · 1 min read · LW link

What Should We Optimize—A Conversation

Johannes C. Mayer · Apr 7, 2022, 3:47 AM · 1 point (1 vote) · 0 comments · 14 min read · LW link

The Explanatory Gap of AI

David Valdman · Apr 7, 2022, 6:28 PM · 1 point (4 votes) · 0 comments · 4 min read · LW link

Progress report 3: clustering transformer neurons

Nathan Helm-Burger · Apr 5, 2022, 11:13 PM · 5 points (3 votes) · 0 comments · 2 min read · LW link

Godshatter Versus Legibility: A Fundamentally Different Approach To AI Alignment

LukeOnline · Apr 9, 2022, 9:43 PM · 11 points (21 votes) · 14 comments · 7 min read · LW link

Is Fisherian Runaway Gradient Hacking?

Ryan Kidd · Apr 10, 2022, 1:47 PM · 15 points (6 votes) · 7 comments · 4 min read · LW link

The Glitch And Notes On Digital Beings

Ghvst · Apr 11, 2022, 7:46 PM · −4 points (2 votes) · 0 comments · 2 min read · LW link (ghvsted.com)

Post-history is written by the martyrs

Veedrac · Apr 11, 2022, 3:45 PM · 37 points (19 votes) · 2 comments · 19 min read · LW link (www.royalroad.com)

An AI-in-a-box success model

azsantosk · Apr 11, 2022, 10:28 PM · 16 points (8 votes) · 1 comment · 10 min read · LW link

Rationalist Should Win. Not Dying with Dignity and Funding WBE.

CitizenTen · Apr 12, 2022, 2:14 AM · 23 points (20 votes) · 15 comments · 5 min read · LW link

Reward model hacking as a challenge for reward learning

Erik Jenner · Apr 12, 2022, 9:39 AM · 25 points (10 votes) · 1 comment · 9 min read · LW link

Is technical AI alignment research a net positive?

cranberry_bear · Apr 12, 2022, 1:07 PM · 4 points (13 votes) · 2 comments · 2 min read · LW link

Another list of theories of impact for interpretability

Beth Barnes · Apr 13, 2022, 1:29 PM · 32 points (20 votes) · 1 comment · 5 min read · LW link

Some reasons why a predictor wants to be a consequentialist

Lauro Langosco · Apr 15, 2022, 3:02 PM · 23 points (9 votes) · 16 comments · 5 min read · LW link

Redwood Research is hiring for several roles (Operations and Technical)

Apr 14, 2022, 4:57 PM · 29 points (13 votes) · 0 comments · 1 min read · LW link

[Question] Convince me that humanity *isn’t* doomed by AGI

Yitz · Apr 15, 2022, 5:26 PM · 60 points (32 votes) · 53 comments · 1 min read · LW link

Another argument that you will let the AI out of the box

Garrett Baker · Apr 19, 2022, 9:54 PM · 8 points (7 votes) · 16 comments · 2 min read · LW link

For every choice of AGI difficulty, conditioning on gradual take-off implies shorter timelines.

Francis Rhys Ward · Apr 21, 2022, 7:44 AM · 29 points (20 votes) · 13 comments · 3 min read · LW link

Reflections on My Own Missing Mood

Lone Pine · Apr 21, 2022, 4:19 PM · 51 points (28 votes) · 25 comments · 5 min read · LW link

Key questions about artificial sentience: an opinionated guide

Robbo · Apr 25, 2022, 12:09 PM · 45 points (24 votes) · 31 comments · 18 min read · LW link

[Question] What is being improved in recursive self improvement?

Lone Pine · Apr 25, 2022, 6:30 PM · 7 points (4 votes) · 7 comments · 1 min read · LW link

Why Copilot Accelerates Timelines

Michaël Trazzi · Apr 26, 2022, 10:06 PM · 35 points (14 votes) · 14 comments · 7 min read · LW link

[Question] Is it desirable for the first AGI to be conscious?

Charbel-Raphaël · May 1, 2022, 9:29 PM · 5 points (5 votes) · 12 comments · 1 min read · LW link

[Question] What Was Your Best / Most Successful DALL-E 2 Prompt?

Evidential · May 4, 2022, 3:16 AM · 1 point (1 vote) · 0 comments · 1 min read · LW link

Negotiating Up and Down the Simulation Hierarchy: Why We Might Survive the Unaligned Singularity

David Udell · May 4, 2022, 4:21 AM · 24 points (18 votes) · 16 comments · 2 min read · LW link

High-stakes alignment via adversarial training [Redwood Research report]

May 5, 2022, 12:59 AM · 136 points (62 votes) · 29 comments · 9 min read · LW link

Deriving Conditional Expected Utility from Pareto-Efficient Decisions

Thomas Kwa · May 5, 2022, 3:21 AM · 23 points (7 votes) · 1 comment · 6 min read · LW link

Transcripts of interviews with AI researchers

Vael Gates · May 9, 2022, 5:57 AM · 160 points (62 votes) · 8 comments · 2 min read · LW link

Agency As a Natural Abstraction

Thane Ruthenis · May 13, 2022, 6:02 PM · 55 points (22 votes) · 9 comments · 13 min read · LW link

Predicting the Elections with Deep Learning—Part 1 - Results

Quentin Chenevier · May 14, 2022, 12:54 PM · 0 points (2 votes) · 0 comments · 1 min read · LW link

On saving one’s world

Rob Bensinger · May 17, 2022, 7:53 PM · 190 points (95 votes) · 5 comments · 1 min read · LW link

In defence of flailing

acylhalide · Jun 18, 2022, 5:26 AM · 10 points (11 votes) · 14 comments · 4 min read · LW link

Reshaping the AI Industry

Thane Ruthenis · May 29, 2022, 10:54 PM · 143 points (75 votes) · 34 comments · 21 min read · LW link

Science for the Possible World

Zechen Zhang · May 23, 2022, 2:01 PM · 7 points (3 votes) · 0 comments · 3 min read · LW link

Synthetic Media and The Future of Film

ifalpha · May 24, 2022, 5:54 AM · 35 points (17 votes) · 13 comments · 8 min read · LW link

Explaining inner alignment to myself

Jeremy Gillen · May 24, 2022, 11:10 PM · 9 points (7 votes) · 2 comments · 10 min read · LW link

A discussion of the paper, “Large Language Models are Zero-Shot Reasoners”

HiroSakuraba · May 26, 2022, 3:55 PM · 7 points (6 votes) · 0 comments · 4 min read · LW link

On inner and outer alignment, and their confusion

Nina Panickssery · May 26, 2022, 9:56 PM · 6 points (5 votes) · 7 comments · 4 min read · LW link

RL with KL penalties is better seen as Bayesian inference

May 25, 2022, 9:23 AM · 90 points (38 votes) · 15 comments · 12 min read · LW link

Bits of Optimization Can Only Be Lost Over A Distance

johnswentworth · May 23, 2022, 6:55 PM · 26 points (9 votes) · 15 comments · 2 min read · LW link

Gradations of Agency

Daniel Kokotajlo · May 23, 2022, 1:10 AM · 40 points (17 votes) · 6 comments · 5 min read · LW link

Utilitarianism

C S SRUTHI · May 28, 2022, 7:35 PM · 0 points (2 votes) · 1 comment · 1 min read · LW link

Distilled—AGI Safety from First Principles

Harrison G · May 29, 2022, 12:57 AM · 8 points (8 votes) · 1 comment · 14 min read · LW link

Multiple AIs in boxes, evaluating each other’s alignment

Moebius314 · May 29, 2022, 8:36 AM · 7 points (9 votes) · 0 comments · 14 min read · LW link

The impact you might have working on AI safety

Fabien Roger · May 29, 2022, 4:31 PM · 5 points (3 votes) · 1 comment · 4 min read · LW link

My SERI MATS Application

Daniel Paleka · May 30, 2022, 2:04 AM · 16 points (11 votes) · 0 comments · 8 min read · LW link

[Question] A terrifying variant of Boltzmann’s brains problem

Zeruel017 · May 30, 2022, 8:08 PM · 5 points (5 votes) · 12 comments · 4 min read · LW link

The Reverse Basilisk

Dunning K. · May 30, 2022, 11:10 PM · 15 points (22 votes) · 23 comments · 2 min read · LW link

The Hard Intelligence Hypothesis and Its Bearing on Succession Induced Foom

DragonGod · May 31, 2022, 7:04 PM · 10 points (8 votes) · 7 comments · 4 min read · LW link

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet Farlow · May 31, 2022, 10:03 PM · 16 points (8 votes) · 0 comments · 6 min read · LW link

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright · Jun 1, 2022, 6:45 PM · 1 point (3 votes) · 6 comments · 1 min read · LW link

New cooperation mechanism—quadratic funding without a matching pool

Filip Sondej · Jun 5, 2022, 1:55 PM · 11 points (8 votes) · 0 comments · 5 min read · LW link

Miriam Yevick on why both symbols and networks are necessary for artificial minds

Bill Benzon · Jun 6, 2022, 8:34 AM · 1 point (1 vote) · 0 comments · 4 min read · LW link

Six Dimensions of Operational Adequacy in AGI Projects

Eliezer Yudkowsky · May 30, 2022, 5:00 PM · 270 points (108 votes) · 65 comments · 13 min read · LW link

Grokking “Forecasting TAI with biological anchors”

anson.ho · Jun 6, 2022, 6:58 PM · 34 points (15 votes) · 0 comments · 14 min read · LW link

Who models the models that model models? An exploration of GPT-3′s in-context model fitting ability

Lovre · Jun 7, 2022, 7:37 PM · 112 points (67 votes) · 14 comments · 9 min read · LW link

Pitching an Alignment Softball

mu_(negative) · Jun 7, 2022, 4:10 AM · 47 points (26 votes) · 13 comments · 10 min read · LW link

[Question] Confused Thoughts on AI Afterlife (seriously)

Epirito · Jun 7, 2022, 2:37 PM · −6 points (5 votes) · 6 comments · 1 min read · LW link

Transformer Research Questions from Stained Glass Windows

StefanHex · Jun 8, 2022, 12:38 PM · 4 points (3 votes) · 0 comments · 2 min read · LW link

Eliciting Latent Knowledge (ELK) - Distillation/Summary

Marius Hobbhahn · Jun 8, 2022, 1:18 PM · 49 points (24 votes) · 2 comments · 21 min read · LW link

Towards Gears-Level Understanding of Agency

Thane Ruthenis · Jun 16, 2022, 10:00 PM · 24 points (8 votes) · 4 comments · 18 min read · LW link

Vael Gates: Risks from Advanced AI (June 2022)

Vael Gates · Jun 14, 2022, 12:54 AM · 38 points (17 votes) · 2 comments · 30 min read · LW link

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment · Jun 27, 2022, 6:56 PM · 21 points (15 votes) · 3 comments · 18 min read · LW link

Operationalizing two tasks in Gary Marcus’s AGI challenge

Bill Benzon · Jun 9, 2022, 6:31 PM · 10 points (5 votes) · 3 comments · 8 min read · LW link

A plausible story about AI risk.

DeLesley Hutchins · Jun 10, 2022, 2:08 AM · 14 points (10 votes) · 1 comment · 4 min read · LW link

I No Longer Believe Intelligence to be “Magical”

DragonGod · Jun 10, 2022, 8:58 AM · 31 points (23 votes) · 34 comments · 6 min read · LW link

[Question] Why don’t you introduce really impressive people you personally know to AI alignment (more often)?

Verden · Jun 11, 2022, 3:59 PM · 33 points (16 votes) · 15 comments · 1 min read · LW link

Godzilla Strategies

johnswentworth · Jun 11, 2022, 3:44 PM · 151 points (96 votes) · 65 comments · 3 min read · LW link

Intuitive Explanation of AIXI

Thomas Larsen · Jun 12, 2022, 9:41 PM · 13 points (7 votes) · 0 comments · 5 min read · LW link

Training Trace Priors

Adam Jermyn · Jun 13, 2022, 2:22 PM · 12 points (8 votes) · 17 comments · 4 min read · LW link

Why multi-agent safety is important

Akbir Khan · Jun 14, 2022, 9:23 AM · 8 points (6 votes) · 2 comments · 10 min read · LW link

Contra EY: Can AGI destroy us without trial & error?

nsokolsky · Jun 13, 2022, 6:26 PM · 124 points (86 votes) · 76 comments · 15 min read · LW link

A Modest Pivotal Act

anonymousaisafety · Jun 13, 2022, 7:24 PM · −15 points (24 votes) · 1 comment · 5 min read · LW link

OpenAI: GPT-based LLMs show ability to discriminate between its own wrong answers, but inability to explain how/why it makes that discrimination, even as model scales

Aditya Jain · Jun 13, 2022, 11:33 PM · 14 points (6 votes) · 5 comments · 1 min read · LW link (openai.com)

Resources I send to AI researchers about AI safety

Vael Gates · Jun 14, 2022, 2:24 AM · 62 points (30 votes) · 12 comments · 10 min read · LW link

Investigating causal understanding in LLMs

Jun 14, 2022, 1:57 PM · 28 points (16 votes) · 4 comments · 13 min read · LW link

[Question] How Do You Quantify [Physics Interfacing] Real World Capabilities?

DragonGod · Jun 14, 2022, 2:49 PM · 17 points (7 votes) · 1 comment · 4 min read · LW link

Cryptographic Life: How to transcend in a sub-lightspeed world via Homomorphic encryption

Golol · Jun 14, 2022, 7:22 PM · 1 point (1 vote) · 0 comments · 3 min read · LW link

Alignment Risk Doesn’t Require Superintelligence

JustisMills · Jun 15, 2022, 3:12 AM · 35 points (18 votes) · 4 comments · 2 min read · LW link

Multigate Priors

Adam Jermyn · Jun 15, 2022, 7:30 PM · 4 points (2 votes) · 0 comments · 3 min read · LW link

Infohazards and inferential distances

acylhalide · Jun 16, 2022, 7:59 AM · 8 points (6 votes) · 0 comments · 6 min read · LW link

Apply to the Machine Learning For Good bootcamp in France

Alexandre Variengien · Jun 17, 2022, 7:32 AM · 10 points (5 votes) · 0 comments · 1 min read · LW link

Adaptation Executors and the Telos Margin

Plinthist · Jun 20, 2022, 1:06 PM · 2 points (2 votes) · 8 comments · 5 min read · LW link

Causal confusion as an argument against the scaling hypothesis

Jun 20, 2022, 10:54 AM · 83 points (32 votes) · 30 comments · 18 min read · LW link

[Question] What is the most probable AI?

Zeruel017 · Jun 20, 2022, 11:26 PM · −2 points (3 votes) · 0 comments · 3 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

Jun 22, 2022, 3:05 PM · 28 points (13 votes) · 1 comment · 14 min read · LW link

The Limits of Automation

milkandcigarettes · Jun 23, 2022, 6:03 PM · 5 points (2 votes) · 1 comment · 5 min read · LW link (milkandcigarettes.com)

Conversation with Eliezer: What do you want the system to do?

Orpheus16 · Jun 25, 2022, 5:36 PM · 112 points (69 votes) · 38 comments · 2 min read · LW link

[Yann Lecun] A Path Towards Autonomous Machine Intelligence

DragonGod · Jun 27, 2022, 7:24 PM · 38 points (19 votes) · 12 comments · 1 min read · LW link (openreview.net)

Yann LeCun, A Path Towards Autonomous Machine Intelligence [link]

Bill Benzon · Jun 27, 2022, 11:29 PM · 5 points (7 votes) · 1 comment · 1 min read · LW link

Doom doubts—is inner alignment a likely problem?

Crissman · Jun 28, 2022, 12:42 PM · 6 points (6 votes) · 7 comments · 1 min read · LW link

What success looks like

Jun 28, 2022, 2:38 PM · 19 points (10 votes) · 4 comments · 1 min read · LW link (forum.effectivealtruism.org)

Latent Adversarial Training

Adam Jermyn · Jun 29, 2022, 8:04 PM · 24 points (13 votes) · 9 comments · 5 min read · LW link

Hedonistic Isotopes:

Trozxzr · Jun 30, 2022, 4:49 PM · 1 point (1 vote) · 0 comments · 1 min read · LW link

[Question] What about transhumans and beyond?

AlignmentMirror · Jul 2, 2022, 1:58 PM · 7 points (7 votes) · 6 comments · 1 min read · LW link

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy · Jul 4, 2022, 1:25 AM · 35 points (23 votes) · 12 comments · 1 min read · LW link (www.hsgac.senate.gov)

When is it appropriate to use statistical models and probabilities for decision making?

Younes Kamel · Jul 5, 2022, 12:34 PM · 10 points (3 votes) · 7 comments · 4 min read · LW link (youneskamel.substack.com)

How humanity would respond to slow takeoff, with takeaways from the entire COVID-19 pandemic

Noosphere89 · Jul 6, 2022, 5:52 PM · 4 points (4 votes) · 1 comment · 2 min read · LW link

Four Societal Interventions to Improve our AGI Position

Rafael Cosman · Jul 6, 2022, 6:32 PM · −6 points (9 votes) · 2 comments · 6 min read · LW link (rafaelcosman.com)

Deep neural networks are not opaque.

jem-mosig · Jul 6, 2022, 6:03 PM · 22 points (21 votes) · 14 comments · 3 min read · LW link

Cooperation with and between AGI’s

PeterMcCluskey · Jul 7, 2022, 4:45 PM · 10 points (3 votes) · 3 comments · 10 min read · LW link (www.bayesianinvestor.com)

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · Jul 9, 2022, 2:42 PM · 14 points (5 votes) · 5 comments · 22 min read · LW link

Grouped Loss may disfavor discontinuous capabilities

Adam Jermyn · Jul 9, 2022, 5:22 PM · 14 points (4 votes) · 2 comments · 4 min read · LW link

We are now at the point of deepfake job interviews

trevor · Jul 10, 2022, 3:37 AM · 6 points (5 votes) · 0 comments · 1 min read · LW link (www.businessinsider.com)

Acceptability Verification: A Research Agenda

Jul 12, 2022, 8:11 PM · 43 points (15 votes) · 0 comments · 1 min read · LW link (docs.google.com)

Finding Skeletons on Rashomon Ridge

Jul 24, 2022, 10:31 PM · 30 points (15 votes) · 2 comments · 7 min read · LW link

A note about differential technological development

So8res · Jul 15, 2022, 4:46 AM · 178 points (81 votes) · 31 comments · 6 min read · LW link

How Interpretability can be Impactful

Connall Garrod · Jul 18, 2022, 12:06 AM · 18 points (8 votes) · 0 comments · 37 min read · LW link

AI Hiroshima (Does A Vivid Example Of Destruction Forestall Apocalypse?)

Sable · Jul 18, 2022, 12:06 PM · 4 points (4 votes) · 4 comments · 2 min read · LW link

Bounded complexity of solving ELK and its implications

Rubi J. Hudson · Jul 19, 2022, 6:56 AM · 10 points (3 votes) · 4 comments · 18 min read · LW link

Abram Demski’s ELK thoughts and proposal—distillation

Rubi J. Hudson · Jul 19, 2022, 6:57 AM · 15 points (7 votes) · 4 comments · 16 min read · LW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes · Jul 19, 2022, 4:55 AM · 94 points (35 votes) · 6 comments · 2 min read · LW link

A Critique of AI Alignment Pessimism

ExCeph · Jul 19, 2022, 2:28 AM · 8 points (7 votes) · 1 comment · 9 min read · LW link

Modelling Deception

Garrett Baker · Jul 18, 2022, 9:21 PM · 15 points (3 votes) · 0 comments · 7 min read · LW link

Enlightenment Values in a Vulnerable World

Maxwell Tabarrok · Jul 20, 2022, 7:52 PM · 15 points (7 votes) · 6 comments · 31 min read · LW link (maximumprogress.substack.com)

AI Safety Cheatsheet / Quick Reference

Zohar Jackson · Jul 20, 2022, 9:39 AM · 3 points (3 votes) · 0 comments · 1 min read · LW link (github.com)

Countering arguments against working on AI safety

Rauno Arike · Jul 20, 2022, 6:23 PM · 6 points (3 votes) · 2 comments · 7 min read · LW link

Why AGI Timeline Research/Discourse Might Be Overrated

Noosphere89 · Jul 20, 2022, 8:26 PM · 5 points (2 votes) · 0 comments · 1 min read · LW link (forum.effectivealtruism.org)

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · Jul 22, 2022, 6:44 PM · 176 points (69 votes) · 29 comments · 14 min read · LW link (theinsideview.ai)

Brainstorm of things that could force an AI team to burn their lead

So8res · Jul 24, 2022, 11:58 PM · 103 points (39 votes) · 4 comments · 13 min read · LW link

Alignment being impossible might be better than it being really difficult

Martín Soto · Jul 25, 2022, 11:57 PM · 12 points (9 votes) · 2 comments · 2 min read · LW link

AI ethics vs AI alignment

Wei Dai · Jul 26, 2022, 1:08 PM · 4 points (12 votes) · 1 comment · 1 min read · LW link

NeurIPS ML Safety Workshop 2022

Dan H · Jul 26, 2022, 3:28 PM · 72 points (39 votes) · 2 comments · 1 min read · LW link (neurips2022.mlsafety.org)

Quantum Advantage in Learning from Experiments

Dennis Towne · Jul 27, 2022, 3:49 PM · 5 points (3 votes) · 5 comments · 1 min read · LW link (ai.googleblog.com)

AGI ruin scenarios are likely (and disjunctive)

So8res · Jul 27, 2022, 3:21 AM · 148 points (61 votes) · 37 comments · 6 min read · LW link

A Quick Note on AI Scaling Asymptotes

alyssavance · May 25, 2022, 2:55 AM · 43 points (13 votes) · 6 comments · 1 min read · LW link

[Question] How likely do you think worse-than-extinction type fates to be?

span1 · Aug 1, 2022, 4:08 AM · 3 points (4 votes) · 3 comments · 1 min read · LW link

[Question] I want to donate some money (not much, just what I can afford) to AGI Alignment research, to whatever organization has the best chance of making sure that AGI goes well and doesn’t kill us all. What are my best options, where can I make the most difference per dollar?

lumenwrites · Aug 2, 2022, 12:08 PM · 15 points (9 votes) · 9 comments · 1 min read · LW link

Law-Following AI 4: Don’t Rely on Vicarious Liability

Cullen · Aug 2, 2022, 11:26 PM · 5 points (2 votes) · 2 comments · 3 min read · LW link

Externalized reasoning oversight: a research direction for language model alignment

tamera · Aug 3, 2022, 12:03 PM · 103 points (56 votes) · 22 comments · 6 min read · LW link

Transformer language models are doing something more general

Numendil · Aug 3, 2022, 9:13 PM · 44 points (27 votes) · 6 comments · 2 min read · LW link

Three pillars for avoiding AGI catastrophe: Technical alignment, deployment decisions, and coordination

LintzA · Aug 3, 2022, 11:15 PM · 17 points (9 votes) · 0 comments · 12 min read · LW link

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy · Aug 4, 2022, 2:12 AM · 18 points (11 votes) · 0 comments · 5 min read · LW link

Bias towards simple functions; application to alignment?

DavidHolmes · Aug 18, 2022, 4:15 PM · 3 points (3 votes) · 7 comments · 2 min read · LW link

What do ML researchers think about AI in 2022?

KatjaGrace · Aug 4, 2022, 3:40 PM · 217 points (88 votes) · 33 comments · 3 min read · LW link (aiimpacts.org)

Deontology and Tool AI

Nathan1123 · Aug 5, 2022, 5:20 AM · 4 points (3 votes) · 5 comments · 6 min read · LW link

Bridging Expected Utility Maximization and Optimization

Daniel Herrmann · Aug 5, 2022, 8:18 AM · 23 points (6 votes) · 5 comments · 14 min read · LW link

Counterfactuals are Confusing because of an Ontological Shift

Chris_Leong · Aug 5, 2022, 7:03 PM · 17 points (9 votes) · 35 comments · 2 min read · LW link

A Data limited future

Donald Hobson · Aug 6, 2022, 2:56 PM · 52 points (29 votes) · 25 comments · 2 min read · LW link

A Community for Understanding Consciousness: Raising r/MathPie

Navjotツ · Aug 7, 2022, 8:17 AM · −12 points (4 votes) · 0 comments · 3 min read · LW link (www.reddit.com)

Complexity No Bar to AI (Or, why Computational Complexity matters less than you think for real life problems)

Noosphere89 · Aug 7, 2022, 7:55 PM · 17 points (9 votes) · 14 comments · 3 min read · LW link (www.gwern.net)

A sufficiently paranoid paperclip maximizer

RomanS · Aug 8, 2022, 11:17 AM · 17 points (17 votes) · 10 comments · 2 min read · LW link

Steganography in Chain of Thought Reasoning

A Ray · Aug 8, 2022, 3:47 AM · 49 points (26 votes) · 13 comments · 6 min read · LW link

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth · Aug 8, 2022, 6:05 PM · 111 points (53 votes) · 8 comments · 3 min read · LW link

How (not) to choose a research project

Aug 9, 2022, 12:26 AM · 76 points (32 votes) · 11 comments · 7 min read · LW link

Team Shard Status Report

David Udell · Aug 9, 2022, 5:33 AM · 38 points (16 votes) · 8 comments · 3 min read · LW link

[Question] How would two superintelligent AIs interact, if they are unaligned with each other?

Nathan1123 · Aug 9, 2022, 6:58 PM · 4 points (2 votes) · 6 comments · 1 min read · LW link

The Host Minds of HBO’s Westworld.

Nerret · Aug 12, 2022, 6:53 PM · 1 point (1 vote) · 0 comments · 3 min read · LW link

Anti-squatted AI x-risk domains index

plex · Aug 12, 2022, 12:01 PM · 50 points (24 votes) · 3 comments · 1 min read · LW link

The Dumbest Possible Gets There First

Artaxerxes · Aug 13, 2022, 10:20 AM · 35 points (23 votes) · 7 comments · 2 min read · LW link

[Question] The OpenAI playground for GPT-3 is a terrible interface. Is there any great local (or web) app for exploring/learning with language models?

aviv · Aug 13, 2022, 4:34 PM · 2 points (4 votes) · 1 comment · 1 min read · LW link

I missed the crux of the alignment problem the whole time

zeshen · Aug 13, 2022, 10:11 AM · 53 points (25 votes) · 7 comments · 3 min read · LW link

An Uncanny Prison

Nathan1123 · Aug 13, 2022, 9:40 PM · 3 points (7 votes) · 3 comments · 2 min read · LW link

[Question] What is the probability that a superintelligent, sentient AGI is actually infeasible?

Nathan1123 · Aug 14, 2022, 10:41 PM · −3 points (5 votes) · 6 comments · 1 min read · LW link

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

Oct 25, 2022, 8:48 PM · 9 points (6 votes) · 1 comment · 4 min read · LW link

What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?

johnswentworth · Aug 15, 2022, 10:48 PM · 103 points (42 votes) · 15 comments · 10 min read · LW link

Discovering Agents

zac_kenton · Aug 18, 2022, 5:33 PM · 56 points (32 votes) · 8 comments · 6 min read · LW link

Interpretability Tools Are an Attack Channel

Thane Ruthenis · Aug 17, 2022, 6:47 PM · 42 points (17 votes) · 22 comments · 1 min read · LW link

Conditioning, Prompts, and Fine-Tuning

Adam Jermyn · Aug 17, 2022, 8:52 PM · 32 points (9 votes) · 9 comments · 4 min read · LW link

Debate AI and the Decision to Release an AI

Chris_Leong · Jan 17, 2019, 2:36 PM · 9 points (4 votes) · 18 comments · 3 min read · LW link

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon · Aug 20, 2022, 7:48 PM · 75 points (35 votes) · 80 comments · 1 min read · LW link

The Alignment Problem Needs More Positive Fiction

Netcentrica · Aug 21, 2022, 10:01 PM · 4 points (6 votes) · 2 comments · 5 min read · LW link

AI alignment as “navigating the space of intelligent behaviour”

Nora_Ammann · Aug 23, 2022, 1:28 PM · 18 points (9 votes) · 0 comments · 6 min read · LW link

AGI Timelines Are Mostly Not Strategically Relevant To Alignment

johnswentworth · Aug 23, 2022, 8:15 PM · 44 points (45 votes) · 35 comments · 1 min read · LW link

[Question] Would you ask a genie to give you the solution to alignment?

sudo · Aug 24, 2022, 1:29 AM · 6 points (3 votes) · 1 comment · 1 min read · LW link

Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming

Michaël Trazzi · Aug 24, 2022, 4:35 PM · 25 points (13 votes) · 0 comments · 3 min read · LW link (theinsideview.ai)

Preparing for the apocalypse might help prevent it

Ocracoke · Aug 25, 2022, 12:18 AM · 1 point (4 votes) · 1 comment · 1 min read · LW link

Your posts should be on arXiv

JanB · Aug 25, 2022, 10:35 AM · 136 points (73 votes) · 39 comments · 3 min read · LW link

The Solomonoff prior is malign. It’s not a big deal.

Charlie Steiner · Aug 25, 2022, 8:25 AM · 38 points (18 votes) · 9 comments · 7 min read · LW link

AI strategy nearcasting

HoldenKarnofsky · Aug 25, 2022, 5:26 PM · 79 points (32 votes) · 3 comments · 9 min read · LW link

Common misconceptions about OpenAI

Jacob_Hilton · Aug 25, 2022, 2:02 PM · 226 points (148 votes) · 138 comments · 5 min read · LW link

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis · Aug 26, 2022, 6:49 PM · 29 points (13 votes) · 1 comment · 6 min read · LW link

What’s the Most Im­pres­sive Thing That GPT-4 Could Plau­si­bly Do?

bayesedAug 26, 2022, 3:34 PM
23 points

12 votes

Overall karma indicates overall quality.

24 comments1 min readLW link

Taking the parameters which seem to matter and rotating them until they don’t

Garrett Baker, Aug 26, 2022, 6:26 PM
117 points (58 votes)
48 comments, 1 min read, LW link

Annual AGI Benchmarking Event

Lawrence Phillips, Aug 27, 2022, 12:06 AM
24 points (14 votes)
3 comments, 2 min read, LW link
(www.metaculus.com)

Is there a benefit in low capability AI Alignment research?

Letti, Aug 26, 2022, 11:51 PM
1 point (1 vote)
1 comment, 2 min read, LW link

Help Understanding Preferences And Evil

Netcentrica, Aug 27, 2022, 3:42 AM
6 points (5 votes)
7 comments, 2 min read, LW link

Solving Alignment by “solving” semantics

Q Home, Aug 27, 2022, 4:17 AM
15 points (8 votes)
10 comments, 26 min read, LW link

An Introduction to Current Theories of Consciousness

hohenheim, Aug 28, 2022, 5:55 PM
59 points (32 votes)
44 comments, 49 min read, LW link

*New* Canada AI Safety & Governance community

Wyatt Tessari L'Allié, Aug 29, 2022, 6:45 PM
21 points (10 votes)
0 comments, 1 min read, LW link

Are Generative World Models a Mesa-Optimization Risk?

Thane Ruthenis, Aug 29, 2022, 6:37 PM
12 points (5 votes)
2 comments, 3 min read, LW link

How might we align transformative AI if it’s developed very soon?

HoldenKarnofsky, Aug 29, 2022, 3:42 PM
107 points (37 votes)
17 comments, 45 min read, LW link

Worlds Where Iterative Design Fails

johnswentworth, Aug 30, 2022, 8:48 PM
144 points (67 votes)
26 comments, 10 min read, LW link

[Question] How might we make better use of AI capabilities research for alignment purposes?

Jemal Young, Aug 31, 2022, 4:19 AM
11 points (8 votes)
4 comments, 1 min read, LW link

ML Model Attribution Challenge [Linkpost]

aog, Aug 30, 2022, 7:34 PM
11 points (6 votes)
0 comments, 1 min read, LW link
(mlmac.io)

I Tripped and Became GPT! (And How This Updated My Timelines)

Frankophone, Sep 1, 2022, 5:56 PM
31 points (26 votes)
0 comments, 4 min read, LW link

[Question] Can someone explain to me why most researchers think alignment is probably something that is humanly tractable?

iamthouthouarti, Sep 3, 2022, 1:12 AM
32 points (13 votes)
11 comments, 1 min read, LW link

An Update on Academia vs. Industry (one year into my faculty job)

David Scott Krueger (formerly: capybaralet), Sep 3, 2022, 8:43 PM
118 points (70 votes)
18 comments, 4 min read, LW link

Framing AI Childhoods

David Udell, Sep 6, 2022, 11:40 PM
37 points (13 votes)
8 comments, 4 min read, LW link

A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardt, Sep 5, 2022, 7:55 AM
18 points (9 votes)
13 comments, 2 min read, LW link

Is training data going to be diluted by AI-generated content?

Hannes Thurnherr, Sep 7, 2022, 6:13 PM
10 points (3 votes)
7 comments, 1 min read, LW link

Turning WhatsApp Chat Data into Prompt-Response Form for Fine-Tuning

casualphysicsenjoyer, Sep 8, 2022, 8:05 PM
1 point (1 vote)
0 comments, 1 min read, LW link

[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]

David Scott Krueger (formerly: capybaralet), Sep 8, 2022, 10:28 PM
46 points (23 votes)
1 comment, 5 min read, LW link

Monitoring for deceptive alignment

evhub, Sep 8, 2022, 11:07 PM
118 points (39 votes)
7 comments, 9 min read, LW link

Samotsvety’s AI risk forecasts

elifland, Sep 9, 2022, 4:01 AM
44 points (21 votes)
0 comments, 4 min read, LW link

Ought will host a factored cognition “Lab Meeting”

Sep 9, 2022, 11:46 PM
35 points (19 votes)
1 comment, 1 min read, LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

Sep 11, 2022, 10:57 AM
43 points (26 votes)
13 comments, 30 min read, LW link

An investigation into when agents may be incentivized to manipulate our beliefs.

Felix Hofstätter, Sep 13, 2022, 5:08 PM
15 points (5 votes)
0 comments, 14 min read, LW link

Risk aversion and GPT-3

casualphysicsenjoyer, Sep 13, 2022, 8:50 PM
1 point (1 vote)
0 comments, 1 min read, LW link

[Question] Would a Misaligned SSI Really Kill Us All?

DragonGod, Sep 14, 2022, 12:15 PM
6 points (5 votes)
7 comments, 6 min read, LW link

[Question] Why Do People Think Humans Are Stupid?

DragonGod, Sep 14, 2022, 1:55 PM
21 points (20 votes)
39 comments, 3 min read, LW link

Precise P(doom) isn’t very important for prioritization or strategy

harsimony, Sep 14, 2022, 5:19 PM
18 points (9 votes)
6 comments, 1 min read, LW link

Coordinate-Free Interpretability Theory

johnswentworth, Sep 14, 2022, 11:33 PM
41 points (19 votes)
14 comments, 5 min read, LW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm, Sep 15, 2022, 8:25 AM
10 points (7 votes)
4 comments, 12 min read, LW link

[Question] Are Human Brains Universal?

DragonGod, Sep 15, 2022, 3:15 PM
16 points (11 votes)
28 comments, 5 min read, LW link

Should AI learn human values, human norms or something else?

Q Home, Sep 17, 2022, 6:19 AM
5 points (6 votes)
2 comments, 4 min read, LW link

The ELK Framing I’ve Used

sudo, Sep 19, 2022, 10:28 AM
4 points (2 votes)
1 comment, 1 min read, LW link

[Question] If we have Human-level chatbots, won’t we end up being ruled by possible people?

Erlja Jkdf., Sep 20, 2022, 1:59 PM
5 points (5 votes)
13 comments, 1 min read, LW link

Character alignment

p.b., Sep 20, 2022, 8:27 AM
22 points (9 votes)
0 comments, 2 min read, LW link

Cryptocurrency Exploits Show the Importance of Proactive Policies for AI X-Risk

eSpencer, Sep 20, 2022, 5:53 PM
1 point (3 votes)
0 comments, 4 min read, LW link

Doing oversight from the very start of training seems hard

peterbarnett, Sep 20, 2022, 5:21 PM
14 points (4 votes)
3 comments, 3 min read, LW link

Trends in Training Dataset Sizes

Pablo Villalobos, Sep 21, 2022, 3:47 PM
24 points (9 votes)
2 comments, 5 min read, LW link
(epochai.org)

Two reasons we might be closer to solving alignment than it seems

Sep 24, 2022, 8:00 PM
56 points (39 votes)
9 comments, 4 min read, LW link

Funding is All You Need: Getting into Grad School by Hacking the NSF GRFP Fellowship

hapanin, Sep 22, 2022, 9:39 PM
93 points (43 votes)
9 comments, 12 min read, LW link

[Question] Papers to start getting into NLP-focused alignment research

Feraidoon, Sep 24, 2022, 11:53 PM
6 points (2 votes)
0 comments, 1 min read, LW link

How to Study Unsafe AGI’s safely (and why we might have no choice)

Punoxysm, Mar 7, 2014, 7:24 AM
10 points (19 votes)
47 comments, 5 min read, LW link

On Generality

Eris Discordia, Sep 26, 2022, 4:06 AM
2 points (2 votes)
0 comments, 5 min read, LW link

Oren’s Field Guide of Bad AGI Outcomes

Eris Discordia, Sep 26, 2022, 4:06 AM
0 points (2 votes)
0 comments, 1 min read, LW link

Summary of ML Safety Course

zeshen, Sep 27, 2022, 1:05 PM
6 points (3 votes)
0 comments, 6 min read, LW link

My Thoughts on the ML Safety Course

zeshen, Sep 27, 2022, 1:15 PM
49 points (26 votes)
3 comments, 17 min read, LW link

Reward IS the Optimization Target

Carn, Sep 28, 2022, 5:59 PM
−1 points (9 votes)
3 comments, 5 min read, LW link

A Library and Tutorial for Factored Cognition with Language Models

Sep 28, 2022, 6:15 PM
47 points (24 votes)
0 comments, 1 min read, LW link

Will Values and Competition Decouple?

interstice, Sep 28, 2022, 4:27 PM
15 points (7 votes)
11 comments, 17 min read, LW link

Make-A-Video by Meta AI

P., Sep 29, 2022, 5:07 PM
9 points (8 votes)
4 comments, 1 min read, LW link
(makeavideo.studio)

Open application to become an AI safety project mentor

Charbel-Raphaël, Sep 29, 2022, 11:27 AM
7 points (5 votes)
0 comments, 1 min read, LW link
(docs.google.com)

It matters when the first sharp left turn happens

Adam Jermyn, Sep 29, 2022, 8:12 PM
35 points (15 votes)
9 comments, 4 min read, LW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland, Sep 30, 2022, 12:21 PM
58 points (17 votes)
0 comments, 3 min read, LW link
(docs.google.com)

[Question] Rank the following based on likelihood to nullify AI-risk

Aorou, Sep 30, 2022, 11:15 AM
3 points (3 votes)
1 comment, 4 min read, LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang, Sep 29, 2022, 10:38 PM
17 points (9 votes)
2 comments, 12 min read, LW link

[Question] What Is the Idea Behind (Un-)Supervised Learning and Reinforcement Learning?

Morpheus, Sep 30, 2022, 4:48 PM
9 points (6 votes)
6 comments, 2 min read, LW link

(Structural) Stability of Coupled Optimizers

Paul Bricman, Sep 30, 2022, 11:28 AM
25 points (8 votes)
0 comments, 10 min read, LW link

Where I currently disagree with Ryan Greenblatt’s version of the ELK approach

So8res, Sep 29, 2022, 9:18 PM
63 points (26 votes)
7 comments, 5 min read, LW link

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy, Oct 2, 2022, 1:29 AM
52 points (30 votes)
14 comments, 1 min read, LW link
(openreview.net)

[Question] Is there a culture overhang?

Aleksi Liimatainen, Oct 3, 2022, 7:26 AM
18 points (9 votes)
4 comments, 1 min read, LW link

Visualizing Learned Representations of Rice Disease

muhia_bee, Oct 3, 2022, 9:09 AM
7 points (3 votes)
0 comments, 4 min read, LW link
(indecisive-sand-24a.notion.site)

If you want to learn technical AI safety, here’s a list of AI safety courses, reading lists, and resources

KatWoods, Oct 3, 2022, 12:43 PM
12 points (6 votes)
3 comments, 1 min read, LW link

Frontline of AGI Alignment

SD Marlow, Oct 4, 2022, 3:47 AM
−10 points (5 votes)
0 comments, 1 min read, LW link
(robothouse.substack.com)

Humans aren’t fitness maximizers

So8res, Oct 4, 2022, 1:31 AM
52 points (30 votes)
45 comments, 5 min read, LW link

Smoke without fire is scary

Adam Jermyn, Oct 4, 2022, 9:08 PM
49 points (26 votes)
22 comments, 4 min read, LW link

CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]

lberglund, Oct 4, 2022, 5:04 PM
21 points (11 votes)
1 comment, 17 min read, LW link
(astralcodexten.substack.com)

Generative, Episodic Objectives for Safe AI

Michael Glass, Oct 5, 2022, 11:18 PM
11 points (6 votes)
3 comments, 8 min read, LW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Technology Policy, USA (2022)

T431, Oct 5, 2022, 4:42 PM
8 points (7 votes)
4 comments, 2 min read, LW link
(www.whitehouse.gov)

The Answer

Alex Beyman, Oct 5, 2022, 9:23 PM
−3 points (7 votes)
0 comments, 4 min read, LW link

The probability that Artificial General Intelligence will be developed by 2043 is extremely low.

cveres, Oct 6, 2022, 6:05 PM
−14 points (8 votes)
8 comments, 1 min read, LW link

The Shape of Things to Come

Alex Beyman, Oct 7, 2022, 4:11 PM
12 points (16 votes)
3 comments, 8 min read, LW link

The Slow Reveal

Alex Beyman, Oct 9, 2022, 3:16 AM
3 points (8 votes)
0 comments, 24 min read, LW link

What does it mean for an AGI to be ‘safe’?

So8res, Oct 7, 2022, 4:13 AM
72 points (33 votes)
32 comments, 3 min read, LW link

Boolean Primitives for Coupled Optimizers

Paul Bricman, Oct 7, 2022, 6:02 PM
9 points (5 votes)
0 comments, 8 min read, LW link

Analysis: US restricts GPU sales to China

aog, Oct 7, 2022, 6:38 PM
94 points (43 votes)
58 comments, 5 min read, LW link

[Question] Broken Links for the Audio Version of 2021 MIRI Conversations

Krieger, Oct 8, 2022, 4:16 PM
1 point (1 vote)
1 comment, 1 min read, LW link

Don’t leave your fingerprints on the future

So8res, Oct 8, 2022, 12:35 AM
93 points (49 votes)
32 comments, 5 min read, LW link

Let’s talk about uncontrollable AI

Karl von Wendt, Oct 9, 2022, 10:34 AM
12 points (11 votes)
6 comments, 3 min read, LW link

Lessons learned from talking to >100 academics about AI safety

Marius Hobbhahn, Oct 10, 2022, 1:16 PM
207 points (98 votes)
16 comments, 12 min read, LW link

When reporting AI timelines, be clear who you’re (not) deferring to

Sam Clarke, Oct 10, 2022, 2:24 PM
37 points (16 votes)
3 comments, 1 min read, LW link

Natural Categories Update

Logan Zoellner, Oct 10, 2022, 3:19 PM
29 points (13 votes)
6 comments, 2 min read, LW link

Updates and Clarifications

SD Marlow, Oct 11, 2022, 5:34 AM
−5 points (2 votes)
1 comment, 1 min read, LW link

My argument against AGI

cveres, Oct 12, 2022, 6:33 AM
3 points (7 votes)
5 comments, 1 min read, LW link

Instrumental convergence in single-agent systems

Oct 12, 2022, 12:24 PM
27 points (16 votes)
4 comments, 8 min read, LW link
(www.gladstone.ai)

A strange twist on the road to AGI

cveres, Oct 12, 2022, 11:27 PM
−8 points (7 votes)
0 comments, 1 min read, LW link

Perfect Enemy

Alex Beyman, Oct 13, 2022, 8:23 AM
−2 points (10 votes)
0 comments, 46 min read, LW link

A stubborn unbeliever finally gets the depth of the AI alignment problem

aelwood, Oct 13, 2022, 3:16 PM
17 points (12 votes)
8 comments, 3 min read, LW link
(pursuingreality.substack.com)

Misalignment-by-default in multi-agent systems

Oct 13, 2022, 3:38 PM
17 points (8 votes)
8 comments, 20 min read, LW link
(www.gladstone.ai)

Niceness is unnatural

So8res, Oct 13, 2022, 1:30 AM
98 points (44 votes)
18 comments, 8 min read, LW link

The Vitalik Buterin Fellowship in AI Existential Safety is open for applications!

Xin Chen, Cynthia, Oct 13, 2022, 6:32 PM
21 points (8 votes)
0 comments, 1 min read, LW link

Greed Is the Root of This Evil

Thane Ruthenis, Oct 13, 2022, 8:40 PM
21 points (8 votes)
4 comments, 8 min read, LW link

Contra shard theory, in the context of the diamond maximizer problem

So8res, Oct 13, 2022, 11:51 PM
84 points (43 votes)
16 comments, 2 min read, LW link

Anthropomorphic AI and Sandboxed Virtual Universes

jacob_cannell, Sep 3, 2010, 7:02 PM
4 points (45 votes)
124 comments, 5 min read, LW link

Instrumental convergence: scale and physical interactions

Oct 14, 2022, 3:50 PM
15 points (6 votes)
0 comments, 17 min read, LW link
(www.gladstone.ai)

Provably Honest—A First Step

Srijanak De, Nov 5, 2022, 7:18 PM
10 points (10 votes)
2 comments, 8 min read, LW link

They gave LLMs access to physics simulators

ryan_b, Oct 17, 2022, 9:21 PM
50 points (28 votes)
18 comments, 1 min read, LW link
(arxiv.org)

Decision theory does not imply that we get to have nice things

So8res, Oct 18, 2022, 3:04 AM
142 points (75 votes)
53 comments, 26 min read, LW link

[Question] How easy is it to supervise processes vs outcomes?

Noosphere89, Oct 18, 2022, 5:48 PM
3 points (4 votes)
0 comments, 1 min read, LW link

How To Make Prediction Markets Useful For Alignment Work

johnswentworth, Oct 18, 2022, 7:01 PM
86 points (40 votes)
18 comments, 2 min read, LW link

The reward function is already how well you manipulate humans

Kerry, Oct 19, 2022, 1:52 AM
20 points (9 votes)
9 comments, 2 min read, LW link

Cooperators are more powerful than agents

Ivan Vendrov, Oct 21, 2022, 8:02 PM
14 points (9 votes)
7 comments, 3 min read, LW link

Logical Decision Theories: Our final failsafe?

Noosphere89, Oct 25, 2022, 12:51 PM
−6 points (6 votes)
8 comments, 1 min read, LW link
(www.lesswrong.com)

[Question] Simple question about corrigibility and values in AI.

jmh, Oct 22, 2022, 2:59 AM
6 points (1 vote)
1 comment, 1 min read, LW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran, Oct 22, 2022, 4:17 PM
14 points (12 votes)
0 comments, 1 min read, LW link

“Originality is nothing but judicious imitation”—Voltaire

Vestozia, Oct 23, 2022, 7:00 PM
0 points (2 votes)
0 comments, 13 min read, LW link

AI researchers announce NeuroAI agenda

Cameron Berg, Oct 24, 2022, 12:14 AM
37 points (22 votes)
12 comments, 6 min read, LW link
(arxiv.org)

AGI in our lifetimes is wishful thinking

niknoble, Oct 24, 2022, 11:53 AM
−4 points (31 votes)
21 comments, 8 min read, LW link

question-answer counterfactual intervals

Tamsin Leake, Oct 24, 2022, 1:08 PM
8 points (3 votes)
0 comments, 4 min read, LW link
(carado.moe)

Why some people believe in AGI, but I don’t.

cveres, Oct 26, 2022, 3:09 AM
−15 points (12 votes)
6 comments, 1 min read, LW link

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89, Oct 27, 2022, 2:41 PM
12 points (11 votes)
18 comments, 1 min read, LW link

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL, Oct 28, 2022, 1:53 AM
−22 points (8 votes)
4 comments, 9 min read, LW link

Causal scrubbing: Appendix

Dec 3, 2022, 12:58 AM
16 points (4 votes)
0 comments, 20 min read, LW link

Beyond Kolmogorov and Shannon

Oct 25, 2022, 3:13 PM
60 points (37 votes)
14 comments, 5 min read, LW link

Method of statements: an alternative to taboo

Q Home, Nov 16, 2022, 10:57 AM
7 points (4 votes)
0 comments, 41 min read, LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

Dec 3, 2022, 12:58 AM
130 points (46 votes)
9 comments, 20 min read, LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

Oct 28, 2022, 11:55 PM
86 points (30 votes)
5 comments, 9 min read, LW link
(arxiv.org)

Causal scrubbing: results on a paren balance checker

Dec 3, 2022, 12:59 AM
26 points (8 votes)
0 comments, 30 min read, LW link

AI as a Civilizational Risk Part 1/6: Historical Priors

PashaKamyshev, Oct 29, 2022, 9:59 PM
2 points (12 votes)
2 comments, 7 min read, LW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification

PashaKamyshev, Oct 30, 2022, 4:57 PM
9 points (5 votes)
0 comments, 10 min read, LW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution

PashaKamyshev, Oct 31, 2022, 5:03 PM
7 points (11 votes)
4 comments, 14 min read, LW link

AI as a Civilizational Risk Part 4/6: Bioweapons and Philosophy of Modification

PashaKamyshev, Nov 1, 2022, 8:50 PM
7 points (5 votes)
1 comment, 8 min read, LW link

AI as a Civilizational Risk Part 5/6: Relationship between C-risk and X-risk

PashaKamyshev, Nov 3, 2022, 2:19 AM
2 points (7 votes)
0 comments, 7 min read, LW link

AI as a Civilizational Risk Part 6/6: What can be done

PashaKamyshev, Nov 3, 2022, 7:48 PM
2 points (3 votes)
3 comments, 4 min read, LW link

Am I secretly excited for AI getting weird?

porby, Oct 29, 2022, 10:16 PM
98 points (65 votes)
4 comments, 4 min read, LW link

“Normal” is the equilibrium state of past optimization processes

Alex_Altair, Oct 30, 2022, 7:03 PM
77 points (33 votes)
5 comments, 5 min read, LW link

love, not competition

Tamsin Leake, Oct 30, 2022, 7:44 PM
31 points (22 votes)
20 comments, 1 min read, LW link
(carado.moe)

My (naive) take on Risks from Learned Optimization

artkpv, Oct 31, 2022, 10:59 AM
7 points (4 votes)
0 comments, 5 min read, LW link

Embedding safety in ML development

zeshen, Oct 31, 2022, 12:27 PM
24 points (12 votes)
1 comment, 18 min read, LW link

Auditing games for high-level interpretability

Paul Colognese, Nov 1, 2022, 10:44 AM
28 points (10 votes)
1 comment, 7 min read, LW link

publishing alignment research and infohazards

Tamsin Leake, Oct 31, 2022, 6:02 PM
69 points (29 votes)
10 comments, 1 min read, LW link
(carado.moe)

Caution when interpreting Deepmind’s In-context RL paper

Sam Marks, Nov 1, 2022, 2:42 AM
104 points (44 votes)
6 comments, 4 min read, LW link

AGI and the future: Is a future with AGI and humans alive evidence that AGI is not a threat to our existence?

LetUsTalk, Nov 1, 2022, 7:37 AM
4 points (8 votes)
8 comments, 1 min read, LW link

Threat Model Literature Review

Nov 1, 2022, 11:03 AM
55 points (26 votes)
4 comments, 25 min read, LW link

Clarifying AI X-risk

Nov 1, 2022, 11:03 AM
102 points (49 votes)
23 comments, 4 min read, LW link

a casual intro to AI doom and alignment

Tamsin Leake, Nov 1, 2022, 4:38 PM
12 points (14 votes)
0 comments, 4 min read, LW link
(carado.moe)

[Question] Which Issues in Conceptual Alignment have been Formalised or Observed (or not)?

ojorgensen, Nov 1, 2022, 10:32 PM
4 points (2 votes)
0 comments, 1 min read, LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown, Nov 16, 2022, 3:33 PM
12 points (6 votes)
2 comments, 12 min read, LW link
(sambrown.eu)

Why do we post our AI safety plans on the Internet?

Peter S. Park, Nov 3, 2022, 4:02 PM
3 points (13 votes)
4 comments, 11 min read, LW link

Mechanistic Interpretability as Reverse Engineering (follow-up to “cars and elephants”)

David Scott Krueger (formerly: capybaralet), Nov 3, 2022, 11:19 PM
28 points (13 votes)
3 comments, 1 min read, LW link

[Question] Are alignment researchers devoting enough time to improving their research capacity?

Carson Jones, Nov 4, 2022, 12:58 AM
13 points (8 votes)
3 comments, 3 min read, LW link

[Question] Don’t you think RLHF solves outer alignment?

Charbel-Raphaël, Nov 4, 2022, 12:36 AM
2 points (3 votes)
19 comments, 1 min read, LW link

A newcomer’s guide to the technical AI safety field

zeshen, Nov 4, 2022, 2:29 PM
30 points (14 votes)
1 comment, 10 min read, LW link

Toy Models and Tegum Products

Adam Jermyn, Nov 4, 2022, 6:51 PM
27 points (10 votes)
7 comments, 5 min read, LW link

For ELK truth is mostly a distraction

c.trout, Nov 4, 2022, 9:14 PM
32 points (13 votes)
0 comments, 21 min read, LW link

Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link]

the gears to ascension, Nov 5, 2022, 1:06 AM
12 points (5 votes)
2 comments, 1 min read, LW link
(www.semanticscholar.org)

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston, Nov 5, 2022, 1:19 PM
8 points (7 votes)
4 comments, 16 min read, LW link

The Slippery Slope from DALLE-2 to Deepfake Anarchy

scasper, Nov 5, 2022, 2:53 PM
16 points (21 votes)
9 comments, 11 min read, LW link

[Question] Can we get around Godel’s Incompleteness theorems and Turing undecidable problems via infinite computers?

Noosphere89, Nov 5, 2022, 6:01 PM
−10 points (7 votes)
12 comments, 1 min read, LW link

Recommend HAIST resources for assessing the value of RLHF-related alignment research

Nov 5, 2022, 8:58 PM
26 points (11 votes)
9 comments, 3 min read, LW link

[Question] Has anyone increased their AGI timelines?

Darren McKee, Nov 6, 2022, 12:03 AM
38 points (21 votes)
13 comments, 1 min read, LW link

Applying superintelligence without collusion

Eric Drexler, Nov 8, 2022, 6:08 PM
88 points (40 votes)
57 comments, 4 min read, LW link

A philosopher’s critique of RLHF

TW123, Nov 7, 2022, 2:42 AM
55 points (26 votes)
8 comments, 2 min read, LW link

4 Key Assumptions in AI Safety

Prometheus, Nov 7, 2022, 10:50 AM
20 points (11 votes)
5 comments, 7 min read, LW link

Hacker-AI – Does it already exist?

Erland Wittkotter, Nov 7, 2022, 2:01 PM
3 points (8 votes)
11 comments, 11 min read, LW link

Loss of control of AI is not a likely source of AI x-risk

squek, Nov 7, 2022, 6:44 PM
−6 points (4 votes)
0 comments, 5 min read, LW link

Mysteries of mode collapse

janus, Nov 8, 2022, 10:37 AM
213 points (98 votes)
35 comments, 14 min read, LW link

Some advice on independent research

Marius Hobbhahn, Nov 8, 2022, 2:46 PM
41 points (18 votes)
4 comments, 10 min read, LW link

A first success story for Outer Alignment: InstructGPT

Noosphere89, Nov 8, 2022, 10:52 PM
6 points (10 votes)
1 comment, 1 min read, LW link
(openai.com)

A caveat to the Orthogonality Thesis

Wuschel Schulz, Nov 9, 2022, 3:06 PM
36 points (16 votes)
10 comments, 2 min read, LW link

Trying to Make a Treacherous Mesa-Optimizer

MadHatter, Nov 9, 2022, 6:07 PM
87 points (39 votes)
13 comments, 4 min read, LW link
(attentionspan.blog)

Is full self-driving an AGI-complete problem?

kraemahz, Nov 10, 2022, 2:04 AM
5 points (3 votes)
5 comments, 1 min read, LW link

The harnessing of complexity

geduardo, Nov 10, 2022, 6:44 PM
6 points (7 votes)
2 comments, 3 min read, LW link

[Question] Is there a demo of “You can’t fetch the coffee if you’re dead”?

Ram Rachum, Nov 10, 2022, 6:41 PM
8 points (6 votes)
9 comments, 1 min read, LW link

LessWrong Poll on AGI

Niclas Kupper, Nov 10, 2022, 1:13 PM
12 points (8 votes)
6 comments, 1 min read, LW link

Value Formation: An Overarching Model

Thane Ruthenis, Nov 15, 2022, 5:16 PM
27 points (7 votes)
9 comments, 34 min read, LW link

[simulation] 4chan user claiming to be the attorney hired by Google’s sentient chatbot LaMDA shares wild details of encounter

janus, Nov 10, 2022, 9:39 PM
11 points (12 votes)
1 comment, 13 min read, LW link
(generative.ink)

Why I’m Working On Model Agnostic Interpretability

Jessica Rumbelow, Nov 11, 2022, 9:24 AM
28 points (18 votes)
9 comments, 2 min read, LW link

Are funding options for AI Safety threatened? W45

Nov 11, 2022, 1:00 PM
7 points (4 votes)
0 comments, 3 min read, LW link
(newsletter.apartresearch.com)

How likely are malign priors over objectives? [aborted WIP]

David Johnston, Nov 11, 2022, 5:36 AM
−2 points (2 votes)
0 comments, 8 min read, LW link

Is AI Gain-of-Function research a thing?

MadHatter, Nov 12, 2022, 2:33 AM
8 points (4 votes)
2 comments, 2 min read, LW link

Vanessa Kosoy’s PreDCA, distilled

Martín Soto, Nov 12, 2022, 11:38 AM
16 points (12 votes)
17 comments, 5 min read, LW link

fully aligned singleton as a solution to everything

Tamsin Leake, Nov 12, 2022, 6:19 PM
6 points (12 votes)
2 comments, 2 min read, LW link
(carado.moe)

Ways to buy time

Nov 12, 2022, 7:31 PM
26 points (17 votes)
21 comments, 12 min read, LW link

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

Ulisse Mini, Nov 13, 2022, 9:46 AM
12 points

7 votes

2 comments · 1 min read · LW link
(arxiv.org)

I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing?

Tapatakt, Nov 14, 2022, 4:12 PM
3 points

5 votes

5 comments · 1 min read · LW link

Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos, Nov 14, 2022, 4:42 PM
74 points

39 votes

12 comments · 2 min read · LW link
(epochai.org)

The limited upside of interpretability

Peter S. Park, Nov 15, 2022, 6:46 PM
13 points

16 votes

11 comments · 1 min read · LW link

[Question] Is the speed of training large models going to increase significantly in the near future due to Cerebras Andromeda?

Amal, Nov 15, 2022, 10:50 PM
11 points

6 votes

11 comments · 1 min read · LW link

Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight

Jacy Reese Anthis, Nov 16, 2022, 1:54 PM
29 points

21 votes

9 comments · 2 min read · LW link

The two conceptions of Active Inference: an intelligence architecture and a theory of agency

Roman Leventov, Nov 16, 2022, 9:30 AM
7 points

4 votes

0 comments · 4 min read · LW link

Engineering Monosemanticity in Toy Models

Nov 18, 2022, 1:43 AM
72 points

28 votes

6 comments · 3 min read · LW link
(arxiv.org)

[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt?

nahoj, Nov 18, 2022, 7:01 PM
15 points

7 votes

9 comments · 1 min read · LW link

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)

Jessica Rumbelow, Nov 17, 2022, 11:06 AM
26 points

12 votes

2 comments · 2 min read · LW link

Massive Scaling Should be Frowned Upon

harsimony, Nov 17, 2022, 8:43 AM
7 points

9 votes

6 comments · 5 min read · LW link

How AI Fails Us: A non-technical view of the Alignment Problem

testingthewaters, Nov 18, 2022, 7:02 PM
7 points

3 votes

0 comments · 2 min read · LW link
(ethics.harvard.edu)

LLMs may capture key components of human agency

catubc, Nov 17, 2022, 8:14 PM
21 points

10 votes

0 comments · 4 min read · LW link

AGIs may value intrinsic rewards more than extrinsic ones

catubc, Nov 17, 2022, 9:49 PM
8 points

4 votes

6 comments · 4 min read · LW link

The economy as an analogy for advanced AI systems

Nov 15, 2022, 11:16 AM
26 points

8 votes

0 comments · 5 min read · LW link

Cognitive science and failed AI forecasts

Eleni Angelou, Nov 24, 2022, 9:02 PM
0 points

5 votes

0 comments · 2 min read · LW link

A Short Dialogue on the Meaning of Reward Functions

Nov 19, 2022, 9:04 PM
40 points

24 votes

0 comments · 3 min read · LW link

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig, Nov 18, 2022, 12:46 PM
15 points

8 votes

2 comments · 1 min read · LW link

Distillation of “How Likely Is Deceptive Alignment?”

NickGabs, Nov 18, 2022, 4:31 PM
20 points

10 votes

3 comments · 10 min read · LW link

The Disastrously Confident And Inaccurate AI

Sharat Jacob Jacob, Nov 18, 2022, 7:06 PM
13 points

9 votes

0 comments · 13 min read · LW link

generalized wireheading

Tamsin Leake, Nov 18, 2022, 8:18 PM
21 points

12 votes

7 comments · 2 min read · LW link
(carado.moe)

By Default, GPTs Think In Plain Sight

Fabien Roger, Nov 19, 2022, 7:15 PM
60 points

40 votes

16 comments · 9 min read · LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner, Nov 20, 2022, 1:22 AM
88 points

39 votes

2 comments · 2 min read · LW link
(arxiv.org)

Planes are still decades away from displacing most bird jobs

guzey, Nov 25, 2022, 4:49 PM
156 points

93 votes

13 comments · 3 min read · LW link

Scott Aaronson on “Reform AI Alignment”

Shmi, Nov 20, 2022, 10:20 PM
39 points

32 votes

17 comments · 1 min read · LW link
(scottaaronson.blog)

How Should AIS Relate To Its Funders? W46

Nov 21, 2022, 3:58 PM
6 points

2 votes

1 comment · 3 min read · LW link
(newsletter.apartresearch.com)

Benefits/Risks of Scott Aaronson’s Orthodox/Reform Framing for AI Alignment

Jeremyy, Nov 21, 2022, 5:54 PM
2 points

1 vote

1 comment · 1 min read · LW link

[Hebbian Natural Abstractions] Introduction

Nov 21, 2022, 8:34 PM
34 points

13 votes

3 comments · 4 min read · LW link
(www.snellessen.com)

Miscellaneous First-Pass Alignment Thoughts

NickGabs, Nov 21, 2022, 9:23 PM
12 points

5 votes

4 comments · 10 min read · LW link

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese Anthis, Nov 22, 2022, 4:50 PM
95 points

49 votes

64 comments · 1 min read · LW link
(www.science.org)

Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility

Nov 22, 2022, 10:19 PM
69 points

33 votes

20 comments · 4 min read · LW link

Brute-forcing the universe: a non-standard shot at diamond alignment

Martín Soto, Nov 22, 2022, 10:36 PM
6 points

3 votes

0 comments · 20 min read · LW link

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

porby, Nov 23, 2022, 4:22 AM
36 points

7 votes

2 comments · 35 min read · LW link

Sets of objectives for a multi-objective RL agent to optimize

Nov 23, 2022, 6:49 AM
11 points

4 votes

0 comments · 8 min read · LW link

Human-level Diplomacy was my fire alarm

Lao Mein, Nov 23, 2022, 10:05 AM
51 points

36 votes

15 comments · 3 min read · LW link

Ex nihilo

Hopkins Stanley, Nov 23, 2022, 2:38 PM
1 point

1 vote

0 comments · 1 min read · LW link

Corrigibility Via Thought-Process Deference

Thane Ruthenis, Nov 24, 2022, 5:06 PM
13 points

7 votes

5 comments · 9 min read · LW link

Conjecture: a retrospective after 8 months of work

Nov 23, 2022, 5:10 PM
183 points

101 votes

9 comments · 8 min read · LW link

Conjecture Second Hiring Round

Nov 23, 2022, 5:11 PM
85 points

38 votes

0 comments · 1 min read · LW link

Injecting some numbers into the AGI debate—by Boaz Barak

Jsevillamol, Nov 23, 2022, 4:10 PM
12 points

3 votes

0 comments · 3 min read · LW link
(windowsontheory.org)

Human-level Full-Press Diplomacy (some bare facts).

Cleo Nardo, Nov 22, 2022, 8:59 PM
50 points

25 votes

7 comments · 3 min read · LW link

When AI solves a game, focus on the game’s mechanics, not its theme.

Cleo Nardo, Nov 23, 2022, 7:16 PM
81 points

47 votes

7 comments · 2 min read · LW link

[Question] What is the best source to explain short AI timelines to a skeptical person?

trevor, Nov 23, 2022, 5:19 AM
4 points

2 votes

4 comments · 1 min read · LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

Dec 5, 2022, 8:28 PM
37 points

16 votes

16 comments · 10 min read · LW link

The man and the tool

pedroalvarado, Nov 25, 2022, 7:51 PM
1 point

3 votes

0 comments · 4 min read · LW link

Gliders in Language Models

Alexandre Variengien, Nov 25, 2022, 12:38 AM
27 points

18 votes

11 comments · 10 min read · LW link

The AI Safety community has four main work groups, Strategy, Governance, Technical and Movement Building

peterslattery, Nov 25, 2022, 3:45 AM
0 points

7 votes

0 comments · 6 min read · LW link

Using mechanistic interpretability to find in-distribution failure in toy transformers

Charlie George, Nov 28, 2022, 7:39 PM
6 points

3 votes

0 comments · 4 min read · LW link

Intuitions by ML researchers may get progressively worse concerning likely candidates for transformative AI

Viktor Rehnberg, Nov 25, 2022, 3:49 PM
7 points

2 votes

0 comments · 2 min read · LW link

Guardian AI (Misaligned systems are all around us.)

Jessica Rumbelow, Nov 25, 2022, 3:55 PM
15 points

8 votes

6 comments · 2 min read · LW link

Three Alignment Schemas & Their Problems

Shoshannah Tekofsky, Nov 26, 2022, 4:25 AM
16 points

10 votes

1 comment · 6 min read · LW link

Reward Is Not Necessary: How To Create A Compositional Self-Preserving Agent For Life-Long Learning

Capybasilisk, Nov 27, 2022, 2:05 PM
3 points

3 votes

0 comments · 1 min read · LW link
(arxiv.org)

Review: LOVE in a simbox

PeterMcCluskey, Nov 27, 2022, 5:41 PM
32 points

11 votes

4 comments · 9 min read · LW link
(bayesianinvestor.com)

Superintelligent AI is necessary for an amazing future, but far from sufficient

So8res, Oct 31, 2022, 9:16 PM
115 points

55 votes

46 comments · 34 min read · LW link

[Question] How to correct for multiplicity with AI-generated models?

Lao Mein, Nov 28, 2022, 3:51 AM
4 points

1 vote

0 comments · 1 min read · LW link

Is Constructor Theory a useful tool for AI alignment?

A.H., Nov 29, 2022, 12:35 PM
11 points

6 votes

8 comments · 26 min read · LW link

Multi-Component Learning and S-Curves

Nov 30, 2022, 1:37 AM
57 points

21 votes

24 comments · 7 min read · LW link

Subsets and quotients in interpretability

Erik Jenner, Dec 2, 2022, 11:13 PM
24 points

12 votes

1 comment · 7 min read · LW link

Neglected cause: automated fraud detection in academia through image analysis

Lao Mein, Nov 30, 2022, 5:52 AM
10 points

4 votes

1 comment · 2 min read · LW link

AGI Impossible due to Energy Constraints

TheKlaus, Nov 30, 2022, 6:48 PM
−8 points

12 votes

13 comments · 1 min read · LW link

Master plan spec: needs audit (logic and cooperative AI)

Quinn, Nov 30, 2022, 6:10 AM
12 points

5 votes

5 comments · 7 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo, Nov 30, 2022, 7:16 AM
51 points

22 votes

3 comments · 1 min read · LW link

Has AI gone too far?

Boston Anderson, Nov 30, 2022, 6:49 PM
−15 points

7 votes

3 comments · 1 min read · LW link

Seeking submissions for short AI-safety course proposals

Sergio, Dec 1, 2022, 12:32 AM
3 points

2 votes

0 comments · 1 min read · LW link

Did ChatGPT just gaslight me?

TW123, Dec 1, 2022, 5:41 AM
123 points

81 votes

45 comments · 9 min read · LW link
(equonc.substack.com)

Safe Development of Hacker-AI Countermeasures – What if we are too late?

Erland Wittkotter, Dec 1, 2022, 7:59 AM
3 points

3 votes

0 comments · 14 min read · LW link

Research request (alignment strategy): Deep dive on “making AI solve alignment for us”

JanB, Dec 1, 2022, 2:55 PM
16 points

7 votes

3 comments · 1 min read · LW link

[LINK] - ChatGPT discussion

JanB, Dec 1, 2022, 3:04 PM
13 points

6 votes

7 comments · 1 min read · LW link
(openai.com)

ChatGPT: First Impressions

specbug, Dec 1, 2022, 4:36 PM
18 points

13 votes

2 comments · 13 min read · LW link
(sixeleven.in)

Re-Examining LayerNorm

Eric Winsor, Dec 1, 2022, 10:20 PM
100 points

44 votes

8 comments · 5 min read · LW link

Update on Harvard AI Safety Team and MIT AI Alignment

Dec 2, 2022, 12:56 AM
56 points

24 votes

4 comments · 8 min read · LW link

Deconfusing Direct vs Amortised Optimization

beren, Dec 2, 2022, 11:30 AM
48 points

22 votes

6 comments · 10 min read · LW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien, Dec 2, 2022, 4:33 PM
31 points

19 votes

8 comments · 5 min read · LW link

Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis

NickGabs, Dec 2, 2022, 7:14 PM
14 points

12 votes

2 comments · 4 min read · LW link

Non-Technical Preparation for Hacker-AI and Cyberwar 2.0+

Erland Wittkotter, Dec 19, 2022, 11:42 AM
2 points

4 votes

0 comments · 25 min read · LW link

Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan]

hannah wing-yee, Dec 2, 2022, 8:45 PM
3 points

2 votes

0 comments · 2 min read · LW link

Research Principles for 6 Months of AI Alignment Studies

Shoshannah Tekofsky, Dec 2, 2022, 10:55 PM
22 points

7 votes

3 comments · 6 min read · LW link

Chat GPT’s views on Metaphysics and Ethics

Cole Killian, Dec 3, 2022, 6:12 PM
5 points

6 votes

3 comments · 1 min read · LW link
(twitter.com)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj, Dec 3, 2022, 8:32 PM
1 point

2 votes

8 comments · 1 min read · LW link

Could an AI be Religious?

mk54, Dec 4, 2022, 5:00 AM
−12 points

7 votes

14 comments · 1 min read · LW link

ChatGPT seems overconfident to me

qbolec, Dec 4, 2022, 8:03 AM
19 points

6 votes

3 comments · 16 min read · LW link

AI can exploit safety plans posted on the Internet

Peter S. Park, Dec 4, 2022, 12:17 PM
−19 points

11 votes

4 comments · 1 min read · LW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan, Dec 4, 2022, 6:48 PM
12 points

6 votes

2 comments · 1 min read · LW link

Take 3: No indescribable heavenworlds.

Charlie Steiner, Dec 4, 2022, 2:48 AM
21 points

7 votes

12 comments · 2 min read · LW link

ChatGPT is settling the Chinese Room argument

averros, Dec 4, 2022, 8:25 PM
−7 points

10 votes

4 comments · 1 min read · LW link

AGI as a Black Swan Event

Stephen McAleese, Dec 4, 2022, 11:00 PM
8 points

9 votes

8 comments · 7 min read · LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd, Dec 5, 2022, 2:26 AM
73 points

46 votes

15 comments · 2 min read · LW link

A ChatGPT story about ChatGPT doom

Matt He, Dec 5, 2022, 5:40 AM
6 points

7 votes

3 comments · 4 min read · LW link

Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence

Ronny Fernandez, Dec 5, 2022, 3:19 PM
19 points

9 votes

5 comments · 7 min read · LW link

Is the “Valley of Confused Abstractions” real?

jacquesthibs, Dec 5, 2022, 1:36 PM
15 points

10 votes

9 comments · 2 min read · LW link

Analysis of AI Safety surveys for field-building insights

Ash Jafari, Dec 5, 2022, 7:21 PM
10 points

4 votes

2 comments · 4 min read · LW link

Testing Ways to Bypass ChatGPT’s Safety Features

Robert_AIZI, Dec 5, 2022, 6:50 PM
6 points

6 votes

2 comments · 5 min read · LW link
(aizi.substack.com)

ChatGPT on Spielberg’s A.I. and AI Alignment

Bill Benzon, Dec 5, 2022, 9:10 PM
5 points

5 votes

0 comments · 4 min read · LW link

Shh, don’t tell the AI it’s likely to be evil

naterush, Dec 6, 2022, 3:35 AM
19 points

9 votes

9 comments · 1 min read · LW link

Neural networks biased towards geometrically simple functions?

DavidHolmes, Dec 8, 2022, 4:16 PM
16 points

7 votes

2 comments · 3 min read · LW link

Things roll downhill

awenonian, Dec 6, 2022, 3:27 PM
19 points

13 votes

0 comments · 1 min read · LW link

ChatGPT and the Human Race

Ben Reilly, Dec 6, 2022, 9:38 PM
6 points

8 votes

1 comment · 3 min read · LW link

AI Safety in a Vulnerable World: Requesting Feedback on Preliminary Thoughts

Jordan Arel, Dec 6, 2022, 10:35 PM
3 points

2 votes

2 comments · 3 min read · LW link

In defense of probably wrong mechanistic models

evhub, Dec 6, 2022, 11:24 PM
41 points

24 votes

10 comments · 2 min read · LW link

ChatGPT: “An error occurred. If this issue persists...”

Bill Benzon, Dec 7, 2022, 3:41 PM
5 points

4 votes

11 comments · 3 min read · LW link

Where to be an AI Safety Professor

scasper, Dec 7, 2022, 7:09 AM
30 points

18 votes

12 comments · 2 min read · LW link

Thoughts on AGI organizations and capabilities work

Dec 7, 2022, 7:46 PM
94 points

37 votes

17 comments · 5 min read · LW link

Riffing on the agent type

Quinn, Dec 8, 2022, 12:19 AM
16 points

6 votes

0 comments · 4 min read · LW link

Of pumpkins, the Falcon Heavy, and Groucho Marx: High-Level discourse structure in ChatGPT

Bill Benzon, Dec 8, 2022, 10:25 PM
2 points

1 vote

0 comments · 8 min read · LW link

Why I’m Sceptical of Foom

DragonGod, Dec 8, 2022, 10:01 AM
19 points

11 votes

26 comments · 3 min read · LW link

If Wentworth is right about natural abstractions, it would be bad for alignment

Wuschel Schulz, Dec 8, 2022, 3:19 PM
27 points

16 votes

5 comments · 4 min read · LW link

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner, Dec 8, 2022, 8:14 AM
47 points

24 votes

22 comments · 2 min read · LW link

Notes on OpenAI’s alignment plan

Alex Flint, Dec 8, 2022, 7:13 PM
47 points

25 votes

5 comments · 7 min read · LW link

We need to make scary AIs

Igor Ivanov, Dec 9, 2022, 10:04 AM
3 points

11 votes

8 comments · 5 min read · LW link

I Believe we are in a Hardware Overhang

nem, Dec 8, 2022, 11:18 PM
8 points

4 votes

0 comments · 1 min read · LW link

[Question] What are your thoughts on the future of AI-assisted software development?

RomanHauksson, Dec 9, 2022, 10:04 AM
4 points

2 votes

2 comments · 1 min read · LW link

ChatGPT’s Misalignment Isn’t What You Think

stavros, Dec 9, 2022, 11:11 AM
3 points

9 votes

12 comments · 1 min read · LW link

Simulators and Mindcrime

DragonGod, Dec 9, 2022, 3:20 PM
0 points

8 votes

4 comments · 3 min read · LW link

Working towards AI alignment is better

Johannes C. Mayer, Dec 9, 2022, 3:39 PM
7 points

8 votes

2 comments · 2 min read · LW link

[Question] How would you improve ChatGPT’s filtering?

Noah Scales, Dec 10, 2022, 8:05 AM
9 points

3 votes

6 comments · 1 min read · LW link

Inspiration as a Scarce Resource

zenbu zenbu zenbu zenbu, Dec 10, 2022, 3:23 PM
7 points

4 votes

0 comments · 4 min read · LW link
(inflorescence.substack.com)

Poll Results on AGI

Niclas Kupper, Dec 10, 2022, 9:25 PM
10 points

6 votes

0 comments · 2 min read · LW link

The Opportunity and Risks of Learning Human Values In-Context

Past Account, Dec 10, 2022, 9:40 PM
1 point

4 votes

4 comments · 5 min read · LW link

High level discourse structure in ChatGPT: Part 2 [Quasi-symbolic?]

Bill Benzon, Dec 10, 2022, 10:26 PM
7 points

3 votes

0 comments · 6 min read · LW link

ChatGPT goes through a wormhole hole in our Shandyesque universe [virtual wacky weed]

Bill Benzon, Dec 11, 2022, 11:59 AM
−1 points

5 votes

2 comments · 3 min read · LW link

Questions about AI that bother me

Eleni Angelou, Dec 11, 2022, 6:14 PM
11 points

8 votes

2 comments · 2 min read · LW link

Reflections on the PIBBSS Fellowship 2022

Dec 11, 2022, 9:53 PM
31 points

16 votes

0 comments · 18 min read · LW link

Benchmarks for Comparing Human and AI Intelligence

MrThink, Dec 11, 2022, 10:06 PM
8 points

4 votes

4 comments · 2 min read · LW link

a rough sketch of formal aligned AI using QACI

Tamsin Leake, Dec 11, 2022, 11:40 PM
14 points

4 votes

0 comments · 4 min read · LW link
(carado.moe)

Trivial GPT-3.5 limitation workaround

Dave92F1, Dec 12, 2022, 8:42 AM
5 points

9 votes

4 comments · 1 min read · LW link

[Question] Thought experiment. If human minds could be harnessed into one universal consciousness of humanity, would we discover things that have been quite difficult to reach with the means of modern science? And would the consciousness of humanity be more comprehensive than the future power of artificial intelligence?

lotta liedes, Dec 12, 2022, 2:43 PM
−1 points

2 votes

0 comments · 1 min read · LW link

Meaningful things are those the universe possesses a semantics for

Abhimanyu Pallavi Sudhir, Dec 12, 2022, 4:03 PM
7 points

6 votes

14 comments · 14 min read · LW link

Let’s go meta: Grammatical knowledge and self-referential sentences [ChatGPT]

Bill Benzon, Dec 12, 2022, 9:50 PM
5 points

3 votes

0 comments · 9 min read · LW link

[Question] Are lawsuits against AGI companies extending AGI timelines?

SlowingAGI, Dec 13, 2022, 6:00 AM
1 point

1 vote

1 comment · 1 min read · LW link

An exploration of GPT-2’s embedding weights

Adam Scherlis, Dec 13, 2022, 12:46 AM
26 points

13 votes

2 comments · 10 min read · LW link

Revisiting algorithmic progress

Dec 13, 2022, 1:39 AM
92 points

31 votes

8 comments · 2 min read · LW link
(arxiv.org)

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad, Dec 13, 2022, 2:17 AM
7 points

2 votes

3 comments · 45 min read · LW link

Limits of Superintelligence

Aleksei Petrenko, Dec 13, 2022, 12:19 PM
1 point

1 vote

0 comments · 1 min read · LW link

[Question] Best introductory overviews of AGI safety?

JakubK, Dec 13, 2022, 7:01 PM
14 points

8 votes

5 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Seeking participants for study of AI safety researchers

joelegardner, Dec 13, 2022, 9:58 PM
2 points

3 votes

0 comments · 1 min read · LW link

Assessing the Capabilities of ChatGPT through Success Rates

Past Account, Dec 13, 2022, 9:16 PM
5 points

2 votes

0 comments · 2 min read · LW link

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap, Dec 14, 2022, 12:32 PM
45 points

19 votes

1 comment · 1 min read · LW link
(arxiv.org)

all claw, no world — and other thoughts on the universal distribution

Tamsin Leake, Dec 14, 2022, 6:55 PM
14 points

4 votes

0 comments · 7 min read · LW link
(carado.moe)

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name, Dec 14, 2022, 10:01 PM
0 points

2 votes

1 comment · 22 min read · LW link

ChatGPT has a HAL Problem

Paul Anderson, Dec 14, 2022, 9:31 PM
1 point

1 vote

0 comments · 1 min read · LW link

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin, Dec 15, 2022, 6:22 PM
124 points

47 votes

18 comments · 16 min read · LW link

Avoiding Psychopathic AI

Cameron Berg, Dec 19, 2022, 5:01 PM
28 points

17 votes

3 comments · 20 min read · LW link

We’ve stepped over the threshold into the Fourth Arena, but don’t recognize it

Bill Benzon, Dec 15, 2022, 8:22 PM
2 points

3 votes

0 comments · 7 min read · LW link

AI Safety Movement Builders should help the community to optimise three factors: contributors, contributions and coordination

peterslattery, Dec 15, 2022, 10:50 PM
4 points

2 votes

0 comments · 6 min read · LW link

Proper scoring rules don’t guarantee predicting fixed points

Dec 16, 2022, 6:22 PM
55 points

24 votes

5 comments · 21 min read · LW link

A learned agent is not the same as a learning agent

Ben Amitay, Dec 16, 2022, 5:27 PM
4 points

3 votes

4 comments · 2 min read · LW link

Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?

Bill Benzon, Dec 16, 2022, 9:01 PM
2 points

1 vote

0 comments · 13 min read · LW link

Using Information Theory to tackle AI Alignment: A Practical Approach

Daniel Salami, Dec 17, 2022, 1:37 AM
6 points

6 votes

4 comments · 8 min read · LW link

Looking for an alignment tutor

JanB, Dec 17, 2022, 7:08 PM
15 points

9 votes

2 comments · 1 min read · LW link

What we owe the microbiome

weverka, Dec 17, 2022, 7:40 PM
2 points

6 votes

0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Bad at Arithmetic, Promising at Math

cohenmacaulay, Dec 18, 2022, 5:40 AM
91 points

44 votes

17 comments · 20 min read · LW link

AGI is here, but nobody wants it. Why should we even care?

MGow, Dec 20, 2022, 7:14 PM
−20 points

6 votes

0 comments · 17 min read · LW link

Hacker-AI and Cyberwar 2.0+

Erland Wittkotter, Dec 19, 2022, 11:46 AM
2 points

4 votes

0 comments · 15 min read · LW link

Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon, Dec 19, 2022, 3:12 PM
13 points

6 votes

2 comments · 4 min read · LW link
(new-savanna.blogspot.com)

Results from a survey on tool use and workflows in alignment research

Dec 19, 2022, 3:19 PM
50 points

31 votes

2 comments · 19 min read · LW link

Proliferating Education

Haris Rashid, Dec 20, 2022, 7:22 PM
−1 points

6 votes

2 comments · 5 min read · LW link
(www.harisrab.com)

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois, Dec 19, 2022, 10:42 PM
5 points

3 votes

6 comments · 1 min read · LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

Dec 19, 2022, 9:31 PM
47 points

28 votes

15 comments · 10 min read · LW link

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen, Dec 20, 2022, 2:35 PM
6 points

3 votes

0 comments · 6 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad, Dec 20, 2022, 1:04 PM
18 points

11 votes

12 comments · 4 min read · LW link

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov, Dec 20, 2022, 5:13 PM
7 points

6 votes

0 comments · 36 min read · LW link

I believe some AI doomers are overconfident

FTPickle, Dec 20, 2022, 5:09 PM
10 points

14 votes

14 comments · 2 min read · LW link

Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values

Garrett Baker, Dec 21, 2022, 12:44 AM
8 points

7 votes

10 comments · 5 min read · LW link

CIRL Corrigibility is Fragile

Dec 21, 2022, 1:40 AM
21 points

8 votes

1 comment · 12 min read · LW link

New AI risk intro from Vox [link post]

JakubK, Dec 21, 2022, 6:00 AM
5 points

3 votes

1 comment · 2 min read · LW link
(www.vox.com)