Academic Papers

Tag

Last edit: 9 Jul 2020 11:36 UTC by Kaj_Sotala

Posts either linking to, or summarizing, formal papers published elsewhere.

Some AI research areas and their relevance to existential safety

Andrew_Critch19 Nov 2020 3:18 UTC

204 points

37 comments50 min readLW link 2 reviews

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC

36 points

4 comments2 min readLW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley28 Nov 2023 19:56 UTC

64 points

30 comments11 min readLW link

Thirty-three randomly selected bioethics papers

Rob Bensinger22 Mar 2021 21:38 UTC

112 points

46 comments50 min readLW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI27 Dec 2022 17:27 UTC

50 points

0 comments4 min readLW link

(aizi.substack.com)

Notes/blog posts on two recent MIRI papers

Quinn14 Jul 2013 23:11 UTC

35 points

3 comments1 min readLW link

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner4 Jun 2024 15:50 UTC

119 points

14 comments13 min readLW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

L Rudolf L, bilalchughtai, jan betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans and Marius Hobbhahn

8 Jul 2024 22:24 UTC

99 points

27 comments5 min readLW link

New paper: Long-Term Trajectories of Human Civilization

Kaj_Sotala12 Aug 2018 9:10 UTC

33 points

1 comment2 min readLW link

(kajsotala.fi)

Study on what makes people approve or condemn mind upload technology; references LW

Kaj_Sotala10 Jul 2018 17:14 UTC

22 points

0 comments2 min readLW link

(www.nature.com)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala4 May 2018 8:56 UTC

14 points

1 comment1 min readLW link

(arxiv.org)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala12 Feb 2018 12:30 UTC

45 points

4 comments6 min readLW link

(kajsotala.fi)

Papers for 2017

Kaj_Sotala4 Jan 2018 13:30 UTC

12 points

2 comments2 min readLW link

(kajsotala.fi)

Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering

Kaj_Sotala3 Jan 2018 13:57 UTC

13 points

0 comments1 min readLW link

(www.informatica.si)

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

Kaj_Sotala3 Oct 2017 17:39 UTC

3 points

0 comments1 min readLW link

(papers.ssrn.com)

[link] Why Self-Control Seems (but may not be) Limited

Kaj_Sotala20 Jan 2014 16:55 UTC

55 points

10 comments3 min readLW link

Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower

Kaj_Sotala6 Dec 2013 9:54 UTC

34 points

18 comments5 min readLW link

Fallacies as weak Bayesian evidence

Kaj_Sotala18 Mar 2012 3:53 UTC

87 points

42 comments10 min readLW link

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

Kaj_Sotala8 Mar 2012 5:39 UTC

86 points

40 comments9 min readLW link

[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading

avturchin27 Mar 2018 11:49 UTC

8 points

5 comments1 min readLW link

IJMC Mind Uploading Special Issue published

Kaj_Sotala22 Jun 2012 11:58 UTC

19 points

12 comments1 min readLW link

Bad news for uploading

PhilGoetz13 Dec 2012 23:32 UTC

19 points

6 comments1 min readLW link

“Personal Identity and Uploading”, by Mark Walker

gwern7 Jan 2012 19:55 UTC

7 points

19 comments16 min readLW link

“Ray Kurzweil and Uploading: Just Say No!”, Nick Agar

gwern2 Dec 2011 21:42 UTC

6 points

79 comments6 min readLW link

Publication of “Anthropic Decision Theory”

Stuart_Armstrong20 Sep 2017 15:41 UTC

12 points

9 comments1 min readLW link

SSC Journal Club: AI Timelines

Scott Alexander8 Jun 2017 19:00 UTC

15 points

16 comments8 min readLW link

Computerphile discusses MIRI’s “Logical Induction” paper

Parth Athley4 Oct 2018 16:00 UTC

43 points

2 comments1 min readLW link

(www.youtube.com)

New paper from MIRI: “Toward idealized decision theory”

So8res16 Dec 2014 22:27 UTC

41 points

22 comments3 min readLW link

[LINK] International variation in IQ – the role of parasites

David_Gerard14 May 2012 12:08 UTC

10 points

49 comments1 min readLW link

IQ Scores Fail to Predict Academic Performance in Children With Autism

InquilineKea18 Nov 2010 3:34 UTC

9 points

9 comments2 min readLW link

[LINK] Neuroscientists Find That Status within Groups Can Affect IQ

cafesofie23 Jan 2012 19:52 UTC

6 points

5 comments1 min readLW link

New report: Intelligence Explosion Microeconomics

Eliezer Yudkowsky29 Apr 2013 23:14 UTC

72 points

246 comments3 min readLW link

The Chromatic Number of the Plane is at Least 5 - Aubrey de Grey

Scott Garrabrant11 Apr 2018 18:19 UTC

61 points

5 comments1 min readLW link

(arxiv.org)

[Question] Why is pseudo-alignment “worse” than other ways ML can fail to generalize?

nostalgebraist18 Jul 2020 22:54 UTC

45 points

9 comments2 min readLW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala2 May 2020 7:35 UTC

43 points

19 comments7 min readLW link

(plato.stanford.edu)

Multiverse-wide Cooperation via Correlated Decision Making

Kaj_Sotala20 Aug 2017 12:01 UTC

5 points

2 comments1 min readLW link

(foundational-research.org)

A technical note on bilinear layers for interpretability

Lee Sharkey8 May 2023 6:06 UTC

56 points

0 comments1 min readLW link

(arxiv.org)

Papers, Please #1: Various Papers on Employment, Wages and Productivity

Zvi22 May 2023 12:00 UTC

42 points

2 comments8 min readLW link

(thezvi.wordpress.com)

Aumann Agreement by Combat

roryokane5 Apr 2019 5:15 UTC

14 points

2 comments1 min readLW link

(sigbovik.org)

“A Definition of Subjective Probability” by Anscombe and Aumann

JonahS24 Jan 2014 20:30 UTC

14 points

2 comments2 min readLW link

Snyder-Beattie, Sandberg, Drexler & Bonsall (2020): The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare

Kaj_Sotala24 Nov 2020 10:36 UTC

83 points

20 comments2 min readLW link

(www.liebertpub.com)

[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI

avturchin28 Aug 2018 21:32 UTC

13 points

2 comments1 min readLW link

Comment on “Endogenous Epistemic Factionalization”

Zack_M_Davis20 May 2020 18:04 UTC

151 points

8 comments7 min readLW link

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis29 Jun 2020 2:45 UTC

105 points

10 comments4 min readLW link

Formal Solution to the Inner Alignment Problem

michaelcohen18 Feb 2021 14:51 UTC

49 points

123 comments2 min readLW link

Deep limitations? Examining expert disagreement over deep learning

Richard_Ngo27 Jun 2021 0:55 UTC

18 points

6 comments1 min readLW link

(link.springer.com)

Entropic boundary conditions towards safe artificial superintelligence

Santiago Nunez-Corrales20 Jul 2021 22:15 UTC

3 points

0 comments2 min readLW link

(www.tandfonline.com)

Comment on “Deception as Cooperation”

Zack_M_Davis27 Nov 2021 4:04 UTC

23 points

4 comments7 min readLW link

2021 AI Alignment Literature Review and Charity Comparison

Larks23 Dec 2021 14:06 UTC

168 points

28 comments73 min readLW link

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner18 May 2022 20:52 UTC

50 points

8 comments14 min readLW link

Paper: Forecasting world events with neural nets

Owain_Evans, Dan H and Joe Kwon

1 Jul 2022 19:40 UTC

39 points

3 comments4 min readLW link

Poster Session on AI Safety

Neil Crawford12 Nov 2022 3:50 UTC

7 points

6 comments1 min readLW link

How to Read Papers Efficiently: Fast-then-Slow Three pass method

the gears to ascension, 1stuserhere and lastuserhere

25 Feb 2023 2:56 UTC

34 points

4 comments4 min readLW link

(ccr.sigcomm.org)

Learning preferences by looking at the world

Rohin Shah12 Feb 2019 22:25 UTC

43 points

10 comments7 min readLW link

(bair.berkeley.edu)

[Question] How Old is Smallpox?

Raemon10 Dec 2018 10:50 UTC

44 points

5 comments2 min readLW link

Is Caviar a Risk Factor For Being a Millionaire?

Anders_H9 Dec 2016 16:27 UTC

67 points

9 comments1 min readLW link

[Link] Computer improves its Civilization II gameplay by reading the manual

Kaj_Sotala13 Jul 2011 12:00 UTC

49 points

5 comments4 min readLW link

Article Review: Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI22 Dec 2022 18:16 UTC

13 points

4 comments6 min readLW link

(aizi.substack.com)

A Summary Of Anthropic’s First Paper

Sam Ringer30 Dec 2021 0:48 UTC

85 points

1 comment8 min readLW link

Generalizing Experimental Results by Leveraging Knowledge of Mechanisms

Carlos_Cinelli11 Dec 2019 20:39 UTC

50 points

5 comments1 min readLW link

New paper: Corrigibility with Utility Preservation

Koen.Holtman6 Aug 2019 19:04 UTC

44 points

11 comments2 min readLW link

Memory, nutrition, motivation, and genes

PhilGoetz26 Feb 2013 5:25 UTC

24 points

12 comments2 min readLW link

Human-AI Collaboration

Rohin Shah22 Oct 2019 6:32 UTC

42 points

7 comments2 min readLW link

(bair.berkeley.edu)

“Everything is Correlated”: An Anthology of the Psychology Debate

gwern27 Apr 2019 13:48 UTC

40 points

2 comments1 min readLW link

(www.gwern.net)

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery12 Feb 2024 0:56 UTC

55 points

13 comments3 min readLW link

A discussion of the paper, “Large Language Models are Zero-Shot Reasoners”

HiroSakuraba26 May 2022 15:55 UTC

7 points

0 comments4 min readLW link

David Chalmers’ “The Singularity: A Philosophical Analysis”

lukeprog29 Jan 2011 2:52 UTC

55 points

203 comments4 min readLW link

Let’s Discuss Functional Decision Theory

Chris_Leong23 Jul 2018 7:24 UTC

29 points

18 comments1 min readLW link

Introducing Corrigibility (an FAI research subfield)

So8res20 Oct 2014 21:09 UTC

52 points

28 comments3 min readLW link

Counterfactual outcome state transition parameters

Anders_H27 Jul 2018 21:13 UTC

37 points

1 comment6 min readLW link

How to escape from your sandbox and from your hardware host

PhilGoetz31 Jul 2015 17:26 UTC

43 points

28 comments1 min readLW link

Oracle paper

Stuart_Armstrong13 Dec 2017 14:59 UTC

12 points

7 comments1 min readLW link

New paper: The Incentives that Shape Behaviour

RyanCarey23 Jan 2020 19:07 UTC

23 points

5 comments1 min readLW link

(arxiv.org)

Dissolving the Fermi Paradox, and what reflection it provides

Jan_Kulveit30 Jun 2018 16:35 UTC

28 points

22 comments1 min readLW link

(arxiv.org)

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

DragonGod6 Dec 2017 6:01 UTC

13 points

4 comments1 min readLW link

(arxiv.org)

Summary: Surreal Decisions

Chris_Leong27 Nov 2018 14:15 UTC

29 points

20 comments3 min readLW link

How Big a Deal are MatMul-Free Transformers?

JustisMills27 Jun 2024 22:28 UTC

19 points

6 comments5 min readLW link

(justismills.substack.com)

To Learn Critical Thinking, Study Critical Thinking

gwern7 Jul 2012 23:50 UTC

41 points

16 comments11 min readLW link

An Overview of Sparks of Artificial General Intelligence: Early experiments with GPT-4

Annapurna27 Mar 2023 13:44 UTC

10 points

0 comments7 min readLW link

(jorgevelez.substack.com)

Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”

Klara Helene Nielsen20 Jul 2023 17:08 UTC

4 points

1 comment2 min readLW link

(journals.sagepub.com)

The Physiology of Willpower

pjeby18 Jun 2009 4:11 UTC

25 points

36 comments1 min readLW link

Experts vs. parents

PhilGoetz29 Sep 2010 16:48 UTC

24 points

23 comments1 min readLW link

The Mind Is Not Designed For Thinking

CronoDAS26 Mar 2009 21:57 UTC

9 points

7 comments1 min readLW link

[Link] Persistence of Long-Term Memory in Vitrified and Revived C. elegans worms

Rangi24 May 2015 3:43 UTC

34 points

8 comments1 min readLW link

[Question] Can this model grade a test without knowing the answers?

Elizabeth31 Aug 2019 0:53 UTC

20 points

3 comments1 min readLW link

Implications of Quantum Computing for Artificial Intelligence Alignment Research

Jsevillamol and PabloAMC

22 Aug 2019 10:33 UTC

24 points

3 comments13 min readLW link

The theory of Proximal Policy Optimisation implementations

salman.mohammadi11 Apr 2024 13:00 UTC

3 points

1 comment6 min readLW link

(salmanmohammadi.github.io)

Citability of Lesswrong and the Alignment Forum

Leon Lang8 Jan 2023 22:12 UTC

48 points

2 comments1 min readLW link

Link: Writing exercise closes the gender gap in university-level physics

Vladimir_Golovin27 Nov 2010 16:28 UTC

27 points

9 comments1 min readLW link

Donohue, Levitt, Roe, and Wade: T-minus 20 years to a massive crime wave?

Paul Logan3 Jul 2022 3:03 UTC

−24 points

6 comments3 min readLW link

(laulpogan.substack.com)

Over-encapsulation

PhilGoetz25 Mar 2010 17:58 UTC

29 points

56 comments3 min readLW link

FHI paper published in Science: interventions against COVID-19

SoerenMind16 Dec 2020 21:19 UTC

119 points

0 comments3 min readLW link

VLM-RM: Specifying Rewards with Natural Language

ChengCheng, David Lindner and Ethan Perez

23 Oct 2023 14:11 UTC

20 points

2 comments5 min readLW link

(far.ai)

NeurIPS ML Safety Workshop 2022

Dan H26 Jul 2022 15:28 UTC

72 points

2 comments1 min readLW link

(neurips2022.mlsafety.org)

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford6 Sep 2022 17:17 UTC

11 points

0 comments1 min readLW link

That one apocalyptic nuclear famine paper is bunk

Lao Mein12 Oct 2022 3:33 UTC

110 points

10 comments1 min readLW link

Hope Function

gwern1 Jul 2012 15:40 UTC

38 points

8 comments1 min readLW link

Rawls’s Veil of Ignorance Doesn’t Make Any Sense

Arjun Panickssery24 Feb 2024 13:18 UTC

10 points

9 comments1 min readLW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai16 Jun 2024 13:01 UTC

7 points

0 comments7 min readLW link

(arxiv.org)

How You Can Gain Self Control Without “Self-Control”

spencerg24 Mar 2021 23:38 UTC

109 points

41 comments23 min readLW link

Functional Trade-offs

weathersystems19 May 2021 1:06 UTC

5 points

0 comments6 min readLW link

“Are Experiments Possible?” Seeds of Science call for reviewers

rogersbacon2 Nov 2022 20:05 UTC

8 points

0 comments1 min readLW link

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

Ulisse Mini13 Nov 2022 9:46 UTC

12 points

2 comments1 min readLW link

(arxiv.org)

How truthful is GPT-3? A benchmark for language models

Owain_Evans16 Sep 2021 10:09 UTC

58 points

24 comments6 min readLW link

Walkthrough of the Tiling Agents for Self-Modifying AI paper

So8res13 Dec 2013 3:23 UTC

29 points

18 comments21 min readLW link

Doing your good deed for the day

Scott Alexander27 Oct 2009 0:45 UTC

152 points

57 comments3 min readLW link

[linkpost] Acquisition of Chess Knowledge in AlphaZero

Quintin Pope23 Nov 2021 7:55 UTC

8 points

1 comment1 min readLW link

Demanding and Designing Aligned Cognitive Architectures

Koen.Holtman21 Dec 2021 17:32 UTC

8 points

5 comments5 min readLW link

Even if you have a nail, not all hammers are the same

PhilGoetz29 Mar 2010 18:09 UTC

150 points

126 comments6 min readLW link

Less Competition, More Meritocracy?

Zvi20 Jan 2019 2:00 UTC

85 points

19 comments20 min readLW link 3 reviews

(thezvi.wordpress.com)

A New Interpretation of the Marshmallow Test

elharo5 Jul 2013 12:22 UTC

119 points

25 comments2 min readLW link

Good News for Immunostimulants

sarahconstantin16 Apr 2018 16:10 UTC

26 points

9 comments2 min readLW link

(srconstantin.wordpress.com)

Let’s Read: Superhuman AI for multiplayer poker

Yuxi_Liu14 Jul 2019 6:22 UTC

56 points

6 comments8 min readLW link

Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky6 Jun 2013 20:24 UTC

88 points

259 comments3 min readLW link

The Vulnerable World Hypothesis (by Bostrom)

Ben Pace6 Nov 2018 20:05 UTC

50 points

17 comments4 min readLW link

(nickbostrom.com)

DeepMind article: AI Safety Gridworlds

scarcegreengrass30 Nov 2017 16:13 UTC

25 points

6 comments1 min readLW link

(deepmind.com)

Claims & Assumptions made in Eternity in Six Hours

Ruby8 May 2019 23:11 UTC

50 points

7 comments3 min readLW link

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv

DragonGod21 Nov 2019 1:18 UTC

52 points

4 comments1 min readLW link

(arxiv.org)

Effect heterogeneity and external validity in medicine

Anders_H25 Oct 2019 20:53 UTC

49 points

14 comments7 min readLW link

Learning biases and rewards simultaneously

Rohin Shah6 Jul 2019 1:45 UTC

41 points

3 comments4 min readLW link

Reasoning isn’t about logic (it’s about arguing)

Morendil14 Mar 2010 4:42 UTC

66 points

31 comments3 min readLW link
