AI Alignment Writing Day 2018

13 Aug 2019 22:24 UTC

On 10 July 2018, all attendees of the MIRI Summer Fellows Program were given an entire day to write blog posts for the AI Alignment Forum about ideas they had been thinking about. These are the 28 resulting posts, in chronological order.

Choosing to Choose?

Daniel Herrmann, 10 Jul 2018 20:15 UTC

10 points

7 comments, 5 min read, LW link

The Intentional Agency Experiment

Alexander Gietelink Oldenziel, 10 Jul 2018 20:32 UTC

13 points

5 comments, 3 min read, LW link

Two agents can have the same source code and optimise different utility functions

Joar Skalse, 10 Jul 2018 21:51 UTC

11 points

11 comments, 1 min read, LW link

Conditioning, Counterfactuals, Exploration, and Gears

Diffractor, 10 Jul 2018 22:11 UTC

28 points

1 comment, 5 min read, LW link

Probability is fake, frequency is real

Linda Linsefors, 10 Jul 2018 22:32 UTC

12 points

7 comments, 1 min read, LW link

Repeated (and improved) Sleeping Beauty problem

Linda Linsefors, 10 Jul 2018 22:32 UTC

12 points

5 comments, 2 min read, LW link

Logical Uncertainty and Functional Decision Theory

swordsintoploughshares, 10 Jul 2018 23:08 UTC

15 points

4 comments, 2 min read, LW link

A framework for thinking about wireheading

theotherotheralex, 10 Jul 2018 23:14 UTC

15 points

4 comments, 1 min read, LW link

Bayesian Probability is for things that are Space-like Separated from You

Scott Garrabrant, 10 Jul 2018 23:47 UTC

87 points

22 comments, 2 min read, LW link

A universal score for optimizers

levin, 10 Jul 2018 23:52 UTC

15 points

8 comments, 3 min read, LW link

An environment for studying counterfactuals

Nisan, 11 Jul 2018 0:14 UTC

15 points

6 comments, 3 min read, LW link

Mechanistic Transparency for Machine Learning

DanielFilan, 11 Jul 2018 0:34 UTC

55 points

9 comments, 4 min read, LW link

Bounding Goodhart’s Law

eric_langlois, 11 Jul 2018 0:46 UTC

43 points

2 comments, 5 min read, LW link

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

AlexMennen, 11 Jul 2018 1:03 UTC

40 points

1 comment, 1 min read, LW link

Dependent Type Theory and Zero-Shot Reasoning

evhub, 11 Jul 2018 1:16 UTC

27 points

3 comments, 5 min read, LW link

Conceptual problems with utility functions

Dacyn, 11 Jul 2018 1:29 UTC

22 points

12 comments, 2 min read, LW link

No, I won’t go there, it feels like you’re trying to Pascal-mug me

Rupert, 11 Jul 2018 1:37 UTC

9 points

0 comments, 2 min read, LW link

Conditions under which misaligned subagents can (not) arise in classifiers

anon1, 11 Jul 2018 1:52 UTC

12 points

2 comments, 2 min read, LW link

Complete Class: Consequentialist Foundations

abramdemski, 11 Jul 2018 1:57 UTC

58 points

37 comments, 13 min read, LW link

Clarifying Consequentialists in the Solomonoff Prior

Vlad Mikulik, 11 Jul 2018 2:35 UTC

20 points

16 comments, 6 min read, LW link

On the Role of Counterfactuals in Learning

Max Kanwal, 11 Jul 2018 2:45 UTC

11 points

2 comments, 3 min read, LW link

Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

steven0461, 11 Jul 2018 2:59 UTC

29 points

11 comments, 1 min read, LW link

Decision-theoretic problems and Theories; An (Incomplete) comparative list

somervta, 11 Jul 2018 2:59 UTC

36 points

0 comments, 1 min read, LW link

(docs.google.com)

Mathematical Mindset

komponisto, 11 Jul 2018 3:03 UTC

55 points

5 comments, 2 min read, LW link

Monk Treehouse: some problems defining simulation

dranorter, 11 Jul 2018 7:35 UTC

6 points

1 comment, 5 min read, LW link

An Agent is a Worldline in Tegmark V

komponisto, 12 Jul 2018 5:12 UTC

24 points

12 comments, 2 min read, LW link

Generalized Kelly betting

Linda Linsefors, 19 Jul 2018 1:38 UTC

15 points

5 comments, 2 min read, LW link

Conceptual problems with utility functions, second attempt at explaining

Dacyn, 21 Jul 2018 2:08 UTC

16 points

5 comments, 2 min read, LW link