AI Alignment Writing Day 2019

1 Oct 2019 0:06 UTC

On August 22nd, 2019, all attendees of the MIRI Summer Fellows Program were given an entire day to write blog posts for the AI Alignment Forum about ideas they’d been thinking about. These are the resulting posts, in chronological order.

Announcement: Writing Day Today (Thursday)

Ben Pace · 22 Aug 2019 4:48 UTC

29 points

5 comments · 1 min read · LW link

Markets are Universal for Logical Induction

johnswentworth · 22 Aug 2019 6:44 UTC

75 points

2 comments · 5 min read · LW link

Computational Model: Causal Diagrams with Symmetry

johnswentworth · 22 Aug 2019 17:54 UTC

53 points

29 comments · 4 min read · LW link

Intentional Bucket Errors

Scott Garrabrant · 22 Aug 2019 20:02 UTC

55 points

6 comments · 3 min read · LW link

Embedded Naive Bayes

johnswentworth · 22 Aug 2019 21:40 UTC

14 points

6 comments · 3 min read · LW link

Time Travel, AI and Transparent Newcomb

johnswentworth · 22 Aug 2019 22:04 UTC

10 points

7 comments · 1 min read · LW link

Logical Counterfactuals and Proposition graphs, Part 1

Donald Hobson · 22 Aug 2019 22:06 UTC

20 points

0 comments · 2 min read · LW link

[Question] Why so much variance in human intelligence?

Ben Pace · 22 Aug 2019 22:36 UTC

65 points

28 comments · 4 min read · LW link

Towards a mechanistic understanding of corrigibility

evhub · 22 Aug 2019 23:20 UTC

47 points

26 comments · 6 min read · LW link

Logical Optimizers

Donald Hobson · 22 Aug 2019 23:54 UTC

11 points

4 comments · 3 min read · LW link

Deconfuse Yourself about Agency

VojtaKovarik · 23 Aug 2019 0:21 UTC

15 points

9 comments · 5 min read · LW link

Thoughts from a Two Boxer

jaek · 23 Aug 2019 0:24 UTC

18 points

11 comments · 5 min read · LW link

Analysis of a Secret Hitler Scenario

jaek · 23 Aug 2019 1:24 UTC

16 points

6 comments · 4 min read · LW link

The Commitment Races problem

Daniel Kokotajlo · 23 Aug 2019 1:58 UTC

152 points

56 comments · 5 min read · LW link

[Question] Does Agent-like Behavior Imply Agent-like Architecture?

Scott Garrabrant · 23 Aug 2019 2:01 UTC

58 points

8 comments · 1 min read · LW link

Redefining Fast Takeoff

VojtaKovarik · 23 Aug 2019 2:15 UTC

10 points

1 comment · 1 min read · LW link

Vaniver’s View on Factored Cognition

Vaniver · 23 Aug 2019 2:54 UTC

48 points

4 comments · 8 min read · LW link

Tabooing ‘Agent’ for Prosaic Alignment

Hjalmar_Wijk · 23 Aug 2019 2:55 UTC

57 points

10 comments · 6 min read · LW link

Creating Environments to Design and Test Embedded Agents

lukehmiles · 23 Aug 2019 3:17 UTC

13 points

5 comments · 8 min read · LW link

Formalising decision theory is hard

Lukas Finnveden · 23 Aug 2019 3:27 UTC

17 points

19 comments · 2 min read · LW link

Vague Thoughts and Questions about Agent Structures

loriphos · 23 Aug 2019 4:01 UTC

9 points

3 comments · 2 min read · LW link

Towards an Intentional Research Agenda

romeostevensit · 23 Aug 2019 5:27 UTC

20 points

8 comments · 3 min read · LW link

Metalignment: Deconfusing metaethics for AI alignment.

Guillaume Corlouer · 23 Aug 2019 10:25 UTC

13 points

7 comments · 3 min read · LW link

Torture and Dust Specks and Joy—Oh my! or: Non-Archimedean Utility Functions as Pseudograded Vector Spaces

Louis_Brown · 23 Aug 2019 11:11 UTC

19 points

29 comments · 8 min read · LW link

Soft takeoff can still lead to decisive strategic advantage

Daniel Kokotajlo · 23 Aug 2019 16:39 UTC

122 points

47 comments · 8 min read · LW link · 4 reviews

Algorithmic Similarity

LukasM · 23 Aug 2019 16:39 UTC

28 points

10 comments · 11 min read · LW link

Thoughts on Retrieving Knowledge from Neural Networks

Jaime Ruiz · 23 Aug 2019 16:41 UTC

11 points

2 comments · 5 min read · LW link

Parables of Constraint and Actualization

Spencer Wyman · 23 Aug 2019 16:56 UTC

13 points

0 comments · 6 min read · LW link

When do utility functions constrain?

Hoagy · 23 Aug 2019 17:19 UTC

29 points

7 comments · 7 min read · LW link

Actually updating

SaraHax · 23 Aug 2019 17:46 UTC

56 points

10 comments · 4 min read · LW link

Understanding understanding

mthq · 23 Aug 2019 18:10 UTC

24 points

1 comment · 2 min read · LW link

Troll Bridge

abramdemski · 23 Aug 2019 18:36 UTC

85 points

59 comments · 12 min read · LW link

Optimization Provenance

Adele Lopez · 23 Aug 2019 20:08 UTC

38 points

5 comments · 5 min read · LW link