AI Alignment Writing Day 2018

On 10th July 2018, all attendees of the MIRI Summer Fellows Program were given an entire day to write blog posts for the AI Alignment Forum on ideas they’d been thinking about. These are the 28 posts that resulted, in chronological order.

Choosing to Choose?

The Intentional Agency Experiment

Two agents can have the same source code and optimise different utility functions

Conditioning, Counterfactuals, Exploration, and Gears

Probability is fake, frequency is real

Repeated (and improved) Sleeping Beauty problem

Logical Uncertainty and Functional Decision Theory

A framework for thinking about wireheading

Bayesian Probability is for things that are Space-like Separated from You

A universal score for optimizers

An environment for studying counterfactuals

Mechanistic Transparency for Machine Learning

Bounding Goodhart’s Law

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

Dependent Type Theory and Zero-Shot Reasoning

Conceptual problems with utility functions

No, I won’t go there, it feels like you’re trying to Pascal-mug me

Conditions under which misaligned subagents can (not) arise in classifiers

Complete Class: Consequentialist Foundations

Clarifying Consequentialists in the Solomonoff Prior

On the Role of Counterfactuals in Learning

Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

Decision-theoretic problems and Theories; An (Incomplete) comparative list

Mathematical Mindset

Monk Treehouse: some problems defining simulation

An Agent is a Worldline in Tegmark V

Generalized Kelly betting

Conceptual problems with utility functions, second attempt at explaining