TurnTrout comments on A Certain Formalization of Corrigibility Is VNM-Incoherent

TurnTrout 20 Nov 2021 1:26 UTC
LW: 23 AF: 13
I worded the title conservatively because I only showed that corrigibility is never nontrivially VNM-coherent in this particular MDP. There may be a more general result to prove, one that holds for all MDPs and for more realistic (non-single-timestep) reward aggregation schemes.