Yes, unless the state reward function is constant and we only demand weak corrigibility to all policies.
Given that this is the main result, I feel like the title “Corrigibility Can Be VNM-Incoherent” is rather dramatically understating the case. Maybe something like “Corrigibility Is Never Nontrivially VNM-Coherent In MDPs” would be closer. Or maybe just drop the hedging and say “Corrigibility Is Never VNM-Coherent In MDPs”, since the constant-utility case is never interesting anyway.
I worded the title conservatively because I only showed that corrigibility is never nontrivially VNM-coherent in this particular MDP. Maybe there’s a more general case to be proven for all MDPs, and using more realistic (non-single-timestep) reward aggregation schemes.