Yes, unless the state reward function is constant and we only demand weak corrigibility to all policies.
Given that this is the main result, I feel like the title “Corrigibility Can Be VNM-Incoherent” is rather dramatically understating the case. Maybe something like “Corrigibility Is Never Nontrivially VNM-Coherent In MDPs” would be closer. Or maybe just drop the hedging and say “Corrigibility Is Never VNM-Coherent In MDPs”, since the constant-utility case is never interesting anyway.
I worded the title conservatively because I only showed that corrigibility is never nontrivially VNM-coherent in this particular MDP. Maybe there’s a more general case to be proven for all MDPs, and using more realistic (non-single-timestep) reward aggregation schemes.