I share the sense that this article has many of the shortcomings common to other MIRI output, and I feel like maybe I ought to try a lot harder to communicate these issues, BUT I really don’t think VNM rationality is the culprit here. I’ve not seen a compelling case that an otherwise capable model would be aligned or corrigible but for its taste for getting money-pumped (I had a chat with Elliot T on Twitter recently where he actually had a proposal along these lines … but I didn’t buy it).
I really think it’s the reasoning errors in how VNM and other “goal-directedness” premises are employed, and not VNM itself, that are problematic.
To be honest, I stand with Barbie: “Reliable probabilistic reasoning is hard.”