Explain a non-VNM-rational architecture which is very intelligent, but has goals that are toggleable with a button in a way that is immune to the failures discussed in the article (as well as the related failures).
EJT’s incomplete preferences proposal. But as far as I can tell from the comments, an agent with incomplete preferences needs a decision rule in addition to its utility function, and only some of those decision rules are compatible with shutdownability.
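To make the "decision rule" point concrete, here is a toy sketch. It is not EJT's actual formalism. The `Option` type, the `group` field, and both rules are hypothetical names I made up for illustration. The assumption is that options are only comparable within the same group, so the preference relation has gaps, and the two rules treat those gaps differently.

```python
from typing import List, NamedTuple


class Option(NamedTuple):
    name: str
    group: str    # hypothetical: options are comparable only within one group
    score: float  # hypothetical utility, meaningful only within a group


def strictly_prefers(a: Option, b: Option) -> bool:
    """Partial preference: a beats b only if both are in the same group."""
    return a.group == b.group and a.score > b.score


def maximal_options(options: List[Option]) -> List[Option]:
    """Rule A: permit every option that no available option is strictly preferred to."""
    return [
        o for o in options
        if not any(strictly_prefers(other, o) for other in options)
    ]


def completed_choice(options: List[Option]) -> Option:
    """Rule B: ignore the gaps and rank by score across groups."""
    return max(options, key=lambda o: o.score)


options = [
    Option("work_a", "button_not_pressed", 3.0),
    Option("work_b", "button_not_pressed", 5.0),
    Option("shut_down", "button_pressed", 1.0),
]

print([o.name for o in maximal_options(options)])  # work_b and shut_down are both permitted
print(completed_choice(options).name)              # work_b
```

Under rule A the agent is permitted to pick either `work_b` or `shut_down`, since neither is preferred to the other. Rule B quietly completes the preferences, so the same relation now always favors staying on. Same preferences, different rule, different behavior around the button, which is all the sketch is meant to show.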