CuriouslyNuclear comments on Why Corrigibility is Hard and Important (i.e. “Whence the high MIRI confidence in alignment difficulty?”)

CuriouslyNuclear 4 Oct 2025 19:36 UTC
1 point
Explain a non-VNM-rational architecture that is very intelligent but has goals that are toggleable with a button, in a way that is immune to the failures discussed in the article (and to related failures).
- Thomas Kwa 5 Oct 2025 0:10 UTC
  2 points
  EJT’s incomplete preferences proposal. But as far as I can make out from the comments, an agent with incomplete preferences needs a decision rule in addition to its utility function, and only some of those decision rules are compatible with shutdownability.
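  To make the point about decision rules concrete, here is a minimal Python sketch. It is not taken from EJT's proposal. It assumes that incomplete preferences are represented by a set of utility functions, with one option strictly preferred to another only when all of them agree. The option names, the numbers, and both decision rules are hypothetical. The preference relation alone leaves more than one option undominated, so what the agent actually does depends on the rule used to pick among them.

```python
from typing import Dict, List

# Assumption: incomplete preferences are modeled as several utility functions.
# Option a is strictly preferred to b only if every function rates a at least
# as high as b and at least one rates it strictly higher; otherwise a and b
# may be incomparable.
Utilities = List[Dict[str, float]]


def strictly_prefers(utils: Utilities, a: str, b: str) -> bool:
    """True if every utility function weakly prefers a to b and at least one strictly does."""
    return all(u[a] >= u[b] for u in utils) and any(u[a] > u[b] for u in utils)


def maximal(utils: Utilities, options: List[str]) -> List[str]:
    """Options that no other option is strictly preferred to (the undominated set)."""
    return [a for a in options
            if not any(strictly_prefers(utils, b, a) for b in options)]


def choose_by_mean_utility(utils: Utilities, options: List[str]) -> str:
    """Hypothetical rule 1: among undominated options, take the highest average utility.
    Averaging the functions effectively completes the preferences."""
    return max(maximal(utils, options),
               key=lambda o: sum(u[o] for u in utils) / len(utils))


def choose_preferring(utils: Utilities, options: List[str], favored: str) -> str:
    """Hypothetical rule 2: take `favored` whenever it is undominated,
    otherwise fall back to the first undominated option."""
    candidates = maximal(utils, options)
    return favored if favored in candidates else candidates[0]


# Illustrative options and numbers only.
OPTIONS = ["comply_with_button", "resist_button", "do_nothing_useful"]
UTILS: Utilities = [
    {"comply_with_button": 5, "resist_button": 3, "do_nothing_useful": 1},
    {"comply_with_button": 2, "resist_button": 6, "do_nothing_useful": 1},
]

print(maximal(UTILS, OPTIONS))                                  # ['comply_with_button', 'resist_button']
print(choose_by_mean_utility(UTILS, OPTIONS))                   # resist_button
print(choose_preferring(UTILS, OPTIONS, "comply_with_button"))  # comply_with_button
```

  Both rules start from the same undominated set, yet averaging picks resist_button while the tie-breaking rule picks comply_with_button. The sketch only shows why the choice of decision rule matters; it does not model shutdownability itself.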