But we learned something from the exercise. We learned not just about the problem itself, but also about how hard it was to get outside grantmakers or journal editors to be able to understand what the problem was.
True, and unfortunately it extends beyond just grantmakers and journal editors.
This part was interesting:
“Ah,” says the computer scientist. “Well, in that case, how about if [some other clever idea]?”
Well, you see, that clever idea is isomorphic to the AI believing that it’s impossible for the button to ever be pressed, which incentivizes it to terrify the user whenever it gets a setback, so as to correlate setbacks with button-presses, which (relative to its injured belief system) causes it to think the setbacks can’t happen.
I would love to see Yudkowsky's or Soares's thoughts on Wentworth's Shutdown Problem Proposal, which seems to avoid the problems discussed here. At first glance it appears to fall under the above failure mode, but since it uses `do` operations instead of conditional probability, Wentworth argues it doesn't have this problem.
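To make the conditioning-versus-intervention distinction concrete, here is a minimal toy sketch (my own illustration, not Wentworth's actual construction; all probabilities are made up) of a two-node causal model, Setback → ButtonPress. Conditioning on "the button is never pressed" drags down the inferred probability of setbacks, which is exactly the incentive to correlate setbacks with button-presses in the quoted passage; a `do()` intervention on the button severs its incoming arrow and leaves the setback prior untouched.

```python
# Toy sketch of conditioning vs. Pearl's do() intervention, in a
# two-node causal model Setback -> ButtonPress. My own illustration,
# not Wentworth's construction; all numbers are invented.

P_SETBACK = 0.3  # prior probability that the AI hits a setback


def joint(setback: bool, press: bool,
          p_press_given_setback: float = 0.9,
          p_press_given_no_setback: float = 0.1) -> float:
    """Joint probability under the causal model Setback -> Press."""
    p_s = P_SETBACK if setback else 1.0 - P_SETBACK
    p_press = p_press_given_setback if setback else p_press_given_no_setback
    return p_s * (p_press if press else 1.0 - p_press)


def p_setback_conditioned_on(press: bool, **kw) -> float:
    """P(Setback | Press): observing the button updates beliefs upstream."""
    num = joint(True, press, **kw)
    return num / (num + joint(False, press, **kw))


def p_setback_under_do(press: bool) -> float:
    """P(Setback | do(Press)): intervening on the button severs its
    incoming arrow, so the setback marginal is unchanged."""
    return P_SETBACK


# Conditioning on "the button is never pressed" makes setbacks look rare:
print(p_setback_conditioned_on(False))  # ~0.045
# ...and the effect strengthens as the setback->press correlation
# tightens, which is the perverse incentive described above:
print(p_setback_conditioned_on(False, p_press_given_setback=0.99))  # ~0.005
# Under the intervention do(press=False), setbacks keep their prior:
print(p_setback_under_do(False))  # 0.3
```

Under conditioning, an agent that plans as if the button will never be pressed is rewarded for tightening the setback-to-press correlation; under `do()`, the button's state carries no evidential weight about setbacks at all, which is (as I understand it) the core of Wentworth's argument.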