My reading is that you endorse the former type of corrigibility as well, not the latter?
Yes. I also had the “grab-bag of tricks” impression from MIRI’s previous work on the topic, since it was mostly just trying various hacks, and that was also part of why I mostly ignored it. The notion that there’s a True Name to be found here, we’re not just trying hacks, is a big part of why I now have hope for corrigibility.
Yes. I also had the “grab-bag of tricks” impression from MIRI’s previous work on the topic, since it was mostly just trying various hacks, and that was also part of why I mostly ignored it. The notion that there’s a True Name to be found here, we’re not just trying hacks, is a big part of why I now have hope for corrigibility.