Huh. I see a lot of the work I’m excited about as trying to avoid the Agency Hand-Off paradigm, and the Value Learning sequence was all about why we might hope to be able to avoid it.
Your definition of corrigibility is the MIRI version. If you instead go by Paul’s post, the introduction goes:
> I would like to build AI systems which help me:
>
> - Figure out whether I built the right AI and correct any mistakes I made
> - Remain informed about the AI’s behavior and avoid unpleasant surprises
> - Make better decisions and clarify my preferences
> - Acquire resources and remain in effective control of them
> - Ensure that my AI systems continue to do all of these nice things
> - …and so on
This seems pretty clearly outside of the Agency Hand-Off paradigm to me (that’s the way I interpret it at least). Similarly for e.g. approval-directed agents.
I do agree that assistance games look like they’re within the Agency Hand-Off paradigm, especially the way Stuart talks about them; this is one of the main reasons I’m more excited about corrigibility.