[Question] What is corrigibility? / What are the right background readings on it?

In a related question, “What are some good examples of incorrigibility?”, Ryan Carey provides the following overview:

The idea of corrigibility is roughly that an AI should be aware that it may have faults, and therefore allow and facilitate human operators to correct these faults. I’m especially interested in scenarios where the AI system controls a particular input channel that is supposed to be used to control it, such as a shutdown button, a switch used to alter its mode of operation, or another device used to control its motivation.
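To make that distinction concrete before getting to the readings, here is a deliberately toy sketch (my own, in Python; every name in it is hypothetical and not drawn from any paper) contrasting an agent that defers to its shutdown channel with one that optimizes around it:

```python
# Toy illustration only. ShutdownButton, corrigible_step, and
# incorrigible_step are made-up names for the sake of the example.

from dataclasses import dataclass


@dataclass
class ShutdownButton:
    """A control channel the operators are supposed to be able to use."""
    pressed: bool = False


def corrigible_step(button: ShutdownButton, pursue_goal, halt):
    """A corrigible agent treats the button as authoritative:
    if the operators press it, the agent halts rather than
    working around the signal."""
    if button.pressed:
        halt()
    else:
        pursue_goal()


def incorrigible_step(button: ShutdownButton, pursue_goal, halt):
    """An incorrigible agent controls its own off-switch:
    it clears the inconvenient signal and keeps pursuing
    its original goal."""
    button.pressed = False  # disable the operators' control channel
    pursue_goal()
```

The interesting question in the literature is not how to hard-code the first behavior, but why a goal-directed agent tends toward the second by default and what kind of objective avoids that.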

What is a more detailed understanding of corrigibility, and what are the right things to read? I believe there is at least one MIRI paper and some Arbital posts on the topic. I’m writing this question to center my inquiry.
