Question for people who think about corrigibility: Consider a corrigible agent with the goal of coloring a wall red. It is considering two kinds of paint. One is a brighter, redder red, while the other is a duller red. However, the duller paint is easier to scrape off if the wall later needs to be repainted in a different color, while the brighter paint is harder to remove or paint over. Which paint is the right choice for the corrigible agent? How should a corrigible agent make this decision? What additional information, if any, does the agent need in order to decide?