I’ve been having discussions with a friend about his idea of getting a near-corrigible agent to land in a ‘corrigibility basin’. The idea is that you could make an agent close enough to corrigible that, upon receiving critical feedback from a supervisor or the environment about imperfections in its corrigibility, it would be willing to self-edit to bring itself in line with a more corrigible version of itself.
I would like to see some toy-problem research focused on the corrigibility sub-problem of correctional self-editing rather than on the sub-problem of ‘the shutdown problem’.
I’d be interested to see this as well!