For corrigibility in particular, some good material that’s not discussed in “Embedded Agency” or the reading guide is Arbital’s Corrigibility and Problem of Fully Updated Deference articles.
Is Jessica Taylor’s A first look at the hard problem of corrigibility still a good reference or is it outdated?
I’d expect Jessica/Stuart/Scott/Abram/Sam/Tsvi to have a better sense of that than me. I didn’t spot any obvious signs that it’s no longer a good reference.