Thoughts on Corrigibility

These are my writings on different kinds of corrigibility. They build on each other and form part of my alignment worldview, but they are not yet woven into a coherent narrative.

Non-Obstruction: A Simple Concept Motivating Corrigibility

Corrigibility as outside view

A Certain Formalization of Corrigibility Is VNM-Incoherent

Formalizing Policy-Modification Corrigibility