If I were a well-intentioned AI...

I examine how some of the major problems in AI alignment (Goodhart problems, distributional shift, mesa-optimising, etc.) look from the perspective of a well-intentioned but ignorant AI, and whether this perspective can suggest methods for improving safety.

If I were a well-intentioned AI… I: Image classifier

If I were a well-intentioned AI… II: Acting in a world

If I were a well-intentioned AI… III: Extremal Goodhart

If I were a well-intentioned AI… IV: Mesa-optimising