Paul, I think you’re headed in a good direction here.
On the subject of approval-directed behavior:
One broad reason people and governments disapprove of behaviors is that they break the law or violate the ethical norms that supplement it. Many AGI disaster scenarios seem to involve some law-breaking fairly early on.
Putting aside an advanced AI that could start working on changing the law, shouldn’t one thing (though not the only thing) an approval-directed AI does be to constantly check whether its actions are legal before taking them?
The law by itself is not a complete set of norms of acceptable behavior, and violating the law may be acceptable in exceptional circumstances.
However, why can’t we start there?