This topic came up while working on a project in which I try to identify a minimal set of assumptions under which I know how to construct an aligned system. Once I know how to construct an aligned system under that set of assumptions, I attempt to remove one assumption and adjust the system so that it remains aligned. Right now I am trying to remove the cartesian assumption.
I would encourage you to look at Reflective Oracles next: they describe a computationally unbounded agent capable of reasoning about worlds that are as computationally unbounded as itself. A natural step after that would be logical induction or infrabayesianism, which model agents that are smaller than what they reason about.