I see. So the agent issue I address above is a sub-issue of overall inner alignment.
In particular, I was addressing deceptively aligned mesa-optimizers, as discussed here: https://astralcodexten.substack.com/p/deceptively-aligned-mesa-optimizers
Thanks!
I think that’s right. Inner alignment is getting the mesa-optimizers (agents) aligned with the overall objective. Outer alignment is ensuring that the overall objective we give the AI is actually one humans want.