I think it’s right. Inner alignment is getting the mesa-optimizers (the learned agents) aligned with the base objective they were trained on. Outer alignment is ensuring that base objective actually captures what humans want.
Crissman
I see. So the agent issue I address above is a sub-issue of overall inner alignment.
In particular, I was addressing deceptively aligned mesa-optimizers, as discussed here: https://astralcodexten.substack.com/p/deceptively-aligned-mesa-optimizers
Thanks!
Health and longevity blogger from Unaging.com here. I’ve submitted talks on optimal diet, optimal exercise, running a sub-3:30 first marathon, and why sugar is fine (fight me!).
Looking forward to extended, rational health discussions!
This started out as an interesting, concrete article, but then it got too meta, and I stopped reading. 🤷‍♂️