I think we basically agree. I would also prefer people to think more about the middle case. Indeed, when I use the term mesa-optimiser, I usually intend to talk about the middle picture, though strictly that’s sinful as the term is tied to Optimisers.
Re: inner alignment
I think it’s basically the right term. I guess in my mind I want to say something like, “Inner Alignment is the problem of aligning objectives across the Mesa≠Base gap”, which shows how the two have slightly different shapes. But the difference isn’t really important.
Inner alignment gap? Inner objective gap?