I see. So, restating in my own terms—outer alignment is indeed about whether getting what you asked for is good, and in the case of prediction, the malign universal prior argument says that “perfect” prediction is actually malign. This would be a case of getting what you wanted / asked for / optimized for, but that not being good, which makes it an outer alignment failure.
Whereas an inner alignment failure would necessarily involve not hitting optimal performance on your objective. (Otherwise it would be an inner alignment success, and an outer alignment failure.)
Is that about right?
Yep—at least that’s how I’m generally thinking about it in this post.
Got it, thank you!