I see. So, restating in my own terms—outer alignment is indeed about whether getting what you asked for is good, and in the case of prediction, the malign universal prior argument says that “perfect” prediction is actually malign. This would be a case of getting what you wanted / asked for / optimized for, but that not being good, which makes it an outer alignment failure.
Whereas an inner alignment failure would necessarily involve not hitting optimal performance on your objective. (Otherwise it would be an inner alignment success, and an outer alignment failure.)
Is that about right?
Yep—at least that’s how I’m generally thinking about it in this post.
Got it, thank you!