So, for example, this claims that either intent alignment + objective robustness or outer alignment + robustness would be sufficient for impact alignment.
Shouldn’t this be “intent alignment + capability robustness or outer alignment + robustness”?
Btw, I plan to post more detailed comments in response here and to your other post; I just wanted to note this now so hopefully there’s no confusion in interpreting your diagram.
Yep, fixed.