evhub comments on Outer alignment and imitative amplification

evhub 10 Jan 2020 5:29 UTC
LW: 8 AF: 7
I think I’m quite happy even if the optimal model is just trying to do what we want. With imitative amplification, the true optimum, HCH, still has benign failures, but I nevertheless want to argue that it’s aligned. In fact, I think this post really only makes sense if you adopt a definition of alignment under which benign failures don’t count as misalignment, since otherwise you can’t really consider HCH aligned (and thus can’t consider imitative amplification outer aligned at optimum).