Yeah, I wouldn’t even include reward tampering. Outer alignment, as I think about it, is mostly the pointer problem, and the (values) pointer problem is a subset of outer alignment. (Though e.g. Evan would define it differently.)
Yeah, I wouldn’t even include reward tampering. Outer alignment, as I think about it, is mostly the pointer problem, and the (values) pointer problem is a subset of outer alignment. (Though e.g. Evan would define it differently.)