johnswentworth comments on The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables

johnswentworth 2 Mar 2021 17:42 UTC
LW: 2 AF: 2
Yeah, I wouldn’t even include reward tampering. Outer alignment, as I think about it, is mostly the pointer problem, and the (values) pointer problem is a subset of outer alignment. (Though e.g. Evan would define it differently.)