Richard_Ngo comments on The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables

Richard_Ngo 2 Mar 2021 9:49 UTC
LW: 2 AF: 1
Thanks for the reply. To check that I understand your position, would you agree that solving outer alignment plus solving reward tampering would solve the pointers problem in the context of machine learning?
Broadly speaking, I think our disagreement here is closely related to one we’ve discussed before, about how much sense it makes to talk about outer alignment in isolation (and also about your definition of inner alignment), so I probably won’t pursue this further.
- johnswentworth 2 Mar 2021 17:42 UTC
  LW: 2 AF: 2
  Yeah, I wouldn’t even include reward tampering. Outer alignment, as I think about it, is mostly the pointers problem, and the (values) pointers problem is a subset of outer alignment. (Though e.g. Evan would define it differently.)