Lauro Langosco comments on Evaluating the historical value misspecification argument

Lauro Langosco 6 Oct 2023 16:20 UTC
LW: 5 AF: 3
Do you have an example of one way that the full alignment problem is easier now that we’ve seen that GPT-4 can understand & report on human values?

(I’m asking because it’s hard for me to tell whether your definition of outer alignment is disconnected from the rest of the problem, such that outer alignment could become easier without the rest of the problem becoming easier.)