Jacob_Hilton comments on How much alignment data will we need in the long run?

Jacob_Hilton 12 Aug 2022 21:19 UTC
This is just supposed to be an (admittedly informal) restatement of the definition of outer alignment in the context of an objective function where the data distribution plays a central role.
For example, assuming a reinforcement learning objective function, outer alignment is equivalent to the statement that there is an aligned policy that gets higher average reward on the training distribution than any unaligned policy.
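As a rough sketch, the reinforcement-learning case can be written symbolically. The notation here is introduced for illustration and is not taken from the post. Let $D$ be the training distribution, $R$ the reward function, $\Pi_{\text{aligned}}$ and $\Pi_{\text{unaligned}}$ the sets of aligned and unaligned policies, and $p_D(\cdot \mid \pi)$ the distribution over trajectories $\tau$ when policy $\pi$ acts in episodes drawn from $D$. The condition then reads:

```latex
% Expected return of a policy on the training distribution (illustrative notation)
J_D(\pi) = \mathbb{E}_{\tau \sim p_D(\cdot \mid \pi)}\left[ R(\tau) \right]

% Outer alignment, RL case: some aligned policy beats every unaligned policy
\exists\, \pi^\ast \in \Pi_{\text{aligned}} \;\text{ such that }\; J_D(\pi^\ast) > J_D(\pi) \;\text{ for all } \pi \in \Pi_{\text{unaligned}}
```

One consequence under this reading: if the maximum of $J_D$ is attained, every maximizer is aligned, since an unaligned maximizer would be beaten by $\pi^\ast$.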
I did not intend to diminish the importance of robustness by focusing on outer alignment in this post.