Richard_Ngo comments on Richard Ngo’s Shortform

Richard_Ngo 1 May 2024 20:01 UTC
LW: 6 AF: 5
You can think of this as a way of getting around the problem of fully updated deference, because the AI is choosing a policy based on what that policy would have done in the full range of hypothetical situations, and so it never updates away from considering any given goal. The cost, of course, is that we don’t know how to actually pin down these hypotheticals.
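
To make the contrast concrete, here is a toy sketch of one possible reading, not a proposal for how to actually do this. Everything in it is a hypothetical assumption for illustration: the two goals, the three actions, the numbers, and especially `SITUATION_WEIGHT`, which hand-picks a weighting over hypothetical situations (exactly the part we don't know how to pin down). The "updated" agent conditions on what it observes and then acts on its posterior over goals; the "policy-level" agent scores whole policies across all situations while each goal keeps its original weight.

```python
from itertools import product

# Hypothetical toy setup: all names and numbers are illustrative assumptions.
GOALS = ["goal_a", "goal_b"]
PRIOR = {"goal_a": 0.5, "goal_b": 0.5}
SITUATIONS = ["evidence_for_a", "evidence_for_b"]
ACTIONS = ["serve_a", "serve_b", "compromise"]

# P(situation | goal), used only by the updating agent.
LIKELIHOOD = {
    "goal_a": {"evidence_for_a": 0.9, "evidence_for_b": 0.1},
    "goal_b": {"evidence_for_a": 0.1, "evidence_for_b": 0.9},
}

# How much each goal values each action.
UTILITY = {
    "goal_a": {"serve_a": 1.0, "serve_b": 0.0, "compromise": 0.8},
    "goal_b": {"serve_a": 0.0, "serve_b": 1.0, "compromise": 0.8},
}

# Assumed fixed weighting over hypothetical situations, independent of goal.
SITUATION_WEIGHT = {"evidence_for_a": 0.5, "evidence_for_b": 0.5}


def posterior(situation):
    """Bayesian posterior over goals after observing the situation."""
    joint = {g: PRIOR[g] * LIKELIHOOD[g][situation] for g in GOALS}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


def updated_action(situation):
    """Agent that updates on evidence, then maximizes posterior utility."""
    post = posterior(situation)
    return max(
        ACTIONS,
        key=lambda a: sum(post[g] * UTILITY[g][a] for g in GOALS),
    )


def policy_score(policy):
    """Score a whole policy; goal weights stay at PRIOR in every situation."""
    return sum(
        PRIOR[g] * SITUATION_WEIGHT[s] * UTILITY[g][policy[s]]
        for g in GOALS
        for s in SITUATIONS
    )


def best_policy():
    """Enumerate every situation-to-action mapping and pick the top scorer."""
    candidates = (
        dict(zip(SITUATIONS, acts))
        for acts in product(ACTIONS, repeat=len(SITUATIONS))
    )
    return max(candidates, key=policy_score)


if __name__ == "__main__":
    print("updated agent:", {s: updated_action(s) for s in SITUATIONS})
    print("policy-level agent:", best_policy())
```

In this toy, the updating agent drops whichever goal the evidence disfavors, while the policy-level agent keeps serving both goals in every situation. Note that the difference comes entirely from the assumed `SITUATION_WEIGHT`: if situations were instead weighted by `LIKELIHOOD`, scoring whole policies would just reproduce ordinary Bayesian updating.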