Ben Pace comments on List of resolved confusions about IDA

Ben Pace 9 Oct 2019 0:14 UTC
I understand Paul to be saying that he hopes corrigibility will fall out if we train an AI to score well on your short-term preferences, not just your narrow preferences.