Complexity of value says that the space of a system's possible values is large compared to the target you want to hit, so to hit it you must aim correctly; there is no hope of winning the lottery otherwise. Thus any approach that doesn't correctly aim the system's values will fail at alignment. The system's understanding of some goal is irrelevant to this, unless a design for correctly aiming the system's values makes use of that understanding.
Ambitious alignment aims at human values. Prosaic alignment aims at human wishes, as currently intended. Pivotal alignment aims at a particular bounded technical task. As we move from ambitious to prosaic to pivotal alignment, the minimality principle gets a bit more to work with: the system becomes more specific in the kinds of cognition it needs to do its work, and thus less dangerous given our lack of a comprehensive understanding of what aligning a superintelligence entails.