TurnTrout comments on General alignment properties

TurnTrout 15 Aug 2022 3:43 UTC
LW: 7 AF: 3
Because LDAIXI doesn’t have, e.g., the credit assignment mechanism which propagates reward into learned values. Hutter just called it “reward,” but that “reward function” is really just a utility function over observation histories, or the work tapes of the hypotheses, or whatever. That’s not the same as the mechanisms within people which give them good general alignment properties.
(See also: the detached lever fallacy)