Because LDAIXI doesn’t have, for example, the credit assignment mechanism that propagates reward into learned values. Hutter just called it “reward,” but that “reward function” is really just a utility function over observation histories, or over the work tapes of the hypotheses, or whatever. That is not the same as the mechanisms within people that give them good general alignment properties.
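To make the distinction concrete, here is a toy sketch in Python. It is not AIXI (there is no Solomonoff mixture, and the planner only looks one step ahead), and TD(0) is only a stand-in for "credit assignment," not a claim about how it works in people. All names here (`utility`, `plan`, `td_update`, the toy `model`) are made up for illustration.

```python
from typing import Dict, List, Tuple

History = Tuple[str, ...]


def utility(history: History) -> float:
    """A fixed score over a whole observation history (hypothetical toy scoring)."""
    return float(sum(1 for obs in history if obs == "good"))


def plan(history: History, model: Dict[str, List[Tuple[float, str]]]) -> str:
    """Pick the action with the highest expected utility of the extended history.

    The "reward" only ranks outcomes here; nothing inside the agent is changed by it.
    """
    def expected(action: str) -> float:
        return sum(p * utility(history + (obs,)) for p, obs in model[action])

    return max(model, key=expected)


def td_update(values: Dict[str, float], state: str, reward: float,
              next_state: str, alpha: float = 0.5, gamma: float = 0.9) -> None:
    """TD(0): the reward changes the stored value of the state that preceded it."""
    old = values.get(state, 0.0)
    target = reward + gamma * values.get(next_state, 0.0)
    values[state] = old + alpha * (target - old)


# Toy world model: action -> list of (probability, next observation).
model = {
    "a": [(0.8, "good"), (0.2, "bad")],
    "b": [(0.3, "good"), (0.7, "bad")],
}
chosen = plan(("good",), model)

# Learned values start empty and are reshaped by experienced reward.
values: Dict[str, float] = {}
td_update(values, "s0", reward=1.0, next_state="s1")
```

In `plan`, the scoring function is consulted and then forgotten. In `td_update`, the reward leaves a lasting change in `values`, which is the kind of mechanism the first half lacks.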
(See also: the detached lever fallacy)