The main difference between LDAIXI and a human in terms of ontology seems to be that the things the human values are ultimately grounded in senses, with a reward tied to them. For example, we value sweet things because we have a detector for sweetness and a reward tied to that detector. When our understanding of what sugar is changes, the detector doesn't, and so the ontology change works out fine. But I don't see a reason you couldn't set up LDAIXI the same way: just specify the reward in terms of a diamond detector (or several of them). In the end, AIXI already uses detectors; how else would it get input?
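As a very rough sketch of what "specify the reward in terms of a diamond detector" might mean here (the detector function and its byte-string observation format are purely illustrative, not anything from Hutter's formalism):

```python
# Illustrative only: a reward grounded in a detector's output rather than in a
# world-model concept of "diamond", mirroring the sweetness-detector analogy.

def diamond_detector(observation: bytes) -> bool:
    # Stand-in for physical sensor hardware; the real thing would be whatever
    # instrument feeds the agent its percepts.
    return b"diamond" in observation

def reward(observation: bytes) -> float:
    # The reward depends only on what the detector reports, the way human
    # reward circuitry depends on the sweetness receptor rather than on a
    # learned concept of sugar.
    return 1.0 if diamond_detector(observation) else 0.0

print(reward(b"camera frame: diamond on pedestal"))  # 1.0
print(reward(b"camera frame: empty pedestal"))       # 0.0
```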
Because LDAIXI doesn't have, e.g., the credit-assignment mechanism which propagates reward into learned values. Hutter just called it "reward," but that "reward function" is really just a utility function over observation histories (or over the work tapes of the hypotheses, or whatever), which is not the same as the mechanisms within people that give them good general alignment properties.
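For concreteness, here is Hutter's expectimax expression for plain AIXI (not necessarily LDAIXI's exact definition), where m is the horizon, U the reference universal Turing machine, and ℓ(q) the length of environment program q:

$$a_k \;:=\; \arg\max_{a_k}\sum_{o_k r_k}\;\max_{a_{k+1}}\sum_{o_{k+1} r_{k+1}}\cdots\;\max_{a_m}\sum_{o_m r_m}\;[r_k+\dots+r_m]\sum_{q\,:\,U(q,\,a_1\ldots a_m)\,=\,o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

The rewards r_i arrive only as part of the percepts o_i r_i output by the environment programs q, so the "reward function" is fixed up front as a function of the observation history; nothing analogous to credit assignment ever turns it into learned internal values.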
(See also: the detached lever fallacy)