How do you think the “Greenpeace by default” AI might define either “harm” or “value”, and “life”?
It simply won’t. We never defined harm, value, or life either; they are commonly agreed-upon labels that we apply to things for the purpose of communication. They work on a limited set of things that already exist, but they define nothing outside the context of that limited set.
It would maximize some sort of complexity metric (perhaps while acting conservatively and penalizing actions it can’t undo, so that it avoids harming itself by backing itself into a corner). It would first apply this to itself, self-improving for a while without ever defining what the “self” is. Consider evolution as an example: it doesn’t define fitness the way humans do. It doesn’t work like “okay, we’ll maximize fitness, defined as such-and-such, so here is what we should do.”
edit: That is to say, it doesn’t define ‘life’ or ‘harm’. It has a simple goal system involving some metrics, which incidentally prevents self-harm and permits self-improvement. That is only how we would describe it, in the same way that we would describe a robot that shoots at the short-wavelength part of the visible spectrum as a blue-minimizing robot (although that is not a very good analogy, since we define blue and minimization independently of the robot).