First of all, all of these methods involve integrating the AGI in human society. So the AGI is forming its values, at least in part, through doing something (possibly talking) and getting a response from some human. That human will be interpreting the AGI’s answers, and selecting the right response, using their own theory of the AGI’s mind, which is nearly certainly an anthropomorphisation! Even if that human develops experience dealing with the AGI, their understanding will be limited (just as our understanding of other humans is limited, only more so).
So the AGI programmer is taking a problem that they can’t solve through direct coding, and putting the AGI through interactions so that it will acquire the values that the programmer can’t specify directly, in settings where the other interactors will be prone to anthropomorphisation.
i.e.: “I can’t solve this problem formally, but I do understand its structure well enough to be reasonably sure that anthropomorphic interactions will solve it”.
If that’s the claim, I would expect the programmer to be well schooled in the properties and perils of anthropomorphisation, and to cast their arguments, as much as possible, in formal logic or code form. For instance, if we want the AGI to “love” us: what kind of behaviour would we expect this to entail, and why would this code acquire that behaviour from these interactions? If you couldn’t use the word love, or any close synonyms, could you still describe the process and show that it will perform well? If you can’t describe love without saying “love”, then you are counting on a shared, non-formalised human understanding of what love is, and hoping that the AGI will stumble upon the same understanding: you don’t know the contours of the definition, or the potential pitfalls, but you’re counting on the AGI to avoid them.
Those four types of behaviours that I mentioned there, the ones we need to separate: don’t just decry the use of anthropomorphisation in the description, but say which parts of the OpenCog system will be used to distinguish between them, and to select the friendly behaviour rather than the others. You know how your system works, so reassure me! :-)
Stuart: Yeah, the line of theoretical research you suggest is worthwhile…
However, it’s worth noting that I and the other OpenCog team members are pressed for time, and have a lot of concrete OpenCog work to do. It would seem none of us really feels like taking a lot of time, at this stage, to carefully formalize arguments about what the system is likely to do in various situations once it’s finished. We’re too consumed with trying to finish the system, which is a long and difficult task in itself...
I will try to find some time in the near term to sketch a couple example arguments of the type you request… but it won’t be today...
As a very rough indication for the moment… note that OpenCog has explicit GoalNode objects in its AtomSpace knowledge store, and one can look at the explicit probabilistic ImplicationLinks pointing to these GoalNodes from various combinations of contexts and actions. So one can actually look, in principle, at the probabilistic relations between (context, action) pairs and goals that OpenCog is using to choose actions.
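To make this concrete, here is a minimal illustrative sketch in Python of the kind of structure being described: a toy stand-in for the AtomSpace, holding GoalNodes and probabilistic ImplicationLinks from (context, action) pairs. The class names, fields, example goals and numbers are hypothetical illustrations, not the actual OpenCog API.

```python
# Illustrative sketch only: these class names and fields are hypothetical
# stand-ins for what Ben describes, not the actual OpenCog AtomSpace API.
from dataclasses import dataclass

@dataclass(frozen=True)
class GoalNode:
    name: str        # e.g. a goal like "PleaseTeacher"

@dataclass(frozen=True)
class ImplicationLink:
    context: str     # situation the system is in
    action: str      # action it might take
    goal: GoalNode   # goal this (context, action) pair points to
    strength: float  # estimated probability that the action achieves the goal
    confidence: float  # how much evidence backs that estimate

# A toy "AtomSpace": the set of links the system consults when choosing actions.
atomspace = [
    ImplicationLink("human_asks_question", "answer_honestly",
                    GoalNode("PleaseTeacher"), strength=0.90, confidence=0.7),
    ImplicationLink("human_asks_question", "say_what_human_expects",
                    GoalNode("PleaseTeacher"), strength=0.95, confidence=0.4),
]

def candidate_actions(context, goal_name):
    """List the (action, strength, confidence) triples linking a context to a goal."""
    return [(link.action, link.strength, link.confidence)
            for link in atomspace
            if link.context == context and link.goal.name == goal_name]
```

The point is simply that, in principle, these relations can be inspected directly rather than guessed at from outside.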
Now, for a quite complex OpenCog system, it may be hard to understand what all these probabilistic relations mean. But for a young OpenCog doing simple things, it will be easier. So one would want to validate, for a young OpenCog doing simple things, that the information in the system’s AtomSpace is compatible with behaviour type 1 rather than types 2-4… One would then want to validate that, as the system gets more mature and does more complex things, there is not a trend toward more of 2-4 and less of 1…
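Continuing the toy model above, a hedged sketch of that kind of audit: assume some human-supplied function that classifies each ImplicationLink by which of the four behaviour types it supports (that classifier is the genuinely hard part), compute the share of links supporting type 1, and check that this share does not drift downward across AtomSpace snapshots as the system matures. The function names and the tolerance are illustrative assumptions, not part of OpenCog.

```python
# Hypothetical audit sketch: classify(link) is assumed to return 1, 2, 3 or 4,
# saying which of the four behaviour types a given ImplicationLink supports.
def share_of_type_1(atomspace, classify):
    """Fraction of links in this AtomSpace snapshot that support behaviour type 1."""
    if not atomspace:
        return 0.0
    return sum(1 for link in atomspace if classify(link) == 1) / len(atomspace)

def no_drift_toward_2_to_4(snapshots, classify, tolerance=0.05):
    """snapshots: AtomSpace contents captured as the system matures.
    Passes if the type-1 share never drops by more than `tolerance` between snapshots."""
    shares = [share_of_type_1(snapshot, classify) for snapshot in snapshots]
    return all(later >= earlier - tolerance
               for earlier, later in zip(shares, shares[1:]))
```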
Thanks for your answer, Ben!
Interesting line of thinking indeed! …