Copying from a comment I already made cos no-one responded last time:
I’m not confident about any of the below, so please add cautions in the text as appropriate.
The orthogonality thesis is both stronger and weaker than we need. It suffices to point out that neither we nor Ben Goertzel know anything useful or relevant about what goals are compatible with very large amounts of optimizing power, and so we have no reason to suppose that superoptimization by itself points either towards or away from things we value. By creating an “orthogonality thesis” that we defend as part of our arguments, we make it sound like we have a separate burden of proof to meet, whereas in fact it’s the assertion that superoptimization tells us something about the goal system that needs defending.
The orthogonality thesis is non-controversial. Ben’s point is that what matters is not the question of what types of goals are theoretically compatible with superoptimization, but rather what types of goals we can expect to be associated with superoptimization in reality.
In reality, AGIs with superoptimization power will be created by human agencies (or their descendants) with goal systems subject to extremely narrow socio-economic filters.
The other tangential consideration is that AGIs with superoptimization power and long planning horizons/zero time discount may have highly convergent instrumental values/goals which are equivalent in effect to terminal values/goals for agents with short planning horizons (such as humans). From a human perspective, all super-AGIs may appear to have strangely similar ethics/morality/goals, even though what we are really observing are convergent instrumental values and short-term opening plans, since their true goals concern the end of the universe and are essentially unknowable to us.
The orthogonality thesis is highly controversial—among philosophers.
Right, but none of this answers what I was trying to say, which is that the burden of proof is definitely with whoever wants to assert that superintelligence tells us anything about goals. In the absence of a specific argument, “this agent is superintelligent” shouldn’t be taken as informative about its goals.
A superintelligent agent doesn’t just appear ex nihilo as a random sample out of the space of possible minds. Its existence requires a lengthy, complex technological development which implies the narrow socio-economic filter I mentioned above. Thus “this agent is superintelligent” is at least partially informative about the probability landscape over said agent’s goals: they are much more likely than not to be related to or derived from prior goals of the agent’s creators.
Right, and that’s one example of a specific argument. Another is the set of Gödelian and self-defeating examples in the main article. But neither of these does anything to prop up the Goertzel-style argument of “a superintelligence won’t tile the Universe with smiley faces, because that’s a stupid thing to do”.
Well, Goertzel’s argument is pretty much bulletproof-correct when it comes to learning algorithms like the ones he works on, where the goal is essentially set by training, along with human culture and the human notion of a stupid goal. I.e., an AI that reuses human culture as a foundation for superhuman intelligence.
Ultimately, orthogonality dissolves once you start being specific about what intelligence we’re talking about: assume that it has speed-of-light lag and is not physically very small, and it dissolves; assume that it is a learning algorithm that gets to adult human level by absorbing human culture, and it dissolves; and so on. The orthogonality thesis is only correct in the sense that, being entirely ignorant of the specifics of what the ‘intelligence’ is, you can’t attribute any qualities to it, which is trivially correct.
While that specific Goertzel-style argument is not worth bothering with, the more supportable version of that line of argument is: based on the current socio-economic landscape of Earth, we can infer something of the probability landscape over near-future Earth superintelligent agent goal systems, namely that they will be tightly clustered around regions in goal space that are both economically useful and achievable.
Two natural attractors in that goal space will be along the lines of profit maximizers or intentionally anthropocentric goal systems. The evidence for this distribution over goal space is already rather abundant if one simply surveys existing systems and research. Market evolutionary forces make profit maximization a central attractor; likewise, socio-cultural forces pull us towards anthropocentric goal systems (and of course the two overlap). The brain-reverse-engineering and neuroscience-heavy track in the AGI field in particular should eventually lead to anthropocentric designs, although it’s worth mentioning that some AGI researchers (e.g. OpenCog) are aiming for explicit anthropocentric goal systems without brain reverse engineering.
Isn’t that specific Goertzel-style argument the whole point of the Orthogonality Thesis? Even in its strongest form, the Thesis doesn’t do anything to address your second paragraph.
I’m not sure. I don’t think the specific quote of Goertzel is an accurate summary of his views, and the real key disagreements over safety concern this admittedly nebulous distribution of future AGI designs and goal systems.