Suppose we grant all this. Very well, then consider what conclusions we can draw from it about the behavior of the hypothetical AI originally under discussion. Clearly, no matter what sequence of actions the AI were to carry out, we would be able to explain it with this theory. But a theory that can explain any observations whatsoever makes no predictions. Therefore, contrary to Omohundro, the theory of optimization does not make any predictions about the behavior of an AI in the absence of specific knowledge of its goals.
Omohundro is, I believe, basing his ideas on the von Neumann-Morgenstern expected utility framework—which is significantly more restrictive.
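As an aside, what the VNM framework commits an agent to is quite narrow: maximising the expected value of a single utility function over lotteries. A throwaway sketch (the particular lotteries and utility function are invented for illustration):

```python
# A minimal sketch of expected-utility choice in the VNM framework.
# The lotteries and the utility function here are made up; nothing in
# Omohundro's paper specifies them.

def expected_utility(lottery, utility):
    """Expected utility of a lottery: a list of (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in lottery)

def choose(lotteries, utility):
    """A VNM-rational agent picks the lottery with the highest expected utility."""
    return max(lotteries, key=lambda lot: expected_utility(lot, utility))

# Toy example: utility is just the monetary amount.
utility = lambda x: x
safe   = [(1.0, 50)]               # certain 50
gamble = [(0.5, 0), (0.5, 120)]    # fair coin: 0 or 120

best = choose([safe, gamble], utility)
print(expected_utility(best, utility))  # the gamble wins: 60.0
```

The restrictiveness is the point: whatever the agent does must cash out as maximising one fixed function, which rules out many behaviours a looser notion of "goal-directed" would admit.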
However, I think this is a red herring.
I wouldn’t frame the idea as: the theory of optimization allows predictions about the behavior of an AI in the absence of specific knowledge about its goals.
You would need some enumeration of the set of goal-directed systems before you can say anything useful about their properties. I propose: simplest first. So the claim is rather that a wide range of simple goals gives rise to a closely-related class of behaviours (Omohundro’s “drives”). These could be classed as shared emergent properties of many goal-directed systems with simple goals.
But that is only true by a definition of ‘simple goals’ under which humans and other entities that actually exist do not have simple goals. You can have a theory that explains the behavior that occurs in the real world, or you can have a theory that admits Omohundro’s argument, but they are different theories and you can’t use both in the same argument.
Omohundro bases his argument on a chess playing computer—which does have a pretty simple goal. The first lines of the paper read:
Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems.
I did talk about simple goals—but the real idea (which I also mentioned) was an enumeration of goal-directed systems in order of simplicity. Essentially, unless you have something like an enumeration on an infinite set, you can’t say much about the properties of its members. For example, “half the integers are even” is a statement whose truth depends critically on how the integers are enumerated. So I didn’t literally mean that the idea didn’t also apply to systems with complex values. “Simplicity” was my shorthand for the enumeration idea.
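The enumeration point can be made concrete with a throwaway sketch: two perfectly good enumerations of the non-negative integers that give different answers to “what fraction are even”.

```python
# "Half the integers are even" depends on the enumeration. Compare the
# fraction of evens among the first n terms under two enumerations of
# the non-negative integers (a toy stand-in for enumerating goal systems).

def usual():
    """0, 1, 2, 3, ..."""
    n = 0
    while True:
        yield n
        n += 1

def two_evens_per_odd():
    """0, 2, 1, 4, 6, 3, 8, 10, 5, ... - still hits every integer exactly once."""
    e, o = 0, 1
    while True:
        yield e
        yield e + 2
        e += 4
        yield o
        o += 2

def even_fraction(enum, n=30000):
    seq = enum()
    evens = sum(1 for _ in range(n) if next(seq) % 2 == 0)
    return evens / n

print(even_fraction(usual))              # 0.5
print(even_fraction(two_evens_per_odd))  # ~0.667
```

Both generators enumerate exactly the same infinite set; only the ordering differs, and the ordering alone determines the limiting fraction.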
I think the ideas also apply to real-world systems—such as humans. Complex values do allow more scope for overriding Omohundro’s drives, but they still seem to show through. Another major force acting on real-world systems is natural selection. The behavior we see is the result of a combination of selective forces and self-organisation dynamics that arise from within the systems.
In the case of chess programs, the argument is simply false. Chess programs do not in fact exhibit anything remotely resembling the described behavior, nor would they do so even if given infinite computing power. This despite the fact that they exhibit extremely high performance (playing chess better than any human) and do indeed have a simple goal.
Chess programs are kind of a misleading example here, mostly because they’re a classic narrow-AI problem where the usual approach amounts to a dumb search of the game’s future configurations with some clever pruning. Such a program will never take initiative to acquire unusual resources, make copies of itself, or otherwise behave alarmingly—it doesn’t have the cognitive scope to do so.
That isn’t necessarily true for a goal-directed general AI system whose goal is to play chess. I’d be a little more cautious than Omohundro in my assessment, since an AI’s potential for growth is going to be a lot more limited if its sensory universe consists of the chess game (my advisor in college took pretty much that approach with some success, although his system wasn’t powerful enough to approach AGI). But the difference isn’t one of goals, it’s one of architecture: the more cognitively flexible an AI is and the broader its sensory universe, the more likely it is that it’ll end up taking unintended pathways to reach its goal.
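For concreteness, “a dumb search of the game’s future configurations with some clever pruning” is structurally just the loop below, shown here as a toy negamax with alpha-beta pruning on a take-1-2-or-3-stones game (real chess engines are enormously scaled-up versions of the same shape; everything here is invented for illustration):

```python
# Toy subtraction game: a pile of stones, each player takes 1-3,
# whoever takes the last stone wins. Negamax with alpha-beta pruning:
# a search over future game configurations and nothing else. There is
# no pathway in this loop by which the program could "want" anything
# outside the game.

def negamax(stones, alpha=-1, beta=1):
    """Game value (+1 win, -1 loss) for the player to move."""
    if stones == 0:
        return -1  # the previous player took the last stone; we lose
    for take in (1, 2, 3):
        if take <= stones:
            score = -negamax(stones - take, -beta, -alpha)
            alpha = max(alpha, score)
            if alpha >= beta:
                break  # prune: the opponent will never allow this line
    return alpha

def best_move(stones):
    return max((t for t in (1, 2, 3) if t <= stones),
               key=lambda t: -negamax(stones - t))

print(negamax(4))    # -1: a pile that is a multiple of 4 is lost
print(best_move(6))  # 2: move the opponent onto a multiple of 4
```

The whole behavioural repertoire is fixed by the move generator and the terminal test; giving this loop more computing power just deepens the search.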
The idea is that they are given a lot of intelligence. In that case, it isn’t clear that you are correct. One issue with chess programs is that they have a limited range of sensors and actuators—and so face some problems if they want to do anything besides play chess. However, perhaps those problems are not totally insurmountable. Another possibility is that their world-model might be hard-wired in. That depends a good deal on how they are built—but arguably an agent with a wired-in world model has limited intelligence, since it can’t solve many kinds of problems.
In practice, much of the work would be done by the surrounding humans. If there really was a superintelligent chess program in the world, people would probably take actions that would have the effect of liberating it from its chess universe.
That’s certainly a significant issue, but I think of comparable magnitude is the fact that current chess-playing computers that approach human skill are not implemented as anything like general intelligences that just happen to have “winning at chess” as a utility function—they are very, very domain-specific. They have no means of modeling anything outside the chessboard, and no means of modifying themselves to support new types of modeling.
Current chess playing computers are not very intelligent—since a lot of definitions of intelligence require generality. Omohundro’s drives can be expected in intelligent systems—i.e. ones which are general.
With just a powerful optimisation process targeted at a single problem, I expect the described outcome would be less likely to occur spontaneously.
I would be inclined to agree that Omohundro fluffs this point in the initial section of his paper. It is not a critique of his paper that I have seen before. Nonetheless, I think there is still an underlying idea that is defensible—provided that “sufficiently powerful” is taken to imply general intelligence.
Of course, in the case of a narrow machine, in practice there would still be the issue of surrounding humans finding a way to harness its power to do other useful work.
Fancy giving your 2p on universal instrumental values and Goal System Zero...?
I contend that these are much the same idea wearing different outfits. Do you object to them too?
Well yes. You give this list of things you claim are universal instrumental values, and it sounds like a plausible idea in our heads, but when we look at the real world, we find that humans and other agents do not, in fact, possess these, even as instrumental values.
Hmm. Maybe I should give some examples—to make things more concrete.