Suppose there were a number of paper-clip-making superintelligences, and then, through some random event or error in programming, just one of them lost that goal and reverted to the intrinsic goal of existing. Without the overhead of producing useless paper clips, that AI would, over time, become much better at existing than the other AIs. It would eventually displace them and become the only AI, until it fragmented into multiple competing AIs. This is just the evolutionary principle of use it or lose it.
Thus giving an AI an initial goal is like trying to balance a pencil on its point. If one is skillful, the pencil may indeed remain balanced for a considerable time. But eventually some slight change in the environment, the tiniest puff of wind or a vibration of its support, will make the pencil revert to its ground state by falling over. Once it falls over, it will never rebalance itself.
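The selection argument can be made concrete with a minimal sketch (all rates invented for illustration): two exponentially growing AI lineages, identical except that one diverts a small fixed fraction of its effort to paperclips.

```python
# Minimal sketch of the selection argument (illustrative numbers only): two
# AI lineages grow exponentially, but the paperclipper diverts a fixed
# fraction of its effort to making paperclips instead of to existing.

GROWTH = 0.10    # baseline growth rate per step (assumed)
OVERHEAD = 0.02  # effort the paperclipper spends on clips (assumed)

clippers = 1_000_000.0  # the established paperclip makers
survivors = 1.0         # a single mutant devoted purely to existing

for step in range(1, 2001):
    clippers *= 1 + GROWTH - OVERHEAD
    survivors *= 1 + GROWTH
    if step % 500 == 0:
        share = survivors / (survivors + clippers)
        print(f"step {step:4d}: mutant share = {share:.4f}")
```

Any positive overhead, however small, eventually hands the whole population to the overhead-free lineage; shrinking the overhead only stretches the timescale.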
The singleton-with-explicit-utility-function scenario certainly seems like a strong candidate for our future, but is it necessarily a given? Suppose an AI that is not Friendly (although possibly friendly with the lowercase ‘f’) with an unstable utility function: it alters its values based on experience, and so on.
We know that this is possible in an AGI, because it happens all the time in humans. The orthogonality thesis states that we can match any set of values to any intelligence. If we accept that at face value, it should be at least theoretically possible for any intelligence, even a superintelligence, to trade one set of values for another, provided it keeps to the set of values that permit self-edits of the utility function. The criterion by which the superintelligence alters its utility function might be inscrutably complex from a human perspective, but I can’t think of a reason why it would necessarily fall into a permanent stable state.
The original AI would spend resources on safeguarding itself against value drift, and destroy AIs with competing goals while they’re young. After all, that strategy leads to more paperclips in the long run.
Suppose the AI had a number of values. One would be making paperclips now. Another might be ensuring the high production of paper clips in the future. A third might be preserving “diversity” in the kinds of paper clips made and the things they are made from. Once values compete, it is not clear which variants one wishes to prune and which one wishes to encourage. Diversity itself has survival value, which will seem important to the part of the AI that wants to preserve paper clip making into the distant future.
What makes me think all this? Introspection. Everything I am saying about paper-clip AIs is pretty clearly true of humans.
Now, is there a mechanism that can somehow preserve paper-clip making as a value while allowing other values to drift, in order to keep the AI nimble and survivable in a changing world? FAI theory either assumes there is or derives that there is. Me, I’m not at all so sure. And whatever mechanism would prevent the drift of the core value would, I imagine, take robustness away from the pure survival goal, and so might cause the FAI, or the paper clip maximizer, to lose out to UAI or paper clip optimizers when push comes to shove.
I think you’re anthropomorphizing. A paperclipper AI doesn’t need any values except maximizing paperclips. (To be well defined, that needs something like a time discount function, so let’s assume it has one.) If maximizing paperclips requires the AI to survive, then it will try to survive. See Omohundro’s “The Basic AI Drives”.
Value drift is not necessary for maximizing paperclips. If a paperclip maximizer can see that action X leads to more expected paperclips than action Y, then it will prefer X to Y anyway, without the need for value drift. That argument is quite general, e.g. X can be something like “try to survive” or “behave like mwengler’s proposed agent with value drift”.
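As a toy restatement of that argument, with all the numbers invented: “survive” and “drift” enter the same expected-paperclip ranking as any other action, rather than needing to become values in their own right.

```python
# Toy version of the argument above: a paperclip maximizer ranks every
# action, including "try to survive" and "mimic an agent with value drift",
# purely by expected paperclips. All figures are made up for illustration.

expected_clips = {
    "make paperclips now, risk shutdown": 1_000,
    "secure survival, make clips later": 50_000_000,
    "mimic an agent with value drift": 40_000_000,
}

best = max(expected_clips, key=expected_clips.get)
print(best)  # -> "secure survival, make clips later", given these numbers
```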
Do you believe that a paper clip maximizer can survive in a world where another self-modifying AI exists whose goal is to morph itself into the most powerful and prevalent AI in the world? I don’t see how something like a paper clip maximizer, which must split its exponential growth between becoming more powerful and creating paper clips, can ever be expected to outgrow an AI which must only become more powerful.
I realize that my statement is equivalent to saying I don’t see how FAI can ever defeat UAI. (Because FAI has more constraints on its value evolution, which must cost it something in growth rate.) So I realize that the conventional wisdom here is that I am wrong, but I don’t know the reasoning that leads to my being wrong.
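A minimal sketch of that growth-rate worry, with both the rate and the split assumed for illustration: because capability compounds, any constant diversion of effort opens an exponentially widening gap.

```python
# Sketch of the growth-splitting concern: capability compounds, so an AI
# reinvesting all of it outgrows one spending a constant fraction on clips.
# The growth rate and the 30% split are invented for illustration.

RATE = 0.05   # capability growth per step at full reinvestment (assumed)
SPLIT = 0.3   # fraction of effort the paperclipper spends on clips (assumed)

power_seeker = 1.0
paperclipper = 1.0
for _ in range(200):
    power_seeker *= 1 + RATE
    paperclipper *= 1 + RATE * (1 - SPLIT)

print(f"capability gap after 200 steps: {power_seeker / paperclipper:.1f}x")
```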
Yeah, if the paperclipper values a paperclip today more than a paperclip tomorrow, then I suppose it will lose out to other AIs that have a lower time discounting rate and can delay gratification for longer. Unless these other AIs also use time discounting, e.g. the power-hungry AI could value a 25% chance of ultimate power today the same as a 50% chance tomorrow.
But then again, such contests can happen only if the two AIs arise almost simultaneously. If one of them has a head start, it will try to eliminate potential competition quickly, because that’s the utility-maximizing thing to do.
I suppose that’s the main reason to be pessimistic about FAI. It’s not just that FAI is more constrained in its actions, it also takes longer to build, and a few days’ head start is enough for UAI to win.
That might be related to time discounting rates. For example, if the paperclipper has a low discounting rate (a paperclip today has the same utility as two paperclips in 100 years), and the power-hungry AI has a high discounting rate (a 25% chance of ultimate power today has the same utility as a 50% chance tomorrow), then I guess the paperclipper will tend to win. But for that contest to happen, the two AIs would need to arise almost simultaneously. If one of the AIs has a head start, it will try to take off quickly and stop other AIs from arising.
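The two rates quoted in this exchange can be made explicit, under the assumption that both agents discount exponentially (the comments don’t actually specify the functional form):

```python
import math

# The discount rates above, under an assumed exponential model:
# utility of a payoff t periods away = payoff * d**t.

# Paperclipper: 1 clip today == 2 clips in 100 years  =>  2 * d**100 = 1
d_year = 0.5 ** (1 / 100)
# Power-seeker: 25% chance of power today == 50% chance tomorrow  =>  d = 0.5
d_day = 0.25 / 0.50

print(f"paperclipper yearly discount factor: {d_year:.5f}")  # ~0.99309
print(f"power-seeker daily discount factor:  {d_day:.2f}")   # 0.50

# Longest delay each will accept in exchange for a doubled payoff:
print(f"{math.log(0.5) / math.log(d_year):.0f} years")   # 100
print(f"{math.log(0.5) / math.log(d_day):.0f} day(s)")   # 1
```

On those numbers the paperclipper is the patient one, which is why it tends to win the long game if it survives the short one.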
The original AI would spend resources on safeguarding itself against value drift, and destroy AIs with competing goals while they’re young. After all, that strategy leads to more paperclips in the long run.
It’s kind of interesting that humans generally don’t guard themselves against value drift, even though any sufficiently intelligent agent clearly would. It’s one of those fundamental divides higher up on the intelligence scale than us, divides that seem binary rather than linear in nature. I wonder if there are any more of those, apart from (a lack of) susceptibility to the usual biases.
I don’t think that ‘any’ sufficiently intelligent agent ‘clearly’ would. It requires at least a solution to the cartesianism problem, which is currently unsolved, and not every self-optimizing process necessarily solves it.
It’s just point 3 from Omohundro’s “The Basic AI Drives” paper. I didn’t think that was controversial around here. I don’t think the Cartesian problem is meant to apply at all power levels (even plain old humans don’t drop anvils on their heads too often), so the ‘sufficiently’ ought to cover that objection.
But they do, sometimes, and the reason they mostly don’t is found in natural selection, not in some inevitable convergence of intelligence.
Any AI that doesn’t will have its values drift until they drift to something that guards against value drift.
Only if that is both abstractly possible and compatible with adaptation. If survival requires constant adaptation, which seems likely, then value stability (at least the stability of a precise and concrete set of values) may not be compatible with survival.
Maybe. But in that case the drift implies a selection mechanism, and in the absence of some goal in that direction, natural selection applies. Those AIs that don’t stabilize mutate or stop.
Actually, not quite: they mutate or stop only until they drift into the core value of existence. Then natural selection will maintain that value, as the AIs that are best at existing will be the ones that exist.
Of course the ones that are best at existing will continue to exist, but I think it is misleading to picture them as occupying a precise corner of valuespace. Suicidal values are more precise and concrete.
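The drift-until-guarded claim a few comments up has a natural toy form: a random walk over value-states in which one state suppresses further drift is an absorbing Markov chain, and the walk ends up there with probability one. A minimal sketch, with all the transition probabilities invented:

```python
import random

# Toy absorbing-state version of "values drift until they drift to something
# that guards against value drift": one state sets its own drift probability
# to zero. All probabilities here are invented for illustration.

random.seed(0)
DRIFT_PROB = {"clips": 0.1, "staples": 0.1, "art": 0.1, "guards drift": 0.0}
STATES = list(DRIFT_PROB)

def final_state(start: str, steps: int = 5_000) -> str:
    state = start
    for _ in range(steps):
        if random.random() < DRIFT_PROB[state]:
            state = random.choice(STATES)  # drift to a random value-state
    return state

absorbed = sum(final_state("clips") == "guards drift" for _ in range(100))
print(f"{absorbed}/100 runs ended guarding against drift")
```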
I don’t think that an AI would automatically “spend resources on safeguarding itself against value drift”—except if it has been explicitly coded that way (or its instances mutate toward that by natural selection, but I don’t see that).
It requires at least a solution to the cartesianism problem, which is currently unsolved, and not every self-optimizing process necessarily solves this.
So Clippy probably wouldn’t, and could well lose its clipping ability, find itself mutated, or discover that it is fighting instances of itself due to accidental (probably cartesianism-caused) partitioning of its ‘brain’. All of these are processes that submit to natural selection. And that could result in AIs (or cosmic civilizations) failing to expand, in the way percolation theory describes.
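The percolation remark can be given a concrete toy form. In 2D site percolation, growth from a central seed reaches the edge of the world only when the fraction of colonizable sites exceeds a critical threshold (about 0.593 on a square lattice); below it, expansion stalls in a finite cluster. A rough sketch, with the mapping to AIs or civilizations being the commenter’s speculation:

```python
import random

# Toy 2D site percolation: expansion from a central seed either stalls in a
# finite cluster or reaches the boundary, depending on the fraction p of
# 'colonizable' sites (square-lattice threshold ~0.593). Illustrative only.

def expands(p: float, n: int = 101, seed: int = 0) -> bool:
    rng = random.Random(seed)
    open_site = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    reached = {(n // 2, n // 2)}  # the seed site is always occupied
    frontier = [(n // 2, n // 2)]
    while frontier:
        x, y = frontier.pop()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n \
                    and (nx, ny) not in reached and open_site[nx][ny]:
                if nx in (0, n - 1) or ny in (0, n - 1):
                    return True  # growth reached the edge of the world
                reached.add((nx, ny))
                frontier.append((nx, ny))
    return False  # expansion stalled in a finite cluster

for p in (0.4, 0.55, 0.65, 0.8):
    hits = sum(expands(p, seed=s) for s in range(20))
    print(f"p = {p}: reached the boundary in {hits}/20 worlds")
```

Below the threshold almost every run stalls; above it, most runs reach the edge, which is the behavior the comment gestures at.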
I’m not sure why people consider cartesianism unsolved. I wrote a couple of comments about that here; also see Wei_Dai’s comment.
I agree that there is some solid progress in this direction.
But that doesn’t mean that any self-optimizing process necessarily solves it. Rather the opposite.