With a recursively self-improving AI, once you create something able to run, running a test can turn into deployment even without the programmer’s intention.
I’ve asked this elsewhere to no avail, but I’m still curious—does it follow from this that developing some reliable theoretical understanding about the properties of algorithms capable and incapable of self-improvement is a useful step towards safe AI research?
I mean, it’s clear that useful intelligence that is incapable of recursive self-improvement is possible… I’m an existence proof, for example.
If we can quantify the properties of such intelligences, and construct a tool that can inspect source code prior to executing it to ensure that it lacks those properties, then it seems to follow that we can safely construct human-level AIs of various sorts. (Supposing, of course, that we’re capable of building human-level AIs at all… an assumption that appears to be adopted by convention in this context.)
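To make the idea of an inspection tool concrete, here is a toy sketch of what the shallowest possible version might look like, assuming (my assumption, not anything established above) that the target is Python source and that "capable of self-modification" can be approximated by a syntactic blacklist. The function name and the list of flagged constructs are my own hypothetical choices; a real tool would need a semantic theory of the program, not pattern-matching.

```python
import ast

# Hypothetical blacklist: constructs that *might* let a program rewrite or
# extend its own code. A toy illustration only; this is trivially evadable.
SUSPECT_CALLS = {"exec", "eval", "compile", "__import__"}

def flags_self_modification(source: str) -> list:
    """Return reasons the source might be able to modify itself."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls to dynamic-execution primitives.
            if node.func.id in SUSPECT_CALLS:
                findings.append(
                    f"dynamic execution via {node.func.id}() at line {node.lineno}")
            # Opening a file for writing could let the program edit its own code.
            elif node.func.id == "open":
                for arg in node.args[1:]:
                    if isinstance(arg, ast.Constant) and "w" in str(arg.value):
                        findings.append(f"file opened for writing at line {node.lineno}")
    return findings

print(flags_self_modification("exec(input())"))  # flagged
print(flags_self_modification("x = 1 + 1"))      # no findings
```

The point of the sketch is only that "inspect before executing" is a mechanizable step; the hard part, as the rest of the thread argues, is knowing which properties to inspect for.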
it’s clear that useful intelligence that is incapable of recursive self-improvement is possible… I’m an existence proof, for example.
I wouldn’t be so sure about that. Imagine that you are given unlimited time and perfect health, and you can use as much data storage (paper, computers, etc.) as you need. Do you think your self-improvement would stop at some point?
The problem with humans is that they have limited time, much of which is wasted on gathering resources to survive, or on climbing the social ladder… and then they die, and the next generation starts almost from zero. At least we have culture, education, books and other tools which allow the next generation to use part of the achievements of previous generations—unfortunately, that learning also takes a lot of time. We are so limited by our hardware.
Imagine a child growing up. Imagine studying at elementary school, high school, university. Is this an improvement? Yes. Why does it stop? Because we run out of time and resources, our own health and abilities also being a limited resource. However, as a species, humans are self-improving. We are just not fast enough to FOOM as individuals (yet).
Supposing all this is true, it nevertheless suggests a path for defining a safe route for research.
As you say, we are limited by our hardware, by our available resources, by the various rate-limiting steps in our self-improvement. There’s nothing magical about these limits; they are subject to study and to analysis. Sufficiently competent analysis could quantify those qualitative limits, could support a claim like “to achieve X level of self-improvement given Y resources would take a mind like mine Z years”. The same kind of analysis could justify similar claims about other sorts of optimizing systems other than my mind.
If I have written the source code for an optimizing system, and such an analysis of the source code concludes that for it to exceed TheOtherDave-2012′s capabilities on some particular reference platform would take no less than 35 minutes, then it seems to follow that I can safely execute that source code on that reference platform for half an hour.
Edit: Or, well, I suppose “safely” is relative; my own existence represents some risk, as I’m probably smart enough to (for example) kill a random AI researcher given the element of surprise should I choose to do so. But the problem of constraining the inimical behavior of human-level intelligences is one we have to solve whether we work on AGI or not.
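The scheme described above amounts to a hard wall-clock kill switch set strictly below the analysis's estimated minimum time to danger. A minimal sketch, assuming the estimate exists and is trustworthy (which is exactly what the reply below disputes); the margin constant and numbers are hypothetical:

```python
import subprocess
import sys

def run_bounded(cmd, est_seconds_to_danger, margin=6/7):
    """Run cmd, but kill it well before the estimated minimum time to danger.

    The estimate is the hard part; this only enforces a budget derived from
    it. (Hypothetical margin; a real one would come from the error bars on
    the analysis, not a constant.)
    """
    budget = est_seconds_to_danger * margin
    try:
        return subprocess.run(cmd, timeout=budget, capture_output=True)
    except subprocess.TimeoutExpired:
        return None  # hit the wall-clock budget and was killed

# A well-behaved child finishes; a runaway one is stopped at the budget.
quick = run_bounded([sys.executable, "-c", "print('done')"], 35 * 60)
stuck = run_bounded([sys.executable, "-c", "while True: pass"], 2)
print(quick.stdout, stuck)
```

Note that the enforcement is trivial; everything rides on whether "no less than 35 minutes" is actually a reliable lower bound for this algorithm on this platform.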
Different humans can have the same lifespan, yet achieve different results. Think about Michelangelo or da Vinci… and then think about the billions of unknown humans who lived just as long (at least thousands of them had the same lifespan and the same resources and opportunities for education). Analogously, half an hour may not be enough for an average self-improving AI, but may be plenty of time for a “Michelangelo of recursive self-improvement”. And before the experiments we don’t even know what the distribution of such “Michelangelos” in the AI population is.
Speaking about a learning curve is just a metaphor that can’t be taken too literally. Learning happens in jumps, some smaller, some larger; only on average, in the long run, can it be approximated by a curve. If Moore’s law says that computers will be twice as fast in 18 months, it does not mean they get exactly 1.04 times faster every month. The improvement comes as a series of discrete steps. Learning to read gives humans the ability to learn faster, but we cannot divide the effect and say what the acceleration per letter of the alphabet is. And we also cannot look at the history of medical research and conclude from its speed that a cure for cancer will be discovered exactly on the 31st of December 2017, at 12:30 PM.
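The arithmetic behind the "1.04 times faster every month" figure, plus a quick illustration of the smooth-vs-jumpy point: two big discrete jumps can have exactly the same 18-month total as the smooth curve while looking nothing like it month to month. (The two-jump schedule is my own invented example, not a claim about real chip releases.)

```python
# Smooth version: doubling every 18 months implies a constant monthly factor.
monthly = 2 ** (1 / 18)
print(round(monthly, 4))  # 1.0393, the "exactly 1.04" in the text

# Jumpy version: the same 2x over 18 months, but as two ~41% jumps.
# Month-to-month it looks nothing like a steady 3.9%/month curve,
# yet the 18-month total is identical.
smooth = [monthly ** m for m in range(19)]
jumpy = [1.0] * 9 + [2 ** 0.5] * 9 + [2.0]
print(round(smooth[-1], 6), round(jumpy[-1], 6))  # both reach 2.0
```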
So if you think that to achieve some dangerous ability a system would have to run for 35 minutes, I would recommend running it for no longer than 1 minute, preferably less, and then carefully analyzing the results. The problem is, we can do this only as long as the AI is less intelligent than us. After that point we get to the “AI does something, and we are too stupid to understand what it means, but it seems to work somehow” stage.
Agreed that once a system gets “smarter than us” all bets are off, which suggests that the important threshold for safety considerations is “how long will it take for algorithm X running on reference platform Y to get smarter than me.”
Agreed that if my understanding of the engineering constraints of the system is so unreliable that all I can say is “Well, the average algorithm will take about half an hour to get there on an average platform, but who knows how fast outliers can go?” then I can’t safely run an algorithm for any length of time… I just don’t know enough yet.
We don’t understand the engineering constraints that affect semiconductor development well enough to set a confident limit on how quickly it can improve, so all we have is unreliable generalizations like Moore’s Law. We don’t understand the engineering constraints that affect learning in humans even that well. We understand the engineering constraints that affect the development of cancer cures even less well than that.
You’re absolutely right that, in that state of ignorance, we can’t say what’s safe and what’s not.
You seem to be assuming that that state of ignorance is something we can’t do anything about, an inherent limitation of the universe and the human condition… that the engineering constraints affecting the maximum rates of self-optimization of a particular algorithm on a particular platform are and will always be a mystery.
If that’s true, then sure, there’s never a safe threshold we can rely on.
I don’t really see why that would be true, though. It’s a hard problem, certainly, but it’s an engineering problem. If understanding the engineering constraints that govern rates of algorithm self-optimization is possible (that is, if it’s not some kind of ineluctable Mystery) and if that would let us predict reliably the maximum safe running time of a potentially self-optimizing algorithm, it seems like that would be a useful direction for further research.
You seem to be assuming that that state of ignorance is something we can’t do anything about
No, no, no. We probably can do something about it. I just assume that it will be more complicated than “make an estimate that complexity C will take time T, and then run a simulation for time S<T”; especially if we have no clue at all what the word ‘complexity’ means, despite pretending that it is a value we can somehow measure on a linear scale.
First step: we must somehow understand what “self-improvement” means and how to measure it. Even this idea can be confused, so we need to get a better understanding. Only then does it make sense to plan the second step. Or maybe I’m even confused about all this.
The only part I feel sure about is that we should first understand what self-improvement is; only then can we try to measure it, and only then can we attempt to use some self-improvement threshold as a safety mechanism in an AI simulator.
This is a bit different from other situations, where you can first measure something, and then there is enough time to collect data and develop some understanding. Here, by the time you have something to measure (a self-improving process already running), it is already an existential risk. If you have to make a map of a minefield, you don’t start by walking onto the field and stomping heavily, even if in other situations an analogous procedure would work very well.
First step: we must somehow understand what “self-improvement” means and how to measure it. Even this idea can be confused, so we need to get a better understanding.
Yes, absolutely agreed. That’s the place to start. I’m suggesting that doing this would be valuable, because if done properly it might ultimately lead to a point where our understanding is quantified enough that we can make reliable claims about how long we expect a given amount of self-improvement to take for a given algorithm given certain resources.
This is a bit different from other situations, where you can first measure something
Sure, situations where you can safely first measure something are very different from the situations we’re discussing.
If we are capable of building minds smarter than ourselves, that counts as self-improvement for the purposes of this discussion. If we are not, of course, we have nothing to worry about here.
Well, another possibility is that some of us are and others of us are not. (That sentiment gets expressed fairly often in the Sequences, for example.)
In which case we might still have something to worry about as a species, but nevertheless be able to safely construct human-level optimizers, given a reliable theoretical understanding of the properties of algorithms capable of self-improvement.
Conversely, such an understanding might demonstrate that all human-level minds are potentially self-improving in the sense we’re talking about (which I would not ordinarily label “self-improvement”, but leave that aside), in which case we’d know we can’t safely construct human-level optimizers without some other safety mechanism (e.g. Friendliness)… though we might at the same time know that we can safely construct chimpanzee-level optimizers, or dog-level optimizers, or whatever the threshold turns out to be.
Which would still put us in a position to be able to safely test some of our theories about the behavior of artificial optimizers, not to mention allow us to reap the practical short-term benefits of building such things. (Humans have certainly found wetware dog-level optimizers useful to have around over most of our history; I expect we’d find software ones useful as well.)
It isn’t Utopia, granted, but then few things are.