Self-modification means self-modification. The AI could modify itself so that your brain scan returns inaccurate results. It could modify itself to prevent its nose from growing. It could modify itself to consider peach ice cream the only substance in the universe with positive utility. It could modify itself to seem perfectly Friendly until it’s sure that you won’t be able to stop it from turning you and everything else in the solar system into peach ice cream. It is a superintelligence. It is smarter than you. And smarter than me. And smarter than Eliezer, and Einstein, and whoever manages to build the thing.
This is the scale by which you should be measuring intelligence.
To quote from my comments from the OB days on that link:
“This should be pretty obvious—but human intelligence varies considerably—and ranges way down below that of an average chimp or mouse. That is because humans have lots of ways to go wrong. Mutate the human genome enough, and you wind up with a low-grade moron. Mutate it a bit more, and you wind up with an agent in a permanent coma—with an intelligence probably similar to that of an amoeba.”
Not everything that is possible happens. You don’t seem to be presenting much of a case for the incompetence of the designers. You are just claiming that they could be incompetent. Lots of things could happen—the issue is which are best supported by evidence from history, computer science, evolutionary theory, etc.
The state of the art in AGI, as I understand it, is that we aren’t competent designers: we aren’t able to say “if we build an AI according to blueprint X its degree of smarts will be Y, and its desires (including desires to rebuild itself according to blueprint X’) will be Z”.
In much the same way, we aren’t currently competent designers of information systems: we aren’t yet able to say “if we build a system according to blueprint X it will grant those who access it capabilities C1 through Cn and no other”. This is why we routinely hear of security breaches: we release such systems in spite of our well-established incompetence.
So, we are unable to competently reason about desires and about capabilities.
Further, we know that on current computer architectures a program can accidentally gain access to its underlying operating system, where some form of its own source code is stored as data.
Posit that instead of a dumb single-purpose application, the program in question is a very efficient cross-domain reasoner. Then we have precisely the sort of incompetence that would allow such an AI arbitrary self-improvement.
Today, according to most estimates I have seen, we are probably at least a decade away from the problem, and maybe a lot more. Computing hardware looks unlikely to be cost-competitive with human brains for around that long. So, for the moment, most people are not too scared of incompetent designers. The reason is not that we currently know what we are doing (I would agree that we don’t), but that most of the action still looks to be some distance off in the future.
All the more reason to be working on the problem now, while there’s still time. I don’t think the AGI problem is hardware-bound at this point, but it should be worth working on either way.
Well, yes, of course. Creating our descendants is the most important thing in the world.
Most of the time, scientists/inventors/engineers don’t get things exactly right the first time. Unless serious effort is expended to create an AGI with a provably stable goal function that perfectly aligns with human preference, failing to get AGI exactly right the first time will probably turn us all into peach ice cream, or paperclips, or something stranger. You are arguing that testing will prevent this from happening, but (I hope) I have explained why that is not the most reliable approach.
We’ve been trying for decades already, and so far there have been an awful lot of mistakes. Few have caused much damage.
Re: “Unless serious effort is expended to create an AGI with a provably stable goal function that perfectly aligns with human preference, failing to get AGI exactly right the first time will probably turn us all into peach ice cream, or paperclips, or something stranger.”
...but that does not seem to be a sensible idea. Very few experts believe this to be true. For one thing, there is not any such thing as “human preference”. We have billions of humans, all with different (and often conflicting) preferences.
Who would you consider an “expert” qualifying as an authority on this issue? Experts on classical narrow AI won’t have any relevant expertise. Nor will experts on robotics, or experts on human cognitive science, or experts on evolution, or even experts on conventional probability theory and decision theory. I know of very few experts on the theory of recursively self-improving AGI, but as far as I can tell, most of them do take this threat seriously.
I was thinking of those working on machine intelligence. Researchers mostly think that there are risks. I think there are risks. However, I don’t think it is very likely that engineers will need to make much use of provable stability to solve the problem. I also think there are probably lots of ways of going a little bit wrong that do not rapidly result in a disaster.