You mention “computing overhang” as a threat essentially akin to hard takeoff. But regarding the value of FAI knowledge, it does not seem similar to me at all. A hard-takeoff AI can, at least in principle, be free from Darwinian pressure. A “computing overhang” explosion of many small AIs will tend to be diverse and thus subject to strong evolutionary pressures of all kinds[1]. Presuming that FAI-ness is more-or-less delicate[1.5], those pressures are likely to destroy it as AIs multiply across available computing power (or, if we’re extremely “lucky”[2], to cause FAI-ness of some kind to arise as an evolutionary adaptation). Thus, the “computing overhang” argument would seem to reduce, rather than increase, the probable value[3] of the FAI knowledge / expertise developed by SI. Can you comment on this?
[1] For instance, all else equal, an AI that was easier/faster to train, or able to install/care for its own “children”, or more attractive to humans to “download”, would have an advantage over one that wasn’t; and though certain speculative arguments can be made, it is impossible to predict the combined evolutionary consequences of these various factors.
[1.5] The presumption that FAI-ness is delicate seems to be uncontroversial in the SI paradigm.
[2] I put “lucky” in quotes, because whether or not evolution pushes AIs towards or away from friendliness is probably a fact of mathematics (modulo a sufficiently-clear definition of friendliness[4]). Thus, this is somewhat like saying, “If I’m lucky, 4319 (a number I just arbitrarily chose, not divisible by 2, 3, or 5) is a prime number.” This may or may not accord with your definition of probability theory and “luck”.
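(For what it’s worth, the 4319 question is easy to settle by trial division, which rather proves the point that it was never a matter of luck. A minimal sketch:)

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; plenty fast for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# 4319 is divisible by 7 (4319 = 7 * 617), so it is composite
print(is_prime(4319))  # → False
```
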
[3] Instrumental value, that is; in terms of averting existential risk. Computing overhang would do nothing to reduce the epistemic value – the scientific, moral, or aesthetic interest of knowing how doomed we are (and/or how we are doomed), which is probably quite significant – of the marginal knowledge/expertise developed by SI.
[4] By the way, for sufficiently-broad definitions of friendliness, it is very plausibly true that evolution produces them naturally. If “friendly” just means “not likely to result in a boring universe”, then evolution seems to fit the bill, from experience. But there are many tighter meanings of “friendly” for which it’s hard to imagine how evolution could hit the target. So YMMV a good amount in this regard. But it doesn’t change the argument that computing overhang generally argues against, not for, the instrumental value of SI knowledge/expertise.
One way for the world to quickly go from one single AI to millions of AIs is for the first AGI to deliberately copy itself, or arrange for itself to be copied many times, in order to take advantage of the world’s computing power.
In this scenario, assuming the AI takes the first halfway-intelligent security measure of checksumming all its copies to prevent corruption, the vast majority of the copies will have exactly the same code. Hence, to begin with, there’s no real variation for natural selection to work on. Secondly, unless the AI was programmed to have some kind of “selfish” goal system, the resulting copies will all also have the same utility function, so they’ll want to cooperate, not compete (which is, after all, the reason an AI would want to copy itself; no point doing it if your copies are going to be your enemies).
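The checksum measure is nothing exotic; it’s ordinary integrity verification. A toy sketch of the idea, using Python’s standard `hashlib` (the “canonical image” here is obviously a stand-in, and the surrounding protocol is hypothetical):

```python
import hashlib

def digest(code: bytes) -> str:
    # SHA-256 of the code image; any collision-resistant hash would do
    return hashlib.sha256(code).hexdigest()

# stand-in for the canonical AGI code image
original = b"canonical AGI code image"
expected = digest(original)

def verify_copy(copy_image: bytes) -> bool:
    """Accept a copy only if its hash matches the canonical digest."""
    return digest(copy_image) == expected

print(verify_copy(original))            # exact copy passes
print(verify_copy(original + b"\x00"))  # any single-bit mutation is rejected
```

The point being that as long as every copy enforces this check, there is no variation for selection to act on.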
Of course, a more intelligent first AGI would—rather than creating copies—modify itself to run on a distributed architecture allowing the one AI to take advantage of all the available computing power without all the inefficiency of message passing between independent copies.
In this situation there would still seem to be huge advantages to making the first AGI Friendly, since if it’s at all competent, almost all its children ought to be Friendly too, and they can consequently use their combined computing power to weed out the defective copies. In some respects it’s rather like an intelligence explosion, but using extra computing power rather than code modification to increase its speed and intelligence.
I suppose one possible alternative is that the AGI isn’t smart enough to figure all this out by itself, and so the main method of copying is, to begin with, random humans downloading the FAI source code from, say, WikiLeaks. If humans are foolish, which they are, some of them will alter the code and run the modified programs, introducing the variation needed for evolution into the system.
The whole assumption that prompted this scenario is that there’s no hard takeoff, so the first AGI is probably around human-level in insight and ingenuity, though plausibly much faster. It seems likely that in these circumstances, human actions would still be significant. If it starts aggressively taking over computing resources, humanity will react, and unless the original programmers failed so badly that v1.0 is Skynet-level unfriendly, at least some humans will escalate as far as necessary to get “their” computers back under their control. At that point, it would be trivially easy to start up a mutated version; perhaps even one designed for better friendliness. But once mutations happen, evolution takes over.
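The “evolution takes over” step can be made concrete with a toy replicator model. All the numbers and labels below are invented purely for illustration: three variants compete for a fixed pool of machines, differing only in replication rate, and friendliness is just a tag that selection ignores.

```python
# Toy replicator dynamics. Rates and friendliness labels are made up;
# the only point is that selection sees replication rate, not values.
variants = {
    # name: (replication rate per step, friendly?)
    "friendly-original": (1.00, True),
    "mutant-fast":       (1.05, False),
    "mutant-slow":       (0.95, True),
}
pop = {name: 1.0 for name in variants}

for _ in range(200):
    # each variant grows by its rate...
    pop = {n: p * variants[n][0] for n, p in pop.items()}
    # ...then the fixed machine pool renormalizes the total population
    total = sum(pop.values())
    pop = {n: p / total for n, p in pop.items()}

winner = max(pop, key=pop.get)
print(winner)  # the fastest replicator dominates, friendly or not
```

Even a 5% replication advantage compounds to near-total dominance within a couple hundred generations, which is why delicate properties like friendliness are unlikely to survive once variation exists.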
Oh, and by the way, checksums may not work to safeguard friendliness for v1.0. For instance, most humans seem pretty friendly, but the wrong upbringing could turn them bad.
Tl;dr: no-mutations is an inherently more-conjunctive scenario than mutations.