I am comparing the arguments made against implied friendliness with the arguments made for implied doom. Friendliness clearly isn’t something we should expect an artificial general intelligence to exhibit if it isn’t explicitly designed to be friendly. But why, then, should we expect other complex behaviors, such as recursive self-improvement, to emerge without being explicitly designed in?
I don’t doubt that it is possible to design an AGI that tends to undergo recursive self-improvement; I am just questioning whether most AGI designs would do so, i.e. whether it is a basic AI drive.
Which considerations do you believe distinguish an AI doing something because it is told to from it doing something when it is not told to?
I believe that very complex behaviors (e.g. taking over the world) are unlikely to result from unforeseen implications. If an AGI is explicitly designed (told) to do so, then it will do so. If it isn’t explicitly designed to undergo unbounded self-improvement, then that won’t happen as a side effect either.
That is not to say that there are no information-theoretically simple algorithms that exhibit extremely complex behaviors, but none of them work given limited resources.
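To illustrate the kind of algorithm I have in mind, here is a toy sketch of my own (not from anyone’s actual AGI proposal): brute-force program induction that enumerates every expression over a tiny grammar until one fits the observed data. The algorithm is only a few lines, i.e. information-theoretically simple, yet its search space grows exponentially with the size of the target expression, which is exactly why such schemes are useless once resources are limited.

```python
# Toy brute-force "induction" sketch: enumerate all arithmetic expressions
# over a tiny grammar and return the first one consistent with the examples.
# Simple to state, exponentially expensive to run -- my illustration only.

def expressions(depth):
    """Yield every expression string up to the given nesting depth."""
    if depth == 0:
        yield from ["x", "1", "2"]
        return
    yield from expressions(depth - 1)
    for op in ["+", "-", "*"]:
        for a in expressions(depth - 1):
            for b in expressions(depth - 1):
                yield f"({a}{op}{b})"

def induce(examples, max_depth=3):
    """Return the first expression that reproduces all (input, output) pairs."""
    for depth in range(max_depth + 1):
        for expr in expressions(depth):
            if all(eval(expr, {"x": x}) == y for x, y in examples):
                return expr
    return None

# Finds "(x*x)" almost instantly, but deeper targets blow up combinatorially.
print(induce([(1, 1), (2, 4), (3, 9)]))
```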
We should of course take the possibility seriously and take preemptive measures, just as the SIAI is doing. But I don’t see that the available evidence allows us to believe that the first simple AGIs will undergo unbounded and uncontrollable self-improvement. What led you to believe that?
Most AGI researchers don’t seem to share the opinion that the first AGI will take over the world on its own. Ben Goertzel calls it the “Scary Idea.” Shane Legg writes that “it’s more likely that it will be the team of humans who understand how it works that will scale it up to something significantly super human, rather than the machine itself.”