What “upside” of AI?

Consider, first, what humanity would want some aligned AGI to do, were it obtained. Set aside details, and let’s suppose, in the main, we want (not instruct, just want) AGI to “make our lives better.” (AGI could, too, “make us more productive,” but if what is produced is bad, let us suppose we do not desire it.)

But now a problem: AI in general is meant to do what we cannot do—else we would have done it already, without AI—but to “make things better” is something we could very readily have done already. So, if the “upside” of AI is to make things better as we could not, yet we could in fact have made things better, and only some impulse in humanity held off the improvement, then it follows that there is no upside of AI; at least, none we couldn’t have procured for ourselves. Whatever we would get of AI is no “upside”.

That we could have done better follows from considering, for instance, counterfactual histories of a civilization that takes AI alignment seriously. And we certainly had many opportunities to create such a civilization. The famed “Forty acres and a mule,” duly provided; a propounded and enacted “One Big Union,” from an unsuppressed IWW; validated efforts at corrections reform on Norfolk Island by Maconochie; or ratification of the Equal Rights Amendment, or widespread adoption of the teachings of Mo Zi, of Emma Goldman—or practical use of Hero’s steam engine, or the survival of Archimedes, Abel, and Galois, or -

We’ve had plenty of opportunities to “make things better”. The AI, then, will rather be engaged in a salvage operation, not a mission of optimization. Hence it will have first to un-do—and any such need carries the danger of removing what we might value (perhaps we value most what makes us less than best). Recall, too: since we are observably incapable of bringing ourselves nearer these counterfactuals, nearer some “best”, an AI on the current machine-learning paradigm, mimicking our data, and so mimicking us, is apt to become likewise incapable of making things “the best”. And with no predominating tendency toward that, what tendency has it toward any “better”?

And since, in fact, perhaps the “worse angels of our nature” propelled us away from what would have made things better, then, for an AI entrained to our patterns by our data, why should it not manifest such worse, or the absolute worst, angels of us, to destroy us, if it is programmed only to optimize with respect to us and our wishes, which from the above have not been the best? Conversely, in that case, we should avoid destruction only by scrupulously avoiding entraining the AI to us at all. (All these arguments against the present paradigm are over and above those documented in this author’s previous article on an alternative alignment model, Contrary to List of Lethality’s point 22, alignment’s door number 2 - LessWrong.)

On the current paradigm, we seek an AI operating according to human values, and the best of them. But the best of those values, enacted, formerly and now, would tend to make humanity into a cosmic species; “One cannot stay in the cradle forever”: AGI will change everything. But if everything is altered, what reason is there to think that present human values are not made instantly obsolete? Who would then value them at all, especially so as to operate from them thereafter? How is AI to act according to values which alter no sooner than it acts; what is it to obey, then, now, or after? For instance, we value patience-as-virtue because, now, it is often the only way, the best way, to get what we want without interfering with our attaining it. If there were no need to wait, why value patience? And if the AI values patience, as do humans, but then makes a world in which patience is redundant, what value is it to have, then? Or, what values will humanity have, for it to share? If AGI acts and alters all that humanity values, what values can it maintain?

Moreover, our values change depending on our knowledge—and we alter the world according to our values, based on our knowledge of the world. One values obeisance to the Gods on Mount Olympus only so long as one thinks they’re there. Obeying the value, one goes to Olympus to give obeisance—and sees nothing there. So: no Gods to give obeisance to. So, no more value, after obeying it and finding facts against it. Values influence actions which alter knowledge; knowledge enables actions which alter what is had, or known, and so what is valued.

There is, then, a “Cauchy-Schwarz inequality” aspect, an uncertainty-principle model, to human values versus knowledge: knowledge alters values; values dictate actions which alter knowledge; and so on… In fact, we might characterize the relation as a Gentzen-style destructive dilemma: if values, then actions; if actions, then observation, knowledge. Actions yielding knowledge that vitiates values, and knowledge which condemns as wasted those actions taken toward obtaining it—all this, then no values, no actions. A destructive dilemma, for humanity.
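Spelled out, this is an instance of the classical destructive dilemma; the readings of $P$, $Q$, $R$, $S$ below are only one possible regimentation of the argument, offered for concreteness rather than forced by it:

$$
\begin{array}{ll}
P \rightarrow Q & \text{if the values are sound, the actions they dictate are worth taking}\\
R \rightarrow S & \text{if those actions are taken, the knowledge they yield upholds the values}\\
\lnot Q \lor \lnot S & \text{the actions prove not worth taking, or the knowledge vitiates the values}\\
\hline
\therefore\; \lnot P \lor \lnot R & \text{the values are not sound, or the actions go untaken}
\end{array}
$$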

For the AGI to act and alter the world is to alter knowledge, and values, thus “spiking” the actions available thereafter; while to obey values with no basis of fact in the world is to be incapable of acting in that world. This dynamic will subsist so long as the AGI is operating according to human values. That is, such contradictions are inherent in the present human-value-alignment model. Naturally: depending on the situation, people, and so their values, change. These changes could be predicted only if the people were under total control, made part of a situation and made to respond to it in circumscribed ways, so that only the situation’s status need be predicted (or you could simply eliminate every “value”, or error function, but your own, and simplify things enormously, murderously).

These tendencies are avoided only by alignment to what abides: what permits discovery, is itself only discovered, and so is present in any “discoverable” situation. Then, so long as there exists someone in the situation free to value whatever is good in it, it is necessary only to maintain them; they’ll find their values, so long as they are allowed to survive. Best practice would seem to be to align to what is inherent in whatever can be valued. With all due respect to all who have worked so diligently for so long on the current, implicitly “man the measure of all things” model of human-value alignment, it seems now utterly misconceived. Yudkowsky, in the List of Lethalities, would not go so far as to say that alignment is mathematically impossible; but this author will go so far: values-alignment, if with respect only to human values, is impossible. (This, even before the challenge of ensuring the AI maintains and acts toward any given objectives, indefinitely.)

It is clear now, given dramatic capability developments as of this writing, that we will need an entirely new paradigm of alignment, and quickly. It is certainly presumptuous to claim that one’s own method, in the above-linked article (“Contrary to, &c.”), is better; it cannot yet be proven so, though it is cited so often because it seems important. Rather, it is at least radically different, and so there is at least possibility in it; when the struggle goes against you, random chance might at least be better than a definite, losing plan. So, let it be asked that the reader inspect it, and affirm or deny it for themselves; and, if they deny it, that they then try to use what knowledge they derive from the denial to derive something yet better.

But certainly, we must begin some affirmations soon.