Avoid creating a motive for modern-day humans to fight over the initial dynamic.
One of the occasional questions I get asked is “What if al-Qaeda programmers write an AI?” I am not quite sure how this constitutes an objection to the Singularity Institute’s work, but the answer is that the solar system would be tiled with tiny copies of the Qur’an. Needless to say, this is not much more worrisome than the solar system being tiled with tiny copies of smiley faces or reward buttons. I’ll worry about terrorists writing AIs when I am through worrying about brilliant young well-intentioned university AI researchers with millions of dollars in venture capital. The outcome is exactly the same, and the academic and corporate researchers are far more likely to do it first. This is a critical point to keep in mind; otherwise the terrorist scenario becomes an excuse to go back to screaming about politics, which feels so much more satisfying. When you scream about politics you are really making progress, according to an evolved psychology that thinks you are in a hunter-gatherer tribe of two hundred people. To save the human species you must first ignore a hundred tempting distractions.
I think the objection is that, in theory, someone can disagree about what a superintelligence ought to do. Like Dennis, who thinks he ought to own the world outright. But do you, as a third party, want me to pay attention to Dennis? You can’t advise me to hand the world to you, personally; I’ll delete your name from any advice you give me before I look at it. So if you’re not allowed to mention your own name, what general policy do you want me to follow?
Let’s suppose that the al-Qaeda programmers are brilliant enough to have a realistic chance of not only creating humanity’s first Artificial Intelligence but also solving the technical side of the FAI problem. Humanity is not automatically screwed. We’re postulating some extraordinary terrorists. They didn’t fall off the first cliff they encountered on the technical side of Friendly AI. They are cautious enough and scared enough to double-check themselves. They are rational enough to avoid tempting fallacies, and to extricate themselves from the mistakes of the existing literature. The al-Qaeda programmers will not set down Four Great Moral Principles, not if they have enough intelligence to solve the technical problems of Friendly AI. The terrorists have studied evolutionary psychology and Bayesian decision theory and many other sciences. If we postulate such extraordinary terrorists, perhaps we can go one step further, and postulate terrorists with moral caution, and a sense of historical perspective? We will assume that the terrorists still have all the standard al-Qaeda morals: they would reduce Israel and the United States to ash; they would subordinate women to men. Still, is humankind screwed?
Let us suppose that the al-Qaeda programmers possess a deep personal fear of screwing up humankind’s bright future, in which Islam conquers the United States and then spreads across stars and galaxies. The terrorists know they are not wise. They do not know that they are evil, remorseless, stupid terrorists, the incarnation of All That Is Bad; people like that live in the United States. They are nice people, by their lights. They have enough caution not to simply fall off the first cliff in Friendly AI. They don’t want to screw up the future of Islam, or hear future Muslim scholars scream in horror on contemplating their AI. So they try to set down precautions and safeguards, to keep themselves from screwing up.
One day, one of the terrorist programmers says: “Here’s an interesting thought experiment. Suppose there were an atheistic American Jew, writing a superintelligence; what advice would we give him, to make sure that even one so steeped in wickedness does not ruin the future of Islam? Let us follow that advice ourselves, for we too are sinners.” And another terrorist on the project team says: “Tell him to study the holy Qur’an, and diligently implement what is found there.” And another says: “It was specified that he was an atheistic American Jew; he’d never take that advice. The point of the Coherent Extrapolated Volition thought experiment is to search for general heuristics strong enough to leap out of really fundamental errors, the errors we’re making ourselves but don’t know about. What if he should interpret the Qur’an wrongly?” And another says: “If we find any truly general advice, the argument to persuade the atheistic American Jew to accept it would be to point out that it is the same advice he would want us to follow.” And another says: “But he is a member of the Great Satan; he would only write an AI that would crush Islam.” And another says: “We necessarily postulate an atheistic Jew of exceptional caution and rationality, as otherwise his AI would tile the solar system with American music videos. I know no one like that would be an atheistic Jew, but try to follow the thought experiment.”

I ask myself what advice I would give to terrorists, if they were programming a superintelligence and honestly wanted not to screw it up, and then that is the advice I follow myself.
The terrorists, I think, would advise me not to trust the self of this passing moment, but to try to extrapolate an Eliezer who knew more, thought faster, was more the person I wished I were, had grown up farther together with humanity. Such an Eliezer might be able to leap out of his fundamental errors. And the terrorists, still fearing that I bore too deeply the stamp of my mistakes, would advise me to include all the world in my extrapolation, being unable to advise me to include only Islam.
But perhaps the terrorists are still worried; after all, only a quarter of the world is Islamic. So they would advise me to extrapolate out to medium-distance, even against the force of muddled short-distance opposition, far enough to reach (they think) the coherence of all seeing the light of Islam. What about extrapolating out to long-distance volitions? I think the terrorists and I would look at each other, and shrug helplessly, and leave it up to our medium-distance volitions to decide. I can see turning the world over to an incomprehensible volition, but I would want there to be a comprehensible reason. Otherwise it is hard for me to remember why I care.
Suppose we filter out all the AI projects run by Dennises who just want to take over the world, and all the AI projects without the moral caution to fear themselves flawed, leaving only those AI projects that would prefer not to create a motive for present-day humans to fight over the initial conditions of the AI. Do these remaining AI projects have anything to fight over? This is an interesting question, and I honestly don’t know. In the real world there are currently only a handful of AI projects that might so much as dabble in the problem. To the best of my knowledge, there isn’t more than one project that rises to the challenge of moral caution, let alone the challenge of FAI theory, so I don’t know whether two such projects would find themselves unable to agree. I think we would probably agree that we didn’t know whether we had anything to fight over, and as long as we didn’t know, we could agree not to care. A determined altruist can always find a way to cooperate on the Prisoner’s Dilemma.
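To make that last claim slightly less hand-wavy, here is a minimal Python sketch, under the assumption that the two players are transparent enough to verify each other’s decision procedures (a “program equilibrium” in Tennenholtz’s sense; the strategy is the one sometimes nicknamed CliqueBot). Everything in it — the function names, the payoffs, the exact-source-match test — is illustrative, not drawn from the essay.

```python
# Sketch: cooperation on the one-shot Prisoner's Dilemma between mutually
# transparent agents. Each player sees the other's source code before moving,
# and cooperates exactly when the other is verifiably running the same
# decision procedure. Payoffs and names are illustrative assumptions.

import inspect

# (my payoff, their payoff) indexed by (my move, their move).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # I am exploited
    ("D", "C"): (5, 0),  # I exploit
    ("D", "D"): (1, 1),  # mutual defection
}

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent runs this very decision procedure."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def play(agent_a, agent_b):
    """Run one transparent game: each agent is shown the other's source."""
    src_a = inspect.getsource(agent_a)
    src_b = inspect.getsource(agent_b)
    move_a = agent_a(src_b)
    move_b = agent_b(src_a)
    return PAYOFFS[(move_a, move_b)]

if __name__ == "__main__":
    # Two projects running the same verifiable policy reach (3, 3), and
    # neither gains by secretly swapping in a defector, since the other
    # would then see different source and defect.
    print(play(clique_bot, clique_bot))
```

Exact source matching is brittle as stated, but it is enough to show that mutual defection is not forced once the players can verify each other’s commitments; two cautious AI projects auditing one another are closer to this transparent case than to the classical opaque one.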
On merging utility functions, here’s the relevant quote from Coherent Extrapolated Volition, by Eliezer Yudkowsky: