As an aside, I don’t believe in malicious superintelligences. They are even more unlikely than automatically Friendly ones. An unFriendly AI is dangerous as a side effect of its abilities and goals, not because it is, or is even likely to be, malicious.
I would assign a malicious superintelligence a higher probability than a maximum-entropy prior over the space of superintelligences would, because of the chance of something broken coming out of military research, though still a relatively low one. I am not certain whether I would rate it as more or less likely than an “automatically Friendly” one; that depends on what you mean by the term. I would rate an AI built with deliberate maliciousness in mind as more likely to end up malicious than an AI built without any thought to friendliness is to end up friendly, and there is perhaps a broader range of behaviors we might label “malicious” in any case.
By an “automatically Friendly AI” I simply meant one that was Friendly without explicit programming for friendliness. I think that would be more likely than a malicious AI because there are good, rational reasons to be “friendly” (benefits from trade and so on) in the absence of reasons not to be. I can see no rational reason to be malicious; humans who are malicious are usually so for reasons (sadism, revenge, and so on) that I can’t see someone programming into an AI.
Is that why humans have been so friendly to the non-human inhabitants of lands we want to develop? Humans are likely to have almost nothing to offer an advanced super-intelligence, just as an ant hill has almost nothing to offer me (except as an opportunity to destroy it and plant more grass).
There are good, rational reasons to be friendly in the short term.
The rational reason to be unfriendly in the long term is that sufficiently advanced optimizing processes are powerful, and outcomes that maximize the utility of one agent are not likely to also maximize the utility of other agents with different goals.
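To make that concrete: under the toy assumption that two agents have independently random utility functions over the same finite set of outcomes, the chance that an outcome which is optimal for one agent is also optimal for the other falls off roughly as one over the number of outcomes. A minimal Monte Carlo sketch (the setup is purely illustrative, not a model of any particular AI):

```python
import random

def shared_optimum_rate(n_outcomes: int, trials: int = 10_000) -> float:
    """Estimate how often two agents with independently random utility
    functions over the same outcomes happen to share a best outcome."""
    hits = 0
    for _ in range(trials):
        u_a = [random.random() for _ in range(n_outcomes)]  # agent A's utilities
        u_b = [random.random() for _ in range(n_outcomes)]  # agent B's utilities
        hits += u_a.index(max(u_a)) == u_b.index(max(u_b))
    return hits / trials

for n in (2, 10, 100, 1000):
    print(f"{n:>4} outcomes: same optimum in ~{shared_optimum_rate(n):.1%} of trials")
```

With two possible outcomes the agents agree about half the time; with a thousand, almost never. That is the sense in which the outcome a powerful optimizer steers toward is unlikely to be one that also maximizes anyone else’s utility.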
“Good, rational reasons to be ‘friendly’ (benefits from trade and so on)” is a very dangerous statement. A superintelligent AI doesn’t care about you one bit. In the unlikely situation where it needs something from you that it cannot take by force, it may offer to trade, but I would put high odds on it shooting you in the back and taking the goods the moment you let your guard down.
I think you’re using a non-standard definition of ‘malicious’.
He isn’t. Malice is the “desire to do harm to another”, which is distinct from callous indifference.
In that case the word arguably can’t be applied to people either, as Eliezer pointed out in this post. The only time people actively “desire to harm another” is when (they believe that) they are punishing the other according to what we would call TDT/UDT. Of course, the same motive applies to an AI, even one whose terminal goals are indifferent to humans.
Yes, it can. People really are malicious sometimes. That we are biased to attribute malice to enemies even when none is present does not rule out malice existing in people.
The claim that people only desire to harm others as punishment just isn’t true. And even if it were, the fact that a TDT or UDT agent might take a similar action does not mean that the person is not feeling malice.
Yes, though an AI would hardly need to apply that reasoning to humans. You don’t need to punish people when you can just consume them as resources.
Whether they are real or not, malicious things are common in fantasy. I find breaking the laws of thermodynamics or anti-reductionism to be much more immersion-busting.
I disagree.
Both friendly and explicitly malicious AIs need to understand what sentience is.
In addition, a malicious AI needs to know some means of torturing sentient beings.
In addition, a friendly AI needs to know how to identify and preserve existing sentient beings, and all human values (highly nontrivial).
That’d be a good argument that explicitly malicious AI is technically simpler than Friendly AI, but technical complexity isn’t the only constraint on the likelihood of AI of a particular type arising. I’d consider it extremely unlikely that any development team would choose to inculcate a generally malicious value system in their charges; the AI research community is, fortunately, not made up of Bond villains. It doesn’t even work as a mutually-assured-destruction ploy, since the threat isn’t widely recognized.
Situational malice seems more plausible (in military applications, for example), but I’d call that a special case of ordinary unFriendliness.
I could easily see military application + bug in the safeguards ⇒ malicious AI.
Not as likely as ordinary unfriendliness, I think, but certainly plausible.