Objection 1: it seems to me that any AGI that was set to maximize a “Friendly” utility function would be extraordinarily dangerous.
Yes, Friendliness is hard, and that means that even the most dedicated team might screw it up. The point is that not trying as hard as you can to build Friendly AI is even worse, because then you almost certainly get uFAI. At least by trying to build FAI, we’ve got some chance of winning.
So this objection really just punts to objection #2, about tool-AGI, as the last paragraph here seems to indicate.
For certain values of “extraordinarily dangerous”, that is an excellent rebuttal to the objection. However, as I am sure you are aware, there are many possible values of “extraordinarily dangerous”. If I may present a plausible argument:
Let us declare a mind-dead universe (one with no agents) as having utility zero. It seems intuitive that working to build FAI decreases the probability of human extinction. However, a true uFAI (like a paperclip-maximizer) is hardly our only problem. A worse problem would be semi-FAI, that is, an AI which does not wipe out all of humanity, but does produce a world state which is worse than a mind-dead universe. As the SI decreases the probability of uFAI, it increases the probability of semi-FAI.
Will_Newsome, myself, and probably several other users have mentioned such possibilities.
We’d need a pretty specific kind of “semi-FAI” to create an outcome worse than utility 0, so I’d prefer a term like eAI (“evil AI”) for an AI that produces a world state worse than utility 0.
So: Is eAI more probable given (1) the first AGIs are created by people explicitly aiming for Friendliness, or given (2) the first AGIs are not created by people explicitly aiming for Friendliness?
First, I prefer your terminology to my own. I had internally been calling such AIs sAIs (sadistic Artificial Intelligence). The etymology is chosen for a very specific reason. However, eAI is most satisfactory.
Second, I do apologize if I am being excessively naive. However, I must confess, I was rather convinced by Yudkowsky’s argumentation about such matters. I tentatively hold, and believe it is the SI’s position, that an uFAI is almost certain to produce human extinction. Again, I would like to call this utility 0.
Third, I do tentatively hold that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI).
I am well aware that it is neither your duty nor the duty of the SI to respond to every minor criticism. However, if you have a reason to believe that my third point is incorrect, I would very much like to be made aware of it.
((A possible counterargument to my position: any proper attempt to reduce the chance of human extinction does increase the probability of a world of negative-utility, generally speaking. If my argument too closely resembles negative utilitarianism, then I revoke my argument.))
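For concreteness, here is a minimal sketch (in Python) of the expected-utility comparison this argument turns on. Every probability and utility in it is invented for illustration; the only structure taken from the thread is the scale fixed above: extinction and a mind-dead universe at utility 0, eAI below 0, FAI above 0.

```python
# Illustrative expected-utility comparison between "attempt FAI" and
# "attempt AGI without aiming for Friendliness". All numbers below are
# made up; only the structure (U(extinction) = 0, U(eAI) < 0 < U(FAI))
# comes from the discussion above.

U_FAI = 1.0      # a Friendly outcome
U_EXTINCT = 0.0  # human extinction / mind-dead universe (the agreed zero point)
U_EAI = -1.0     # an "evil AI" world state worse than extinction

def expected_utility(p_fai, p_eai):
    """Expected utility of a strategy, given its P(FAI) and P(eAI);
    the remaining probability mass goes to extinction."""
    p_extinct = 1.0 - p_fai - p_eai
    return p_fai * U_FAI + p_eai * U_EAI + p_extinct * U_EXTINCT

# p(eAI | FAI attempt) > p(eAI | AGI attempt) can hold while the FAI
# attempt still wins on expectation, because it also raises P(FAI):
print(expected_utility(p_fai=0.10, p_eai=0.05))  # FAI attempt: ~0.05
print(expected_utility(p_fai=0.00, p_eai=0.01))  # AGI attempt: -0.01
```

So even granting the third point, whether an FAI attempt is net-negative depends on how much it raises p(FAI) relative to how much it raises p(eAI), and on how far below zero eAI outcomes sit; that trade-off is what the rest of the thread argues over.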
I tentatively hold, and believe it is the SI’s position, that an uFAI is almost certain to produce human extinction. Again, I would like to call this utility 0.
I hold with timtyler that a uFAI probably wouldn’t kill off all humanity. There’s little benefit to doing so and it potentially incurs a huge cost by going against the wishes of potential simulators, counterfactual FAIs (acausally) (not necessarily human-designed, just designed by an entity or entities that cared about persons in general), hidden AGIs (e.g. alien AGIs that have already swept by the solar system but are making it look as if they haven’t (note that this resolves the Fermi paradox)), et cetera. Such a scenario is still potentially a huge loss relative to FAI scenarios, but it implies that AGI isn’t a sure-thing existential catastrophe, and is perhaps less likely to lead to human extinction than certain other existential risks. If for whatever reason you think that humans are easily satisfied, then uFAI is theoretically just as good as FAI; but that really doesn’t seem plausible to me. There might also be certain harm-minimization moral theories that would be ambivalent between uFAI and FAI. But I think most moral theories would still place huge emphasis on FAI versus uFAI even if uFAI would actually be human-friendly in some local sense.
Given such considerations, I’m not sure whether uFAI or wannabe-FAI is more likely to lead to evil AI. Wannabe-FAI is more likely to have a stable goal system that is immune to certain self-modifications and game theoretic pressures that a less stable AI or a coalition of splintered AI successors would be relatively influenced by. E.g. a wannabe-FAI might disregard certain perceived influences (even influences from hypothetical FAIs that it was considering self-modifying into, or acausal influences generally) as “blackmail” or as otherwise morally requiring ignorance. This could lead to worse outcomes than a messier, more adaptable, more influence-able uFAI. One might want to avoid letting a single wannabe-FAI out into the world which could take over existing computing infrastructure and thus halt most AI work but would be self-limiting in some important respect (e.g. due to sensitivity to Pascalian considerations due to a formal, consistent decision theory, of the sort that a less formal AI architecture wouldn’t have trouble with). Such a scenario could be worse than one where a bunch of evolving AGIs with diverse initial goal systems get unleashed and compete with each other, keeping self-limiting AIs from reaching evil or at least relatively suboptimal singleton status. And so on; one could list considerations like this for a long time. At any rate I don’t think there are any obviously overwhelming answers. Luckily in the meantime there are meta-level strategies like intelligence amplification (in a very broad sense) which could make such analysis more tractable.
(The above analysis is written from what I think is a SingInst-like perspective, i.e., hard takeoff is plausible, FAI as defined by Eliezer is especially desirable, et cetera. I don’t necessarily agree with such a perspective, and my analysis could fail given different background assumptions.)
Most math kills you quietly, neatly, and cleanly, unless the apparent obstacles to distant timeless trade are overcome in practice and we get a certain kind of “luck” on how a vast net of mostly-inhuman timeless trades sum out, in which case we get an unknown fixed selection from some subjective probability distribution over “fate much worse than death” to “death” to “fate much better than death but still much worse than FAI”. I don’t spend much time talking about this on LW because timeless trade speculation eats people’s brains and doesn’t produce any useful outputs from the consumption; only decision theorists whose work is plugging into FAI theory need to think about timeless trade, and I wish everyone else would shut up about the subject on grounds of sheer cognitive unproductivity, not to mention the horrid way it sounds from the perspective of traditional skeptics (and not wholly unjustifiably so). (I have expressed this opinion in the past whenever I hear LWers talking about timeless trade; it is not limited to Newsome, though IIRC he has an unusual case of undue optimism about outcomes of timeless trade, owing to theological influences that I understand timeless trade speculations helped exacerbate his vulnerability to.)
I don’t spend much time talking about this on LW because timeless trade speculation eats people’s brains and doesn’t produce any useful outputs from the consumption; only decision theorists whose work is plugging into FAI theory need to think about timeless trade, and I wish everyone else would shut up about the subject on grounds of sheer cognitive unproductivity
I don’t trust any group that wishes to create, or make efforts towards influencing the creation of, a superintelligence when they try to suppress discussion of the very decision theory that the superintelligence will implement. How such an agent interacts with the concept of acausal trade completely and fundamentally alters the way it can be expected to behave. That is the kind of thing that needs to be disseminated among an academic community, digested and understood in depth. It is not something to trust to an isolated team, with all the vulnerability to groupthink that entails.
If someone were to announce credibly “We’re creating a GAI. Nobody else but us is allowed to even think about what it is going to do. Just trust us, it’s Friendly.” then the appropriate response is to shout “Watch out! It’s a dangerous crackpot! Stop him before he takes over the world and potentially destroys us all!” And make no mistake, if this kind of attempt at suppression were taken by anyone remotely near developing an FAI theory that is what it would entail. Fortunately at this point it is still at the “Mostly Harmless” stage.
and doesn’t produce any useful outputs from the consumption
I don’t believe you. At least, it produces outputs at least as useful and interesting as all other discussions of decision theory produce. There are plenty of curious avenues to explore on the subject and fascinating implications and strategies that are at least worth considering.
Sure, the subject may deserve a warning “Do not consider this topic if you are psychologically unstable or have reason to believe that you are particularly vulnerable to distress or fundamental epistemic damage by the consideration of abstract concepts.”
not to mention the horrid way it sounds from the perspective of traditional skeptics (and not wholly unjustifiably so).
If this were the real reason for Eliezer’s objection I would not be troubled by his attitude. I would still disagree—the correct approach is not to try to suppress all discussion by other people of the subject but rather to apply basic political caution and not comment on it oneself (or allow anyone within one’s organisation to do so.)
If someone were to announce credibly “We’re creating a GAI. Nobody else but us is allowed to even think about what it is going to do. Just trust us, it’s Friendly.” then the appropriate response is to shout “Watch out! It’s a dangerous crackpot! Stop him before he takes over the world and potentially destroys us all!” And make no mistake, if this kind of attempt at suppression were taken by anyone remotely near developing an FAI theory that is what it would entail. Fortunately at this point it is still at the “Mostly Harmless” stage.
I don’t see how anyone could credibly announce that. The announcement radiates crackpottery.
Most math kills you quietly, neatly, and cleanly, unless the apparent obstacles to distant timeless trade are overcome in practice
Will mentioned a couple of other possible ways in which UFAI fails to kill off humanity, besides distant timeless trade. (BTW I think the current standard term for this is “acausal trade” which incorporates the idea of trading across possible worlds as well as across time.) Although perhaps “hidden AGIs” is unlikely and you consider “potential simulators” to be covered under “distant timeless trade”.
I don’t spend much time talking about this on LW because timeless trade speculation eats people’s brains and doesn’t produce any useful outputs from the consumption; only decision theorists whose work is plugging into FAI theory need to think about timeless trade
The idea is relevant not just for actually building FAI, but also for deciding strategy (ETA: for example how much chance of creating UFAI should we accept in order to build FAI). See here for an example of such discussion (between people who perhaps you think are saner than Will Newsome).
not to mention the horrid way it sounds from the perspective of traditional skeptics
I agreed with this, but it’s not clear what we should do about it (e.g., whether we should stop talking about it), given the strategic relevance.
The idea is relevant not just for actually building FAI, but also for deciding strategy
And also relevant, I hasten to point out, for solving moral philosophy. I want to be morally justified whether or not I’m involved with an FAI team and whether or not I’m in a world where the Singularity is more than just a plot device. Acausal influence elucidates decision theory, and decision theory elucidates morality.
Will mentioned a couple of other possible ways in which UFAI fails to kill off humanity, besides distant timeless trade. [...] Although perhaps “hidden AGIs” is unlikely and you consider “potential simulators” to be covered under “distant timeless trade”.
This is considered unlikely ’round these parts, but one should also consider God, Who is alleged by some to be omnipotent and Who might prefer to keep humans around. Insofar as such a God is metaphysically necessary this is mechanistically but not phenomenologically distinct from plain “hidden AGI”.
(IIRC he has an unusual case of undue optimism about outcomes of timeless trade, owing to theological influences that I understand timeless trade speculations helped exacerbate his vulnerability to.)
The theology and the acausal trade stuff are completely unrelated; they both have to do with decision theory, but that’s it. I also don’t think my thoughts about acausal trade differ in any substantial way from those of Wei Dai or Vladimir Nesov. So even assuming that I’m totally wrong for granting theism-like-ideas non-negligible probability, the discussion of acausal influence doesn’t seem to have directly contributed to my brain getting eaten. That said, I agree with Eliezer that it’s generally not worth speculating about, except possibly in the context of decision theory or, to a very limited extent, singularity strategy.
Although I am extremely interested in your theories, it would take significant time and energy for me to reformulate my ideas in such a way as to satisfactorily incorporate the points you are making. As such, for purposes of this discussion, I shall be essentially speaking as if I had not been made aware of the post which you just made.
However, if you could clarify a minor point: am I mistaken in my belief that it is the SI’s position that uFAI will probably result in human extinction? Or, have they incorporated the points you are making into their theories?
I know that Anna at least has explicitly taken such considerations into account and agrees with them to some extent. Carl likely has as well. I don’t know about Eliezer or Luke, I’ll ask Luke next time I see him. ETA: That is, I know Carl and Anna have considered the points in my first paragraph, but I don’t know how thoroughly they’ve explored the classes of scenarios like those in my second paragraph which are a lot more speculative.
Eliezer replied here, but it seems he’s only addressed one part of my argument thus far. I personally think the alien superintelligence variation of my argument, which Eliezer didn’t address, is the strongest, because it’s well-grounded in known physical facts, unlike simulation-based speculation.
Third, I do tentatively hold that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI).
Clearly, this is possible. If an FAI team comes to think this is true during development, I hope they’ll reconsider their plans. But can you provide, or link me to, some reasons for suspecting that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI)?
I tentatively hold, and believe it is the SI’s position, that an uFAI is almost certain to produce human extinction.
When humans are a critical clue to the course of evolution on the planet? Surely they would repeatedly reconstruct and rerun history to gain clues about the forms of alien that they might encounter—if they held basic universal instrumental values and didn’t have too short a planning horizon.
To clarify what I assume to be Eliezer’s point: “here there be basilisks, take it somewhere less public”
There only be basilisks if you don’t accept SSA or assume that utility scales superlinearly with computations performed.
There’s more than one kind. For obvious reasons I won’t elaborate.
But it’s fun! Why should only a select group of people be allowed to have it?
Because it’s dangerous.
So is mountain skiing, starting new companies, learning chemistry, and entering into relationships.
Mountain skiing maybe, depending on the mountain in question; chemistry only if you’re doing it very wrong; the others not.
Oh yes they are. One can leave you penniless and the other scarred for life. If you’re doing them very wrong, of course. Same with thinking about acausal trade.
But can you provide, or link me to, some reasons for suspecting that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI)?
Some relevant posts/comments:
http://lesswrong.com/lw/ajm/ai_risk_and_opportunity_a_strategic_analysis/5ylx
http://lesswrong.com/lw/axj/the_ai_design_space_near_the_fai_draft/
http://lesswrong.com/lw/axj/the_ai_design_space_near_the_fai_draft/623p
Sadly, this seems right to me. The easiest way to build an eAI is to try to build an FAI and get the sign of something wrong.
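Since that remark is doing real work, here is a toy sketch of the sign-flip failure mode it describes. The “world states” and both utility functions are invented stand-ins, not anyone’s actual model; the point is only that a fully competent maximizer handed -U instead of U does not fail safely, it lands on the worst state the intended U can name.

```python
# Toy illustration of the sign-error failure mode: the optimizer is
# correct, the value function is off by one flipped sign, and the
# outcome is the worst one the intended values can express.

world_states = range(-10, 11)  # invented stand-in for possible outcomes

def intended_utility(state):
    # Hypothetical "true" valuation: higher states are better for humans.
    return state

def buggy_utility(state):
    # The same valuation with one sign flipped during implementation.
    return -intended_utility(state)

# A perfectly competent maximizer pointed at each function:
print(max(world_states, key=intended_utility))  # 10: the best state
print(max(world_states, key=buggy_utility))     # -10: the worst state
```

An AGI whose builders never tried to encode human values has no such lever aimed at us: a paperclip-maximizer’s optimum is indifferent to human value rather than anti-aligned with it, which seems to be the intuition behind the comment above.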