Objection 1: it seems to me that any AGI that was set to maximize a “Friendly” utility function would be extraordinarily dangerous.
Yes, Friendliness is hard, and that means that even the most dedicated team might screw it up. The point is that not trying as hard as you can to build Friendly AI is even worse, because then you almost certainly get uFAI. At least by trying to build FAI, we’ve got some chance of winning.
So this objection really just punts to objection #2, about tool-AGI, as the last paragraph here seems to indicate.
For certain values of “extraordinarily dangerous”, that is an excellent rebuttal to the objection. However, as I am sure you are aware, there are many possible values of “extraordinarily dangerous”. If I may present a plausible argument:
Let us declare a mind-dead universe (one with no agents) as having utility zero. It seems intuitive that working to build FAI decreases the probability of human extinction. However, a true uFAI (like a paperclip-maximizer) is hardly our only problem. A worse problem would be semi-FAI, that is, an AI which does not wipe out all of humanity, but does produce a world state which is worse than a mind-dead universe. As the SI decreases the probability of uFAI, it increases the probability of semi-FAI.
Will_Newsome, myself, and probably several other users have mentioned such possibilities.
We’d need a pretty specific kind of “semi-FAI” to create an outcome worse than utility 0, so I’d prefer a term like eAI (“evil AI”) for an AI that produces a world state worse than utility 0.
So: Is eAI more probable given (1) the first AGIs are created by people explicitly aiming for Friendliness, or given (2) the first AGIs are not created by people explicitly aiming for Friendliness?
First, I prefer your terminology to my own. I had internally been calling such AIs sAIs (sadistic Artificial Intelligence). The etymology is chosen for a very specific reason. However, eAI is most satisfactory.
Second, I do apologize if I am being excessively naive. However, I must confess, I was rather convinced by Yudkowsky’s argumentation about such matters. I tentatively hold, and believe it is the SI’s position, that an uFAI is almost certain to produce human extinction. Again, I would like to call this utility 0.
Third, I do tentatively hold that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI).
I am well aware that it is neither your duty nor the duty of the SI to respond to every minor criticism. However, if you have a reason to believe that my third point is incorrect, I would very much like to be made aware of it.
((A possible counterargument to my position: any proper attempt to reduce the chance of human extinction does increase the probability of a world of negative-utility, generally speaking. If my argument too closely resembles negative utilitarianism, then I revoke my argument.))
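For concreteness, here is a minimal sketch (in Python) of the expected-utility comparison this argument turns on. Every probability and utility in it is invented for illustration; the only structure taken from the thread is the scale fixed above: extinction and a mind-dead universe at utility 0, eAI below 0, FAI above 0.

```python
# Illustrative expected-utility comparison between "attempt FAI" and
# "attempt AGI without aiming for Friendliness". All numbers below are
# made up; only the structure (U(extinction) = 0, U(eAI) < 0 < U(FAI))
# comes from the discussion above.

U_FAI = 1.0      # a Friendly outcome
U_EXTINCT = 0.0  # human extinction / mind-dead universe (the agreed zero point)
U_EAI = -1.0     # an "evil AI" world state worse than extinction

def expected_utility(p_fai, p_eai):
    """Expected utility of a strategy, given its P(FAI) and P(eAI);
    the remaining probability mass goes to extinction."""
    p_extinct = 1.0 - p_fai - p_eai
    return p_fai * U_FAI + p_eai * U_EAI + p_extinct * U_EXTINCT

# p(eAI | FAI attempt) > p(eAI | AGI attempt) can hold while the FAI
# attempt still wins on expectation, because it also raises P(FAI):
print(expected_utility(p_fai=0.10, p_eai=0.05))  # FAI attempt: ~0.05
print(expected_utility(p_fai=0.00, p_eai=0.01))  # AGI attempt: -0.01
```

So even granting the third point, whether an FAI attempt is net-negative depends on how much it raises p(FAI) relative to how much it raises p(eAI), and on how far below zero eAI outcomes sit; that trade-off is what the rest of the thread argues over.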
I tentatively hold, and believe it is the SI’s position, that an uFAI is almost certain to produce human extinction. Again, I would like to call this utility 0.
I hold with timtyler that a uFAI probably wouldn’t kill off all humanity. There’s little benefit to doing so and it potentially incurs a huge cost by going against the wishes of potential simulators, counterfactual FAIs (acausally) (not necessarily human-designed, just designed by an entity or entities that cared about persons in general), hidden AGIs (e.g. alien AGIs that have already swept by the solar system but are making it look as if they haven’t (note that this resolves the Fermi paradox)), et cetera. Such a scenario is still potentially a huge loss relative to FAI scenarios, but it implies that AGI isn’t a sure-thing existential catastrophe, and is perhaps less likely to lead to human extinction than certain other existential risks. If for whatever reason you think that humans are easily satisfied, then uFAI is theoretically just as good as FAI; but that really doesn’t seem plausible to me. There might also be certain harm-minimization moral theories that would be ambivalent between uFAI and FAI. But I think most moral theories would still place huge emphasis on FAI versus uFAI even if uFAI would actually be human-friendly in some local sense.
Given such considerations, I’m not sure whether uFAI or wannabe-FAI is more likely to lead to evil AI. Wannabe-FAI is more likely to have a stable goal system that is immune to certain self-modifications and game theoretic pressures that a less stable AI or a coalition of splintered AI successors would be relatively influenced by. E.g. a wannabe-FAI might disregard certain perceived influences (even influences from hypothetical FAIs that it was considering self-modifying into, or acausal influences generally) as “blackmail” or as otherwise morally requiring ignorance. This could lead to worse outcomes than a messier, more adaptable, more influence-able uFAI. One might want to avoid letting a single wannabe-FAI out into the world which could take over existing computing infrastructure and thus halt most AI work but would be self-limiting in some important respect (e.g. due to sensitivity to Pascalian considerations due to a formal, consistent decision theory, of the sort that a less formal AI architecture wouldn’t have trouble with). Such a scenario could be worse than one where a bunch of evolving AGIs with diverse initial goal systems get unleashed and compete with each other, keeping self-limiting AIs from reaching evil or at least relatively suboptimal singleton status. And so on; one could list considerations like this for a long time. At any rate I don’t think there are any obviously overwhelming answers. Luckily in the meantime there are meta-level strategies like intelligence amplification (in a very broad sense) which could make such analysis more tractable.
(The above analysis is written from what I think is a SingInst-like perspective, i.e., hard takeoff is plausible, FAI as defined by Eliezer is especially desirable, et cetera. I don’t necessarily agree with such a perspective, and my analysis could fail given different background assumptions.)
Most math kills you quietly, neatly, and cleanly, unless the apparent obstacles to distant timeless trade are overcome in practice and we get a certain kind of “luck” on how a vast net of mostly-inhuman timeless trades sum out, in which case we get an unknown fixed selection from some subjective probability distribution over “fate much worse than death” to “death” to “fate much better than death but still much worse than FAI”. I don’t spend much time talking about this on LW because timeless trade speculation eats people’s brains and doesn’t produce any useful outputs from the consumption; only decision theorists whose work is plugging into FAI theory need to think about timeless trade, and I wish everyone else would shut up about the subject on grounds of sheer cognitive unproductivity, not to mention the horrid way it sounds from the perspective of traditional skeptics (and not wholly unjustifiably so). (I have expressed this opinion in the past whenever I hear LWers talking about timeless trade; it is not limited to Newsome, though IIRC he has an unusual case of undue optimism about outcomes of timeless trade, owing to theological influences that I understand timeless trade speculations helped exacerbate his vulnerability to.)
I don’t spend much time talking about this on LW because timeless trade speculation eats people’s brains and doesn’t produce any useful outputs from the consumption; only decision theorists whose work is plugging into FAI theory need to think about timeless trade, and I wish everyone else would shut up about the subject on grounds of sheer cognitive unproductivity
I don’t trust any group that wishes to create, or make efforts towards influencing the creation of, a superintelligence when they try to suppress discussion of the very decision theory that the superintelligence will implement. How such an agent interacts with the concept of acausal trade completely and fundamentally alters the way it can be expected to behave. That is the kind of thing that needs to be disseminated among an academic community, digested and understood in depth. It is not something to trust to an isolated team, with all the vulnerability to groupthink that entails.
If someone were to announce credibly “We’re creating a GAI. Nobody else but us is allowed to even think about what it is going to do. Just trust us, it’s Friendly.” then the appropriate response is to shout “Watch out! It’s a dangerous crackpot! Stop him before he takes over the world and potentially destroys us all!” And make no mistake, if this kind of attempt at suppression were taken by anyone remotely near developing an FAI theory that is what it would entail. Fortunately at this point it is still at the “Mostly Harmless” stage.
and doesn’t produce any useful outputs from the consumption
I don’t believe you. At least, it produces outputs at least as useful and interesting as all other discussions of decision theory produce. There are plenty of curious avenues to explore on the subject and fascinating implications and strategies that are at least worth considering.
Sure, the subject may deserve a warning “Do not consider this topic if you are psychologically unstable or have reason to believe that you are particularly vulnerable to distress or fundamental epistemic damage by the consideration of abstract concepts.”
not to mention the horrid way it sounds from the perspective of traditional skeptics (and not wholly unjustifiably so).
If this were the real reason for Eliezer’s objection I would not be troubled by his attitude. I would still disagree—the correct approach is not to try to suppress all discussion by other people of the subject but rather to apply basic political caution and not comment on it oneself (or allow anyone within one’s organisation to do so.)
If someone were to announce credibly “We’re creating a GAI. Nobody else but us is allowed to even think about what it is going to do. Just trust us, it’s Friendly.” then the appropriate response is to shout “Watch out! It’s a dangerous crackpot! Stop him before he takes over the world and potentially destroys us all!” And make no mistake, if this kind of attempt at suppression were taken by anyone remotely near developing an FAI theory that is what it would entail. Fortunately at this point it is still at the “Mostly Harmless” stage.
I don’t see how anyone could credibly announce that. The announcement radiates crackpottery.
Most math kills you quietly, neatly, and cleanly, unless the apparent obstacles to distant timeless trade are overcome in practice
Will mentioned a couple of other possible ways in which UFAI fails to kill off humanity, besides distant timeless trade. (BTW I think the current standard term for this is “acausal trade” which incorporates the idea of trading across possible worlds as well as across time.) Although perhaps “hidden AGIs” is unlikely and you consider “potential simulators” to be covered under “distant timeless trade”.
I don’t spend much time talking about this on LW because timeless trade speculation eats people’s brains and doesn’t produce any useful outputs from the consumption; only decision theorists whose work is plugging into FAI theory need to think about timeless trade
The idea is relevant not just for actually building FAI, but also for deciding strategy (ETA: for example how much chance of creating UFAI should we accept in order to build FAI). See here for an example of such discussion (between people who perhaps you think are saner than Will Newsome).
not to mention the horrid way it sounds from the perspective of traditional skeptics
I agreed with this, but it’s not clear what we should do about it (e.g., whether we should stop talking about it), given the strategic relevance.
The idea is relevant not just for actually building FAI, but also for deciding strategy
And also relevant, I hasten to point out, for solving moral philosophy. I want to be morally justified whether or not I’m involved with an FAI team and whether or not I’m in a world where the Singularity is more than just a plot device. Acausal influence elucidates decision theory, and decision theory elucidates morality.
Will mentioned a couple of other possible ways in which UFAI fails to kill off humanity, besides distant timeless trade. [...] Although perhaps “hidden AGIs” is unlikely and you consider “potential simulators” to be covered under “distant timeless trade”.
This is considered unlikely ’round these parts, but one should also consider God, Who is alleged by some to be omnipotent and Who might prefer to keep humans around. Insofar as such a God is metaphysically necessary this is mechanistically but not phenomenologically distinct from plain “hidden AGI”.
(IIRC he has an unusual case of undue optimism about outcomes of timeless trade, owing to theological influences that I understand timeless trade speculations helped exacerbate his vulnerability to.)
The theology and the acausal trade stuff are completely unrelated; they both have to do with decision theory, but that’s it. I also don’t think my thoughts about acausal trade differ in any substantial way from those of Wei Dai or Vladimir Nesov. So even assuming that I’m totally wrong for granting theism-like-ideas non-negligible probability, the discussion of acausal influence doesn’t seem to have directly contributed to my brain getting eaten. That said, I agree with Eliezer that it’s generally not worth speculating about, except possibly in the context of decision theory or, to a very limited extent, singularity strategy.
Although I am extremely interested in your theories, it would take significant time and energy for me to reformulate my ideas in such a way as to satisfactorily incorporate the points you are making. As such, for purposes of this discussion, I shall be essentially speaking as if I had not been made aware of the post which you just made.
However, if you could clarify a minor point: am I mistaken in my belief that it is the SI’s position that uFAI will probably result in human extinction? Or, have they incorporated the points you are making into their theories?
I know that Anna at least has explicitly taken such considerations into account and agrees with them to some extent. Carl likely has as well. I don’t know about Eliezer or Luke, I’ll ask Luke next time I see him. ETA: That is, I know Carl and Anna have considered the points in my first paragraph, but I don’t know how thoroughly they’ve explored the classes of scenarios like those in my second paragraph which are a lot more speculative.
Eliezer replied here, but it seems he’s only addressed one part of my argument thus far. I personally think the alien superintelligence variation of my argument, which Eliezer didn’t address, is the strongest, because it’s well-grounded in known physical facts, unlike simulation-based speculation.
Third, I do tentatively hold that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI).
Clearly, this is possible. If an FAI team comes to think this is true during development, I hope they’ll reconsider their plans. But can you provide, or link me to, some reasons for suspecting that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI)?
I tentatively hold, and believe it is the SI’s position, that an uFAI is almost certain to produce human extinction.
When humans are a critical clue to the course of evolution on the planet? Surely they would repeatedly reconstruct and rerun history to gain clues about the forms of alien that they might encounter—if they held basic universal instrumental values and didn’t have too short a planning horizon.
To clarify what I assume to be Eliezer’s point: “here there be basilisks, take it somewhere less public”
There only be basilisks if you don’t accept SSA or assume that utility scales superlinearly with computations performed.
There’s more than one kind. For obvious reasons I won’t elaborate.
But it’s fun! Why should only a select group of people be allowed to have it?
Because it’s dangerous.
So is mountain skiing, starting new companies, learning chemistry, and entering into relationships.
Mountain skiing maybe, depending on the mountain in question; chemistry only if you’re doing it very wrong; the others not.
Oh yes they are. One can leave you penniless and the other scarred for life. If you’re doing them very wrong, of course. Same with thinking about acausal trade.
But can you provide, or link me to, some reasons for suspecting that p(eAI | attempt towards FAI) > p(eAI | attempt towards AGI)?
Some relevant posts/comments:
http://lesswrong.com/lw/ajm/ai_risk_and_opportunity_a_strategic_analysis/5ylx
http://lesswrong.com/lw/axj/the_ai_design_space_near_the_fai_draft/
http://lesswrong.com/lw/axj/the_ai_design_space_near_the_fai_draft/623p
Sadly, this seems right to me. The easiest way to build an eAI is to try to build an FAI and get the sign of something wrong.
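Since that remark is doing real work, here is a toy sketch of the sign-flip failure mode it describes. The “world states” and both utility functions are invented stand-ins, not anyone’s actual model; the point is only that a fully competent maximizer handed -U instead of U does not fail safely, it lands on the worst state the intended U can name.

```python
# Toy illustration of the sign-error failure mode: the optimizer is
# correct, the value function is off by one flipped sign, and the
# outcome is the worst one the intended values can express.

world_states = range(-10, 11)  # invented stand-in for possible outcomes

def intended_utility(state):
    # Hypothetical "true" valuation: higher states are better for humans.
    return state

def buggy_utility(state):
    # The same valuation with one sign flipped during implementation.
    return -intended_utility(state)

# A perfectly competent maximizer pointed at each function:
print(max(world_states, key=intended_utility))  # 10: the best state
print(max(world_states, key=buggy_utility))     # -10: the worst state
```

An AGI whose builders never tried to encode human values has no such lever aimed at us: a paperclip-maximizer’s optimum is indifferent to human value rather than anti-aligned with it, which seems to be the intuition behind the comment above.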