Perhaps you could make a taxonomy like yours when talking about a formally-defined singleton, which we might expect society to develop eventually. But I haven’t seen strong arguments that we would need to design such a singleton starting from anything like our current state of knowledge. The best argument I know that we might need to solve this problem soon is the possibility of a fast takeoff, which still seems reasonably unlikely (say < 10% probability) but is certainly worth thinking about more carefully in advance.
But even granting a fast takeoff, it seems quite likely that you can build AIs that “work around” this problem in other ways, particularly by remaining controlled by human owners or by quickly bootstrapping to a better prepared society. I don’t generally see why this would be subject to the same extreme difficulty you describe (the main reason for optimism is our current ignorance about what the situation will look like, and the large number of possible schemes).
And finally, even granting that we need to design such a singleton today (because of fast-takeoff and no realistic prospects for remaining in control), I don’t think the taxonomy you offer is exhaustive, and I don’t buy the claims of extraordinary difficulty.
There is a broad class of proposals in which an AI has a model of “what I would want” (either by the route that ordinary AI researchers find plausible, in which the AI’s concepts are reasonably aligned with human concepts, or by more elaborate formal machinations as in my indirect normativity proposal or a more sophisticated version thereof). It doesn’t seem to be the case that you can’t test such designs until you are dealing with superhuman AI—normal humans can reason about such concepts, as we do, and as long as you can design any AI which uses such concepts and isn’t deliberately deceptive you can see whether it is doing something sensible. And it doesn’t seem to be the case that your concept has to be so robust that it can tile the whole universe, because what you would want involves opportunities for explicit reflection by humans. The more fundamental issue is just that we haven’t thought about this much, and so without formal justification I am pretty dubious of any claimed taxonomy or fundamental difficulty.
I agree that there is a solid case for making people smarter. I think there are better indirect approaches to making the world better though (rather than directly launching in on human enhancement). In the rest of the world (and even the rest of the EA community) people are focusing on making the world better in this kind of broad way. And in fairness this work currently occupies the majority of my time. I do think it’s reasonably likely that I should focus directly on AI impacts, and that thinking about AI more clearly is the first step, but this is mostly coming from the neglected possibility of human-level AI relatively soon (e.g. < 40 years).
The best argument I know that we might need to solve this problem soon is the possibility of a fast takeoff, which still seems reasonably unlikely (say < 10% probability) but is certainly worth thinking about more carefully in advance.
When you say “fast takeoff” do you mean the speed of the takeoff (how long it takes from start to superintelligence) or the timing of it (how far away it is from now)? Because later on you mention “< 40 years” which makes me think you mean that here as well, and timing would also make more sense in the context of your argument, but then I don’t understand why you would give < 10% probability for takeoff in the next 40 years.
But even granting a fast takeoff, it seems quite likely that you can build AIs that “work around” this problem in other ways, particularly by remaining controlled by human owners
Superintelligent AIs controlled by human owners, even if it’s possible, seem like a terrible idea, because humans aren’t smart or wise enough to handle such power without hurting themselves. I wouldn’t even trust myself to control such an AI, much less a more typical, less reflective human.
or by quickly bootstrapping to a better prepared society
Not sure what you mean by this. Can you expand?
And finally, even granting that we need to design such a singleton today (because of fast-takeoff and no realistic prospects for remaining in control)
Regarding your parenthetical “because of”, I think the “need” to design such a singleton comes from the present opportunity to build such a singleton, which may not last. For example, suppose your scenario of superintelligent AIs controlled by human owners becomes reality (putting aside my previous objection). At that time we can no longer directly build a singleton, and those AI/human systems may not be able to, or want to, merge into a singleton. They may instead just spread out into the universe in an out-of-control manner, burning the cosmic commons as they go.
There is a broad class of proposals in which an AI has a model of “what I would want” (either by the route that ordinary AI researchers find plausible, in which the AI’s concepts are reasonably aligned with human concepts
There are all kinds of ways for this to go badly wrong, which have been extensively discussed by Eliezer and others on LW. To summarize, the basic problem is that human concepts are too fuzzy and semantically dependent on how human cognition works. Given the complexity and fragility of value and the likely alien nature of AI cognition, it’s unlikely that an AI will share our concepts closely enough to obtain a sufficiently accurate model of “what I would want” through this method. (ETA: Here is a particularly relevant post by Eliezer.)
It doesn’t seem to be the case that you can’t test such designs until you are dealing with superhuman AI—normal humans can reason about such concepts, as we do, and as long as you can design any AI which uses such concepts and isn’t deliberately deceptive you can see whether it is doing something sensible.
My claim about not being able to test was limited to the black-box metaphilosophical AI, so it doesn’t apply here; this approach instead has the other problems mentioned above.
The more fundamental issue is just that we haven’t thought about this much, and so without formal justification I am pretty dubious of any claimed taxonomy or fundamental difficulty.
Since you seem to bring up ideas that others have already considered and rejected, I wonder if perhaps you’re underestimating how much we’ve thought about this? (Or were you already aware of their rejection and just wanted to indicate your disagreement?)
I think there are better indirect approaches to making the world better though (rather than directly launching in on human enhancement).
This is quite possible. I’m not arguing that directly pushing for human enhancement is the best current intervention, just that it ought to be done at some point, prior to trying to build FAI.
When you say “fast takeoff” do you mean the speed of the takeoff (how long it takes from start to superintelligence) or the timing of it (how far away it is from now)?
I mean speed. It seems like you are relying on an assumption of a rapid transition from a world like ours to a world dominated by superhuman AI, whereas typically I imagine a transition that lasts at least years (which is still very fast!) during which we can experiment with things, develop new approaches, etc. In this regime many more approaches are on the table.
Superintelligent AIs controlled by human owners, even if it’s possible, seem like a terrible idea, because humans aren’t smart or wise enough to handle such power without hurting themselves. I wouldn’t even trust myself to control such an AI, much less a more typical, less reflective human.
It seems like you are packing a wide variety of assumptions in here, particularly about the nature of control and about the nature of the human owners.
or by quickly bootstrapping to a better prepared society
Not sure what you mean by this. Can you expand?
Even given shaky solutions to the control problem, it’s not obvious that you can’t move quickly to a much better prepared society, via better solutions to the control problem, further AI work, brain emulations, significantly better coordination or human enhancement, etc.
Regarding your parenthetical “because of”, I think the “need” to design such a singleton comes from the present opportunity to build such a singleton, which may not last. For example, suppose your scenario of superintelligent AIs controlled by human owners becomes reality (putting aside my previous objection). At that time we can no longer directly build a singleton, and those AI/human systems may not be able to, or want to, merge into a singleton. They may instead just spread out into the universe in an out-of-control manner, burning the cosmic commons as they go.
This is an interesting view (in that it isn’t what I expected). I take it that the AIs aren’t doing any work in this scenario, i.e., if we just imagined normal humans going on their way without any prospect of building much smarter descendants, you would make similar predictions for similar reasons? If so, this seems unlikely given the great range of possible coordination mechanisms many of which look like they could avert this problem, the robust historical trends in increasing coordination ability and scale of organization, etc. Are there countervailing reasons to think it is likely, or even very plausible? If not, I’m curious about how the presence of AI changes the scenario.
There are all kinds of ways for this to go badly wrong, which have been extensively discussed by Eliezer and others on LW. To summarize, the basic problem is that human concepts are too fuzzy and semantically dependent on how human cognition works. Given the complexity and fragility of value and the likely alien nature of AI cognition, it’s unlikely that an AI will share our concepts closely enough to obtain a sufficiently accurate model of “what I would want” through this method. (ETA: Here is a particularly relevant post by Eliezer.)
I don’t find these arguments particularly compelling as a case for “there is very likely to be a problem,” though they are more compelling as an indication of “there might be a problem.”
The fragility and complexity of value don’t seem very relevant. The argument is never that you can specify value directly. Instead we are saying that you can capture concepts about respecting intentions, offering further opportunities for reflection, etc. (or in the most extreme case, concepts about what we would want upon reflection). These concepts are also fragile, which is why there is something to discuss here.
There are many concepts that seem useful (and perhaps sufficient) which seem to be more robust and not obviously contingent on human cognition, such as deference, minimal influence, intentions, etc. In particular, we might expect that we can formulate concepts in such a way that they are unambiguous in our current environment, and then maintain them. Whether you can get access to those concepts, or use them in a useful enough way, is again not clear.
The arguments given there (and elsewhere) just don’t consider most of the things you would actually do, even the ones we can currently foresee. This is a special case of the next point. For example, if an agent is relatively risk averse, and entertains uncertainty about what is “good,” then it may tend to pick a central example from the concept of good instead of an extreme one (depending on details of the specification, but it is easy to come up with specifications that do this). So saying “you always get extreme examples of a concept when you use it as a value for a goal-seeking agent” is an interesting observation and a cause for concern, but it is so far from a tight argument that I don’t even think of it as trying to be one.
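To make the risk-aversion point concrete, here is a minimal toy sketch (my own construction, not something taken from the discussion above): an agent is uncertain which of a few candidate value functions captures “good,” and a crude maximin form of risk aversion leads it to a central option that scores acceptably on every candidate, while a risk-neutral maximizer of the average picks an extreme option. All names and numbers are made up for illustration.

```python
# Toy illustration (assumption-laden, not from the original discussion):
# an agent uncertain about which candidate value function is "good."
# Risk aversion is modeled crudely as maximin over the candidates.

candidate_values = [
    lambda x: 1.0 * x[0],               # candidate 1: only feature A matters
    lambda x: 1.0 * x[1],               # candidate 2: only feature B matters
    lambda x: 0.5 * x[0] + 0.5 * x[1],  # candidate 3: both matter equally
]

options = {
    "extreme_A": (10.0, 0.0),  # maximizes feature A at the expense of B
    "extreme_B": (0.0, 10.0),  # maximizes feature B at the expense of A
    "central":   (4.0, 4.0),   # a moderate, "central" example
}

def expected_value(x):
    """Risk-neutral score: average over the candidate value functions."""
    return sum(v(x) for v in candidate_values) / len(candidate_values)

def worst_case_value(x):
    """Risk-averse (maximin) score: worst case over the candidates."""
    return min(v(x) for v in candidate_values)

risk_neutral_choice = max(options, key=lambda k: expected_value(options[k]))
risk_averse_choice = max(options, key=lambda k: worst_case_value(options[k]))

print("risk-neutral agent picks:", risk_neutral_choice)  # extreme_A
print("risk-averse agent picks: ", risk_averse_choice)   # central
```

Whether any realistic specification would behave this way depends on its details, as noted above; the point is only that “goal-seeking agent” does not automatically imply “picks an extreme example of the concept.”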
All of the arguments here are extremely vague (on both sides). Again, this is fine if we want to claim “there may be a problem.” Indeed, I would even agree that any particular proposal is very unlikely to work, and any class of proposals is pretty unlikely to work, etc. (I would say the same thing about approaches to AI itself). But it doesn’t seem to entitle us to the claim “there is definitely a problem,” especially to the extent that we are relying on the conjunction of many claims of the form “This won’t look robustly viable once we know more.”
In general, it seems that the burden of proof is on someone who claims “Surely X” in an environment which is radically unlike any environment we have encountered before. I don’t think that any very compelling arguments have been offered here, just vague gesturing. I think it’s possible that we should focus on some of these pessimistic possibilities because we can have a larger impact there. But your (and Eliezer’s) claims go further than this, suggesting that it isn’t worth investing in interventions that would modestly improve our ability to cope with difficulties (respectively clarifying understanding of AI and human empowerment, both of which slightly speed up AI progress), because the probability is so low. I think this is a plausible view, but it doesn’t look like the evidence supports it to me.
Since you seem to bring up ideas that others have already considered and rejected, I wonder if perhaps you’re underestimating how much we’ve thought about this? (Or were you already aware of their rejection and just wanted to indicate your disagreement?)
I’m certainly aware of the points you’ve raised, and at least a reasonable fraction of the thinking that has been done in this community on these topics. Again, I’m happy with these arguments (and have made many of them myself) as a good indication that the issue is worth taking seriously. But I think you are taking this “rejection” much too seriously in this context. If someone said “maybe X will work” and someone else said “maybe X won’t work,” I wouldn’t then leave X off of (long) lists of reasons why things might work, even if I agreed with them.
This is getting a bit too long for a point-by-point response, so I’ll pick what I think are the most productive points to make. Let me know if there’s anything in particular you’d like a response on.
It seems like you are relying on an assumption of a rapid transition from a world like ours to a world dominated by superhuman AI.
I try not to assume this, but quite possibly I’m being unconsciously biased in that direction. If you see any place where I seem to be implicitly assuming this, please point it out, but I think my argument applies even if the transition takes years instead of weeks.
If so, this seems unlikely given the great range of possible coordination mechanisms many of which look like they could avert this problem, the robust historical trends in increasing coordination ability and scale of organization, etc.
Coordination ability may be increasing but is still very low on an absolute scale. (For example we haven’t achieved nuclear disarmament, which seems like a vastly easier coordination problem.) I don’t see it increasing at a fast enough pace to be able to solve the problem in time. I also think there are arguments in economics (asymmetric information, public choice theory, principal-agent problems) that suggest theoretical limits to how effective coordination mechanisms can be.
Indeed, I would even agree that any particular proposal is very unlikely to work, and any class of proposals is pretty unlikely to work, etc. (I would say the same thing about approaches to AI itself).
For any given AI approach there are not many classes of “AI control schemes” that are compatible with or applicable to it, so I don’t understand your relative optimism if you think any given class of proposals is pretty unlikely to work.
But the bigger problem for me is that even if one of these proposals “works”, I still don’t see how that helps towards the goal of ending up with a superintelligent singleton that shares our values and is capable of solving philosophical problems, which I think is necessary to get the best outcome in the long run. An AI that respects my intentions might be “safe” in the immediate sense, but if everyone else has got one, we now have less time to solve philosophy/metaphilosophy before the window of opportunity for building a singleton closes.
I agree that we have little idea what you would like the universe to look like. Presumably what you would want in the near term involves e.g. more robust solutions to the control problem and opportunities for further reflection, if not direct philosophical help.
(Quoting from a parallel email discussion which we might as well continue here.) My point is that the development of such an AI leaves people like me in a worse position than before. Yes I would ask for “more robust solutions to the control problem” but unless the solutions are on the path to solving philosophy/metaphilosophy, they are only ameliorating the damage and not contributing to the ultimate goal, and while I do want “opportunities for further reflection”, the AI isn’t going to give me more than what I already had before. In the meantime, other people who are less reflective than me are using their AIs to develop nanotech and more powerful AIs, likely forcing me to do the same (before I’d otherwise prefer) in order to remain competitive.