Vladimir_Nesov comments on Three Approaches to “Friendliness”

Vladimir_Nesov 17 Jul 2013 11:50 UTC
12 points
0
The difficulty is still largely due to the security problem. Without catastrophic risks (including UFAI and value drift), we could take as much time as necessary and/or go with making people smarter first.

The aspect of FAI that is supposed to solve the security problem is optimization power aimed at correct goals. Optimization power addresses the “external” threats (and ensures progress), and correctness of goals represents “internal” safety. If an AI has sufficient optimization power, the (external) security problem is taken care of, even if the goals are given by a complicated definition that the AI is unable to evaluate at the beginning: it’ll protect the original definition even without knowing what it evaluates to, and aim to evaluate it (for instrumental reasons).

This suggests that a minimal solution is to pack all the remaining difficulties in AI’s goal definition, at which point the only object level problems are to figure out what a sufficiently general notion of “goal” is (decision theory; the aim of this part is to give the goal definition sufficient expressive power, to avoid constraining its decisions while extracting the optimization part), how to build an AI that follows a goal definition and is at least competitive in its optimization power, and how to compose the goal definition. The simplest idea for the goal definition seems to be some kind of WBE-containing program, so learning to engineer stable WBE superorganisms might be relevant for this part (UFAI and value drift will remain a problem, but might be easier to manage in this setting).

(It might also be good to figure out how to pack a reference to the state of the Earth at a recent point in time into the goal definition, so that the AI has an instrumental drive to capture its state when it still doesn’t understand its goals and so will probably use the Earth itself for something else; this might then also lift the requirement of having WBE tech in order to construct the goal definition.)
What links here?
- Vladimir_Nesov's comment on An argument against indirect normativity by cousin_it (24 Jul 2013 23:42 UTC; 3 points)
- Vladimir_Nesov's comment on How does MIRI Know it Has a Medium Probability of Success? by Peter Wildeford (3 Aug 2013 12:38 UTC; 2 points)
- Wei Dai 17 Jul 2013 20:01 UTC
  7 points
  0
  Parent
  
  Without catastrophic risks (including UFAI and value drift), we could take as much time as necessary and/or go with making people smarter first.
  
  You appear to be operating under the assumption that it’s already too late or otherwise impractical to “go with making people smarter first”, but I don’t see why, compared to “build FAI first”.
  
  Human cloning or embryo selection look like parallelizable problems that would be easily amenable to the approach of “throwing resources at it”. They just consist of a bunch of basic science and engineering problems, which humans are generally pretty good at, compared to the kind of philosophical problems that need to be solved for building FAI. Nor do we have to get all those problems right on the first try or face existential disaster. Nor is intelligence enhancement known to be strictly harder than building UFAI (whereas FAI is: solving FAI requires solving AGI as a subproblem). And there must be many other research directions that could be funded in addition to these two. All it would take is for some government or maybe even large corporation or charitable organization to take the problem of “astronomical waste” seriously (again referring to the more general concept than Bostrom’s, which I wish had its own established name).
  
  If it’s not already too late or impractical to make people smarter first (and nobody has made a case that it is, as far as I know) then FAI work has the counterproductive consequence of making it harder to make people smarter first (by shortening AI timelines). MIRI and other FAI advocates do not seem to have taken this into account adequately.
  - Vladimir_Nesov 17 Jul 2013 21:18 UTC
    5 points
    0
    Parent
    My point was that when we expand on “black box metaphilosophical AI”, it seems to become much less mysterious than the whole problem, we only need to solve decision theory and powerful optimization and maybe (wait for) WBE. If we can pack a morality/philosophy research team into the goal definition, the solution of the friendliness part can be deferred almost completely to after the current risks are eliminated, at which point the team will have a large amount of time to solve it.
    
    (I agree that building smarter humans is a potentially workable point of intervention. This needs a champion to at least outline the argument, but actually making this happen will be much harder.)
    - Wei Dai 17 Jul 2013 21:49 UTC
      4 points
      0
      Parent
      
      My point was that when we expand on “black box metaphilosophical AI”, it seems to become much less mysterious than the whole problem, we only need to solve decision theory and powerful optimization and maybe (wait for) WBE.
      
      I think I understand the basic motivation for pursuing this approach, but what’s your response to the point I made in the post, that such an AI has to achieve superhuman levels of optimizing power, in order to acquire enough computing power to run the WBE, before it can start producing philosophical solutions, and therefore there’s no way for us to safely test it to make sure that the “black box” would produce sane answers as implemented? It’s hard for me to see how we can get something this complicated right on the first try.
      - Vladimir_Nesov 17 Jul 2013 22:10 UTC
        4 points
        0
        Parent
        The black box is made of humans and might be tested the usual way when (human-designed) WBE tech is developed. The problem of designing its (long term) social organisation might also be deferred to the box. The point of the box is that it can be made safe from external catastrophic risks, not that it represents any new progress towards FAI.
        
        The AI doesn’t produce philosophical answers, the box does, and the box doesn’t contain novel/dangerous things like AIs. This only requires solving the separate problems of having AI care about evaluating a program, and preparing a program that contains people who would solve the remaining problems (and this part doesn’t involve AI). The AI is something that can potentially be theoretically completely understood and it can be very carefully tested under controlled conditions, to see that it does evaluate simpler black boxes that we also understand. Getting decision theory wrong seems like a more elusive risk.
        - Wei Dai 17 Jul 2013 22:30 UTC
          6 points
          0
          Parent
          
          The black box is made of humans and might be tested the usual way when (human-designed) WBE tech is developed.
          
          Ok, I think I misunderstood you earlier, and thought that your idea was similar to Paul Christiano’s, where the FAI would essentially develop the WBE tech instead of us. I had also suggested waiting for WBE tech before building FAI (although due to a somewhat different motivation), and in response someone (maybe Carl Shulman?) argued that brain-inspired AGI or low-fidelity brain emulations would likely be developed before high-fidelity brain emulations, which means the FAI would probably come too late if it waited for WBE. This seems fairly convincing to me.
          - Vladimir_Nesov 17 Jul 2013 22:44 UTC
            4 points
            0
            Parent
            Waiting for WBE is risky in many ways, but I don’t see a potentially realistic plan that doesn’t go through it, even if we have (somewhat) smarter humans. This path (and many variations, such as a WBE superorg just taking over “manually” and not leaving anyone else with access to the physical world) I can vaguely see working, solving the security/coordination problem, if all goes right; other paths seem much more speculative to me (but many are worth trying, given resources; if somehow possible to do reliably, AI-initiated WBE when there is no human-developed WBE would be safer).
    - Yosarian2 24 Jul 2013 17:23 UTC
      0 points
      0
      Parent
      
      (I agree that building smarter humans is a potentially workable point of intervention. This needs a champion to at least outline the argument, but actually making this happen will be much harder.)
      
      It seems fairly clear to me that we will have the ability to “build smarter humans” within the next few decades, just because there are so many different possible research paths that could get us to that goal, all of which look promising.
      
      There’s starting to be some good research done right now on which genes correlate with intelligence. It looks like a very complicated subject, with thousands of genes contributing; nonetheless, that would be enough to make it possible to do pre-implantation genetic screening to select “smarter” babies with current-day technology, and it doesn’t put us that far from actually genetically engineering fertilized eggs before implantation, or possibly even doing genetic therapies on adults (although, of course, that’s inherently dodgier, and is likely to have a smaller effect).
      
      Other likely paths to IA include:
      
      -We’re making a lot of progress on brain-computer interfaces right now, of all types.
      
      -Brain stimulation also seems to have a lot of potential; it was already shown to improve people’s ability to learn math in school in published research.
      
      -Nootropic drugs may also have some potential, although we aren’t really throwing a lot of research in that direction right now. It is worth mentioning, though, that one possible outcome of that research on genes correlated with intelligence might be to figure out what proteins those genes code for and find drugs that have the same effect.
      
      -Looking at the more cybernetic side, a scientist has recently managed to create an implantable chip that could connect with the brain of a rat and both store memories and give them back to the rat directly, basically an artificial hippocampus. http://www.technologyreview.com/featuredstory/513681/memory-implants/
      
      -The sudden focus on brain research and modeling in the US and the UK is also likely to have significant impacts.
      
      -There are other, more futuristic possible technologies here as well (nanotech, computer exocortex, etc.). They are not as likely to happen in the time frame we’re talking about, though.
      
      Anyway, unless GAI comes much sooner than I expect it to, I would expect that some of the things on that list are likely to happen before GAI. Many of them we’re already quite close to, and there are enough different paths to enhanced human intelligence that I put a low probability on all of them being dead ends. I think there’s a very good chance that we’ll develop some kind of way to increase human intelligence first, before any kind of true GAI becomes possible, especially if we put more effort into research in that direction.
      
      The real question, I think, is how much of an intelligence boost any of that is going to give us, and whether that’s going to be enough to make FAI problems easier to solve, and I’m not sure that’s answerable at this point.