Wei Dai comments on Three Approaches to “Friendliness”

Wei Dai 14 Apr 2015 3:20 UTC
2 points

Is there some other reason to expect failure to be catastrophic?

I’m not pointing to any specific reason; I just expect that, in general, failures when dealing with large amounts of computing power can easily be catastrophic. You have theoretical arguments for why they won’t be, given a specific design, but again I am skeptical of such arguments in general.
- paulfchristiano 14 Apr 2015 15:59 UTC
  2 points
  I agree there is some risk that cannot be removed with either theoretical arguments or empirical evidence. But why is it greater for this kind of AI than any other, and in particular than white-box metaphilosophical or normative AI?
  
  Normative AI seems like by far the worst, since:
  1. it generally demonstrates a treacherous turn if you make an error,
  2. it must work correctly across a range of unanticipated environments.
  So in that case we have particular concrete reasons to think that empirical testing won’t be adequate, in addition to the general concern that empirical testing and theoretical argument are never sufficient. To me, white-box metaphilosophical AI seems somewhere in between.
  
  (One complaint is that I just haven’t given an especially strong theoretical argument. I agree with that, and I hope that whatever systems people actually use, they are backed by something more convincing. But the current state of the argument seems like it can’t point in any direction other than in favor of black-box designs, since we don’t yet have any arguments at all that any other kind of system could work.)