I agree there is some risk that cannot be removed with either theoretical arguments or empirical evidence. But why is it greater for this kind of AI than any other, and in particular than white-box metaphilosophical or normative AI?
Normative AI seems like by far the worst, since:
it generally exhibits a treacherous turn if you make an error, and
it must work correctly across a range of unanticipated environments.
So in that case we have particular, concrete reasons to think that empirical testing won’t be adequate, in addition to the general concern that empirical testing and theoretical argument are never sufficient. To me, white-box metaphilosophical AI seems somewhere in between.
(One complaint is that I just haven’t given an especially strong theoretical argument. I agree with that, and I hope that whatever systems people actually use are backed by something more convincing. But the current state of the argument seems like it can’t point in any direction other than in favor of black-box designs, since we don’t yet have any arguments at all that any other kind of system could work.)