I went back and re-read these. I think the main anthropomorphic power that AI-you uses is that it already models the world using key human-like assumptions, where necessary.
For example, when you think about how an AI would be predisposed to break images down into “metal with knobs” and “beefy arm” rather than e.g. “hand holding metal handle,” “knobs,” and “forearm” (or worse, some combination of hundreds of edge detector activations), it’s pretty tricky. It needs some notion of human-common-sense “things” and what scale things tend to be. Maybe it needs some notion of occlusion and thing composition before it can explain held-dumbbell images in terms of a dumbbell thing. It might even need to have figured out that these things are interacting in an implied 3D space before it can draw human-common-sense associations between things that are close together in this 3D space. All of which is so obvious to us that it can be implicit for AI-you.
Would you agree with this interpretation, or do you think there are other, more important powers in use?
I think that might be a generally good critique, but I don’t think it applies to this post (it may apply better to post #3 in the series).
I used “metal with knobs” and “beefy arm” as human-parsable examples, but the main point is detecting when something is out-of-distribution, which relies on the image being different in AI-detectable ways, not on the specifics of the categories I mentioned.
I don’t think this is necessarily a critique—after all, it’s inevitable that AI-you is going to inherit some anthropomorphic powers. The trick is figuring out what they are and seeing if it seems like a profitable research avenue to try and replicate them :)
In this case, I think this is an already-known problem, because detecting out-of-distribution images in a way that matches human requirements requires the AI’s distribution to be similar to the human distribution (and conversely, mismatches in distribution allow for adversarial examples). But maybe there’s something different in part 4, where I think there’s some kind of “break down actions in obvious ways” power that might not be as well-analyzed elsewhere (though it’s probably related to self-supervised learning of hierarchical planning problems).
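To make the “AI’s distribution” point a bit more concrete, here’s a toy sketch (not from the posts) of one common out-of-distribution detection recipe: score each image by its Mahalanobis distance from the training-set features and flag scores above a threshold. The random feature vectors and the 99th-percentile threshold are placeholders I made up for illustration; the question raised above is whether the feature space these scores live in carves up images the way a human would.

```python
import numpy as np

# Toy stand-in for "the AI's distribution": mean and covariance of feature
# vectors extracted from in-distribution training images.
rng = np.random.default_rng(0)
train_features = rng.normal(loc=0.0, scale=1.0, size=(1000, 16))

mean = train_features.mean(axis=0)
cov = np.cov(train_features, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularize for stability

def ood_score(feature_vec):
    """Mahalanobis distance from the training distribution; larger = more out-of-distribution."""
    diff = feature_vec - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

# Threshold picked from in-distribution scores (here: 99th percentile, an arbitrary choice).
threshold = np.percentile([ood_score(f) for f in train_features], 99)

in_dist_image = rng.normal(0.0, 1.0, size=16)  # looks like the training data
odd_image = rng.normal(5.0, 1.0, size=16)      # shifted features, e.g. an unexpected object

for name, feat in [("in-distribution", in_dist_image), ("odd", odd_image)]:
    score = ood_score(feat)
    print(f"{name}: score={score:.2f}, flagged={score > threshold}")
```

Whether the flags raised by something like this match human judgments depends entirely on whether the learned features encode the human-common-sense structure discussed above, which is exactly where the distribution mismatch (and the room for adversarial examples) comes from.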
I don’t think critiques are necessarily bad ^_^