Neural nets have roughly human-level performance on ImageNet.
But those trained neural nets are very subhuman on other image-understanding tasks.
Then you can form an equally good, nonhuman concept by taking the better alien concept and adding random noise.
I would expect that the alien concepts are something we haven’t figured out because we don’t have enough data or compute or logic or some other resource, and that constraint will also apply to the AI. If you take that concept and “add random noise” (which I don’t really understand), it would presumably still require the same amount of resources, and so the AI still won’t find it.
For the rest of your comment, I agree that we can’t theoretically rule those scenarios out, but there’s no theoretical reason to rule them in either. So far the empirical evidence seems to me to be in favor of “abstractions are determined by the territory”: e.g., ImageNet neural nets seem to have human-interpretable low-level abstractions (edge detectors, curve detectors, color detectors) while having strange high-level abstractions. I claim that the strange high-level abstractions are bad, and only work on ImageNet because they were specifically trained to do so and ImageNet is sufficiently narrow that you can get to good performance with bad abstractions.
By adding random noise, I meant adding wiggles to the edge of the set in thingspace. For example, adding noise to “bird” might exclude “ostrich” and include “duck-billed platypus”.
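One way to picture those “wiggles”: treat a concept as a membership function over points in thingspace, and flip membership only for points near the concept’s boundary, so clear members stay in and clear non-members stay out while borderline cases (ostrich, platypus) can switch sides. A minimal toy sketch, where the disc, the function names, and the flipping rule are all my own hypothetical illustrations, not anything from the discussion:

```python
import math

# Toy model: a "concept" is a membership test over points in a
# 2-D thingspace. Here a unit disc stands in for "bird".

def bird(point):
    """Original concept: the unit disc around the origin."""
    x, y = point
    return math.hypot(x, y) <= 1.0

def boundary_dist(point):
    """Distance from a point to the disc's boundary."""
    x, y = point
    return abs(math.hypot(x, y) - 1.0)

def add_boundary_noise(concept, boundary_dist, amplitude, seed=0):
    """Wiggle the concept's edge: membership of points within
    `amplitude` of the boundary is flipped pseudo-randomly, so some
    borderline members drop out and some borderline non-members fall in.
    """
    def noisy(point):
        near_edge = boundary_dist(point) < amplitude
        flip = hash((point, seed)) % 2 == 0  # deterministic per point
        if near_edge and flip:
            return not concept(point)
        return concept(point)
    return noisy

noisy_bird = add_boundary_noise(bird, boundary_dist, amplitude=0.2)

# Clear cases are untouched; only borderline cases can flip.
assert noisy_bird((0.0, 0.0)) is True    # clearly inside
assert noisy_bird((3.0, 0.0)) is False   # clearly outside
```

The point of the sketch is that the noisy concept agrees with the original almost everywhere, which is why it can score nearly as well on a benchmark while carving up the borderline cases differently.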
I agree that the high-level ImageNet concepts are bad in this sense, but are they just bad? If they were just bad, and the limit to finding good concepts was data or some other resource, then we should expect small children and mentally impaired people to have similarly bad concepts. This would suggest a single gradient from better to worse. If, however, current neural networks used concepts substantially different from a small child’s, and not just uniformly worse or uniformly better, that would show different sets of concepts existing at the same low level. This would be fairly strong evidence for multiple possible sets of concepts at the smart-human level.
I would also point out that even a small fraction of the concepts being different would be enough to make alignment much harder. Even if there were a perfect scale, if 1⁄3 of the concepts are subhuman, 1⁄3 human-level, and 1⁄3 superhuman, it would be hard to understand the system. To get any safety, you need to get your system very close to human concepts, and you need to be confident that you have hit this target.