Why won’t this alignment idea work?
Researchers have already succeeded in building face detection systems from scratch, coding the features one by one, by hand. The algorithm they coded was not perfect, but it was good enough to be deployed commercially in the digital cameras of the last decade.
The brain's face recognition algorithm is not perfect either: it tends to produce false positives, which explains a good share of reported paranormal phenomena. The brain's other hard-coded networks seem to rely on the same kind of heuristics, wired in by evolution and imperfect.
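For context, the hand-coded detectors alluded to here (Viola-Jones style, the kind shipped in cameras) were built from simple rectangle-contrast features computed over an integral image. A minimal sketch of one such hand-coded feature, with illustrative names of my own choosing:

```python
# Sketch of the kind of hand-coded feature used in classic face
# detectors (Viola-Jones style). Names are illustrative, not from
# any specific library.

def integral_image(img):
    """Cumulative 2-D sums, so any rectangle sum costs O(1)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle (x, y, w, h) via the integral image."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Left-half minus right-half brightness: one hand-coded Haar-like feature."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A real detector combines thousands of such features in a cascade of thresholds; the point is only that every one of them was specified by hand.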
However, it turns out that humans, despite these imperfect evolutionary heuristics, are generally cooperative and friendly.
This suggests that the seed of alignment can be roughly coded and yet work.
1. Can’t we replicate this kind of research effort, hand-crafting human detectors and hand-crafting “friendly” behaviour?
2. Nowadays, deep learning would make this quest easier: there is no need to hand-craft a baby detector. Just train a neural network that recognizes babies and, above a certain confidence threshold, triggers a reaction that releases the hormones of tenderness. The detector does not have to be coded, only trained; only the reaction tied to the tenderness hormone has to be coded by hand.
3. This process will leave gaping holes, which will have to be patched one by one. But that is very likely what happened during evolution too.
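The scheme in point 2 could be sketched as follows. This is a toy illustration under my own naming; the score is a stand-in for the output of a trained classifier, and only the reaction is written by hand:

```python
# Toy sketch of point 2: a *trained* detector gating a *hand-coded*
# reaction. `detector_score` stands in for the output of a trained
# baby-recognition network; only the reaction below is hand-written.

TENDERNESS_THRESHOLD = 0.8  # illustrative value; would have to be tuned

def tenderness_reaction(detector_score, threshold=TENDERNESS_THRESHOLD):
    """Hand-coded response: fires only when the learned detector is confident."""
    if detector_score >= threshold:
        return {"release_tenderness_hormone": True, "intensity": detector_score}
    return {"release_tenderness_hormone": False, "intensity": 0.0}
```

The "gaping holes" of point 3 show up here directly: anything that pushes `detector_score` past the threshold without being a baby gets the hormone anyway.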
The problems are:
- We are not allowed to iterate with a strong AI
- We are not sure that this would extrapolate well to higher levels of capability
Ok
But if we were to work on it today, it would only be at a sub-human level, and we could iterate on it as with a child. And even if we had the complete code of the brain stem, and we had “Reverse-engineered human social instincts” as Steven Byrnes proposes here, it seems to me that we would still have to do all of this.
What do you think?
You suggested: “But if we were to work on it today, it would only have a sub-human level, and we could iterate like on a child”
But as you yourself pointed out: “We are not sure that this would extrapolate well to higher levels of capability”
You suggested: “and we had ‘Reverse-engineered human social instincts’”
As you said, “The brain’s face recognition algorithm is not perfect either. It has a tendency to create false positives”
And so perhaps the AI would generate pictures of humans that trigger exactly those false positives. Or, as you said, “We are not sure that this would extrapolate well to higher levels of capability”
The classic example is humans creating condoms, which is a very unfriendly thing to do to Evolution, even though it raised us like children, sort of
Adding: “Intro to Brain-Like-AGI Safety” (I haven’t read it yet, but it seems interesting)
Ok. But don’t you think “reverse engineering human instincts” is a necessary part of the solution?
My intuition is that value is fragile, so we need to specify it. If we want to specify it correctly, either we learn it or we reverse engineer it, no?
I don’t know; I don’t have a coherent idea for a solution. Here’s one of my best ideas (not that good).
Yudkowsky split up the solutions in his post, see point 24. The first sub-bullet there is about inferring human values.
Maybe someone else will have different opinions