But if we were to work on it today, it would only be at a sub-human level, and we could iterate on it like on a child
But as you yourself pointed out: “We are not sure that this would extrapolate well to higher levels of capability”
You suggested:
and we had “Reverse-engineered human social instincts”
As you said, “The brain’s face recognition algorithm is not perfect either. It has a tendency to create false positives”
And so perhaps the AI would generate pictures of humans that create false positives. Or, as you said, “We are not sure that this would extrapolate well to higher levels of capability”
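To make that false-positive worry concrete, here is a minimal sketch (my own illustration, not something from the conversation) of optimizing an input image to maximize a recognizer's score. The detector here is just a tiny randomly initialized stand-in network, not a real face-recognition model; the point is only the optimization pattern that produces false positives.

```python
# Sketch: gradient ascent on an input image to maximize a detector's "face" score.
# The detector is a placeholder network, standing in for any learned recognizer.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "face recognizer": in reality a learned model with its own quirks.
detector = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(1),  # scalar "this looks like a face" score
)

image = torch.rand(1, 3, 64, 64, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)        # only the image is optimized

for step in range(200):
    optimizer.zero_grad()
    score = detector(image).squeeze()
    (-score).backward()          # ascend the score by descending its negative
    optimizer.step()
    image.data.clamp_(0.0, 1.0)  # keep pixels in a valid range

print(f"final detector score: {detector(image).item():.3f}")
# The result maximizes the detector's score without having to look like a real face,
# which is exactly the false-positive failure mode mentioned above.
```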
The classic example is humans creating condoms, which is a very unfriendly thing to do to Evolution, even though it raised us like children, sort of
Adding: “Intro to Brain-Like-AGI Safety” (I haven’t read it yet; it seems interesting)
Ok. But don’t you think “reverse engineering human instincts” is a necessary part of the solution?
My intuition is that value is fragile, so we need to specify it. If we want to specify it correctly, either we learn it or we reverse engineer it, no?
I don’t know, I don’t have a coherent idea for a solution. Here’s one of my best ideas (not so good).
Yudkowsky split up the possible solutions in his post; see point 24. The first sub-bullet there is about inferring human values.
Maybe someone else will have different opinions