I like this post. I’m not sure how decision-relevant it is for technical research though…
If there isn’t a broad basin of attraction around human values, then we really want the AI (or the human-AI combo) to have “values” that, though they need not be exactly the same as any particular human’s, are at least within the human distribution. If there is a broad basin of attraction, then we still want the same thing; it’s just that we’ll ultimately get graded on a more forgiving curve. We’re trying to do the same thing either way, right?