We were arguing about concept learning order, and about how easy it is to internalize human values compared with other features an AI might base its decisions on. We were arguing that human values are easy features to learn / internalize / hook up to decision-making, so on any natural progression up the learning-capacity ladder, you end up with an AI that's aligned before you end up with one so capable that it can destroy the entirety of human civilization by itself.
Yes, but you were arguing for that using examples of “morally evaluating” and “grokking the underlying simple moral rule”, not of caring.