I thought the argument about the kindly mask assumed that the scenario of “I just took over the world” is sufficiently out-of-distribution that we could reasonably fear the in-distribution track record of aligned behavior wouldn't hold?