(Sure, in some sense we agree that the assumption is required, though I think that’s a misleading way of putting it. But whatever.)
Thank you for the detailed and lengthy explanation! I probably agree with your first point. It seems similar to what the shard theory people are exploring, and yes, this is a promising line of research which may, if we are lucky, overturn the default hypothesis that misaligned-but-deceptive AIs are most likely. I’d say similar things about the second point, I guess. Both points are basically just saying “we don’t know what the prior is like.” Sure, but they aren’t positive arguments that the prior will be benign. I’m not sure whether I agree with the third point, but in any case it also seems to be just a warning that we are ignorant about the prior, not an argument that the prior is benign.
I don’t think I understand your more detailed argument that begins with “the point about nondescriptive goodharting.” I’m tired now so will go away but hopefully will return and try to think more deeply about it. I strongly encourage you to write up a post on it, with emphasis on clarity. I really hope you are right!