No. The kind of intelligent agent that is scary is the kind that would notice its own overconfidence—after some small number of experiences being overconfident—and then work out how to correct for it.
I mean, the main source of current x-risk is that humans are agents that are capable enough to do dangerous things (like making AI) but too overconfident to notice that doing so is a bad idea, no?
“Overconfident” gets thrown around a lot by people who just mean “incorrect”. Rarely do they mean actual systematic overconfidence. If everyone involved in building AI shifted their confidence down across the board, I’d be surprised if this changed their safety-related decisions very much. The mistakes they are making are more complicated, e.g. some people seem “underconfident” about how to model future highly capable AGI, and are therefore adopting a wait-and-see strategy. This isn’t real systematic underconfidence, it’s just a mistake (from my perspective). And maybe some are “overconfident” that early AGI will be helpful for solving future problems, but again this is just a mistake, not systematic overconfidence.
I think that generally when people say “overconfident” they have a broader class of irrational beliefs in mind than “overly narrow confidence intervals around their beliefs”; things like a bias towards thinking well of yourself can be part of it too.
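To make the narrower sense concrete, here is a minimal toy sketch (the agents, numbers, and helper function are made up for illustration, not from anyone in this thread) of the difference between being systematically overconfident and simply being wrong about some particular question:

```python
# Hypothetical illustration: "systematically overconfident" means stated
# confidence reliably exceeds accuracy across many beliefs, whereas
# "incorrect" just means being wrong about some particular question.
import random

random.seed(0)

def empirical_accuracy(true_prob: float, n: int = 10_000) -> float:
    """Fraction of n independent claims that turn out true, when each claim
    is actually true with probability `true_prob`."""
    return sum(random.random() < true_prob for _ in range(n)) / n

stated_confidence = 0.90

# Agent A is calibrated: claims it asserts at 90% come true about 90% of the
# time. It can still be flatly wrong about one decisive question; that is a
# mistake, not systematic overconfidence.
print("A gap:", stated_confidence - empirical_accuracy(0.90))   # ~0.00

# Agent B is systematically overconfident: claims it asserts at 90% come true
# only ~60% of the time. Shifting its confidences down across the board is
# exactly the correction that would help this pattern.
print("B gap:", stated_confidence - empirical_accuracy(0.60))   # ~0.30
```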
And maybe some are “overconfident” that early AGI will be helpful for solving future problems, but again this is just a mistake, not systematic overconfidence
OK but whatever the exact pattern of irrationality is, it clearly exists simultaneously with humans being competent enough to possibly cause x-risk. It seems plausible that AIs might share similar (or novel!) patterns of irrationality that contribute to x-risk probability while being orthogonal to alignment per se.
Yes, I agree with that.