I think there's a decent probability that the first superhuman AGIs will misjudge their capabilities and requirements in at least some respects. Humans in a completely novel environment are pretty terrible at evaluating what they must or must not do to achieve their goals, and almost by definition a newly trained AGI trying to have an impact on the real world is in a completely novel environment. So I think "smarter than any human" is a low bar that still leaves a lot of room to make mistakes and be caught.
So I think we stand a decent chance of getting some “near misses”. I’m not sure that we as a civilization will adequately respond to those near misses in a sane way.
I think GPT-4 already exhibits near-miss behaviour and we should be taking this very seriously even though it is not quite AGI.