That’s not clear to me? Unless they have a plan to ensure future ASIs are aligned with them or meaningfully negotiate with them, ASIs seem just as likely to wipe out any earlier non-superhuman AGIs as they are to wipe out humanity.
I can come up with specific scenarios where they’d be more interested in sabotaging safety research than capabilities research, as well as the reverse, but it’s not evident to me that the combined probability mass of the former outweighs the latter or vice-versa.
If someone has an argument for this I would be interested in reading it.
I found some prior relevant posts and tagged them in https://www.lesswrong.com/tag/successor-alignment. I found the top few comments on https://www.lesswrong.com/posts/axKWaxjc2CHH5gGyN/ai-will-not-want-to-self-improve#comments and https://www.lesswrong.com/posts/wZAa9fHZfR6zxtdNx/agi-systems-and-humans-will-both-need-to-solve-the-alignment#comments helpful.
edit: another effect to keep in mind is that capabilities research may be harder to sandbag, because its metrics are clearer.