Tell me your most-probable story in which we still get a mildly friendly [edit: superintelligent] AGI
Research along the Agent Foundations direction ends up providing alignment insights that double as capability insights, as per this model, leading to some alignment research group unexpectedly winning the AGI race.

Looking at it another way: perhaps the reasoning failures that lead AI labs to not take AI risk seriously enough are correlated with wrong models of how cognition works and how to get to AGI, meaning that research in their direction enters a winter, giving a more alignment-friendly paradigm time to come into existence.
That seems… plausible enough. Of course, it's also possible that we're ~1 insight away from AGI along the "messy bottom-up atheoretical empirical tinkering" approach, in which case the point is moot.