Thanks for your detailed and nuanced answer. I really appreciate how you distinguish between different forms of misalignment and how s-risks fit within that picture. Your comment helped clarify a lot.
If you have time, I’d love to hear you expand a bit more on the likelihood of s-risks relative to other AGI outcomes. You mentioned that s-risks seem much less likely than extinction or successful alignment, but could you give a rough probability estimate (even if it’s just an intuitive order-of-magnitude guess, like “1 in a thousand” or “1 in a million”)?
It would also be interesting to hear your thoughts on what factors most strongly influence that probability: for example, how much governance or alignment progress would need to fail for s-risks to become plausible; whether you think "instrumental torture" (as opposed to large-scale indifferent suffering) deserves separate consideration; and how much you think the risk depends on who ends up in control of early AGIs (e.g., sociopathic or sadistic actors).
Basically, I’m trying to understand not just whether s-risks are neglected, but how much weight they deserve compared to extinction in our overall AGI-risk prioritization.
Thanks again for engaging with these hard questions.
In a way, you need computers to wirehead everyone, but you don't necessarily need AI to do it. I think we as humans can figure out how the reward system works on our own.