EY’s concern is that superhuman AI might behave wildly differently from any lesser system, especially if there is a step where the AI replaces itself with something different and more powerful (recursive self-improvement). In machine learning terms, this is “out-of-distribution behavior”: superintelligence is not in the ‘distribution’ of the training data; it’s something that has never happened before, so we can’t plan for it.
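As a minimal sketch of why out-of-distribution inputs are dangerous (this toy example is mine, not from the comment above): a model fit only on inputs in [0, 1] can track the target function well there, yet produce wild outputs just outside that range, even though the true function stays bounded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit a degree-7 polynomial to sin(2*pi*x), but only on x in [0, 1].
x_train = rng.uniform(0.0, 1.0, 200)
y_train = np.sin(2 * np.pi * x_train)
coeffs = np.polyfit(x_train, y_train, deg=7)

# In-distribution: the fit tracks the true function closely on [0, 1].
x_in = np.linspace(0.0, 1.0, 50)
err_in = np.max(np.abs(np.polyval(coeffs, x_in) - np.sin(2 * np.pi * x_in)))

# Out-of-distribution: at x = 3 the polynomial extrapolates wildly,
# even though sin stays in [-1, 1] everywhere.
y_ood = np.polyval(coeffs, 3.0)
print(err_in)        # small: the model looks trustworthy on its training range
print(abs(y_ood))    # huge: behavior off-distribution is unconstrained
```

Nothing in the training procedure constrains the model outside the data it saw; the analogy is that training and testing AI at sub-human capability levels tells us little about behavior at superhuman levels.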
That said, there are people working on alignment with current AI systems, such as Redwood Research, Anthropic, and others.
But isn’t it problematic to start the analysis at “superhuman AGI exists”? Starting there forces us to make assumptions about how that AGI came into being. What are those assumptions, and how robust are they?
I strongly agree with this objection. You might be interested in Comprehensive AI Services, a different story of how AI develops that doesn’t involve a single superintelligent machine, as well as “Prosaic Alignment” and “The case for aligning narrowly superhuman systems”. Right now, I’m working on language model alignment because it seems like a subfield with concrete, near-term problems whose solutions could be relevant if we see extreme growth in AI over the next 5-10 years.
Just to add to this list: Aligned AI.