Interesting. Your numbers imply a pretty good chance of everyone not dying soon after an AI takeover. I’d imagine that’s either from a slow transition period in which humans are still useful, or from partially aligned AI. Partially successful alignment isn’t discussed much; it’s generally been assumed that either we’ll get alignment right or we won’t.
But partial alignment seems much more plausible for systems based on deep networks with complex representations. Such systems might be something like an AI that won’t kill humans but will let us die out, or might exhibit more subtle or arbitrary mixes of aligned and unaligned behavior.
That’s not particularly helpful, but it does point to a potentially important and relatively unaddressed question: how precise (and stable) does alignment need to be to get good results?
If anyone could point me to work on partial alignment, or the precision necessary for alignment, I’d appreciate it.
The probability of human survival is primarily driven by AI systems caring a small amount about humans (whether due to evidential cooperation in large worlds (ECL), commonsense morality, complicated and messy values, acausal trade, or whatever—I find all of those plausible).
I haven’t thought deeply about this question, because a world where AI systems don’t care very much about humans seems pretty bad for humans in expectation. I do think it matters whether the probability we all literally die is 10% or 50% or 90%, but it doesn’t matter very much to my personal prioritization.