I don’t think anyone is against incremental progress. It’s just that if after incremental progress AI takes over, then it’s not good enough alignment. And what’s the source of confidence in it being enough?
“Final or nonexistent” seems appropriate for scheming detection: if you missed even one way for an AI to hide its intentions, it will take over. So yes, the degree of scheming in a broad sense, and how much of it you can prevent, is a crux that other things depend on. Again, I don’t see how you can be confident that future AI wouldn’t scheme.
It’s just that if after incremental progress AI takes over,
Why would that be discontinuous?
if you missed even one way for an AI to hide its intentions, it will take over.
Assuming it has an intention, and a malign one. Deception-based doom depends on a chain of assumptions, and each one has to hold with well over 90% probability to support a conclusion of near-certain doom.
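To make the arithmetic behind this concrete, here is a toy sketch (the number of links and the probabilities are purely illustrative, not anyone’s actual estimates): a conjunction of independent assumptions decays quickly, so each link must be held with very high confidence for the overall conclusion to stay near certain.

```python
import math

def conjunction(probs):
    """Probability that every assumption in a chain holds,
    assuming the assumptions are independent."""
    return math.prod(probs)

# Five links, each believed at 90%: the chain is already below 60%.
print(round(conjunction([0.9] * 5), 5))  # 0.59049

# For a five-link chain to reach 0.99 overall, each independent link
# would need probability of roughly 0.99 ** (1/5).
print(round(0.99 ** (1 / 5), 3))  # 0.998
```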
Again, I don’t see how you can be confident that future AI wouldn’t scheme.
I’m not arguing for 0% p(doom), I’m arguing against 99%.
If all AIs are scheming, they can take over together. If instead we assume a world with a powerful AI that is actually on humanity’s side, then at some level of that friendly AI’s power you could probably run an unaligned AI and it would not be able to do much harm. But merely assuming there are many AIs doesn’t solve scheming by itself: if training actually works as badly as predicted, then none of the many AIs would be aligned enough.
Why would that be discontinuous?
Assuming it has an intention, and a malign one. Deception-based doom depends on a chain of assumptions, and each one has to hold with well over 90% probability to support a conclusion of near-certain doom.
I’m not arguing for 0% p(doom), I’m arguing against 99%.
Because incremental progress missed deception.
I agree such confidence lacks justification.
I’m talking about the how of takeover. Could any AI, even one of many, take over successfully in its first attempt?
If all AIs are scheming, they can take over together.
All AIs scheming cooperatively is less likely than one AI scheming.
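A toy sketch of that comparison (all numbers purely illustrative; note that the independence assumption is exactly what the parent comment disputes, since shared training failures could make scheming correlated across AIs):

```python
def p_single_schemer(p):
    """Chance that one given AI schemes."""
    return p

def p_joint_takeover(p, n, q):
    """Chance that all n AIs scheme AND coordinate, assuming each schemes
    independently with probability p and coordination succeeds with chance q.
    Correlated training failures would push this back up toward p * q."""
    return (p ** n) * q

p, n, q = 0.5, 3, 0.5
print(p_joint_takeover(p, n, q))  # 0.0625
print(p_single_schemer(p))        # 0.5
```

Under independence the joint probability can never exceed the single-schemer probability; the disagreement is over how independent the AIs actually are.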