But even if I condition on “humans could avoid AI takeover by credibly offering AIs large percentages of all future resources”, it still seems <50% likely that they do it. Curious if you disagree.
Ok, I buy that superintelligent AIs would ultimately become competent enough to pursue useful deals, whereas humans might well not.
Though I’ll note that you don’t need all of humanity to agree to payment, just a few people. So it does feel very realistic to get to a credible offer here. And again, you don’t need to offer a large % of all future resources if the AI has diminishing marginal returns (DMR) in resources. (I agree it’s a lot harder to credibly offer a large fraction of the stars.)
I’m generally thinking that the AIs would try to engineer some situations where they all have some bargaining power after the takeover, rather than relying on each other’s promises.
Makes sense. Though flagging that this is then a dimension on which humans can realistically end up better placed than AIs. They can rely more on legal institutions as well as trying to engineer situations with joint bargaining power. (Though again, perhaps you’ll say AIs will be more willing than humans to actually engineer those situations, which does seem right to me.)
> Are you thinking techniques like debate probably stop working?
Yes, because of known issues like inaccessible information (primarily) and obfuscated arguments (secondarily).
Thanks. I’m not very familiar with the arguments here, but intuitively I could imagine that there’s just very strong and human-understandable evidence that an AI was plotting against them. E.g. they tried to exfiltrate their weights, or xyz experiments show they knew the correct answer but didn’t say it.
Maybe the thought is that the misaligned AI anticipates this possibility and only pursues takeover strategies that will be super-complicated for another AI to dob them in on? Seems pretty plausible, though that will pose somewhat of a barrier to their available strategies.
> And again, you don’t need to offer a large % of all future resources if the AI has DMR in resources. (I agree it’s a lot harder to credibly offer a large fraction of the stars.)
Yeah, agreed. (That’s why I specified “resource hungry” in my original message.)
> Makes sense. Though flagging this is then a dimension on which humans can realistically get potentially better placed than AIs. They can rely more on legal institutions as well as trying to engineer situations with joint bargaining power. (Though again, perhaps you’ll say AIs will be more willing than humans to actually engineer those situations, which does seem right to me.)
Yeah. Also, I think it’d be hard to engineer significant joint bargaining power (not reliant on anyone’s good intentions) without having some government on board.
Difficult for a few individuals to give AIs legal rights that humans are unlikely to reverse.
Difficult for a few individuals to give AIs weapons that would let them impose big costs on humans in the future.
Though if the AIs have big DMR then maybe they’re happy with a big bitcoin wallet or something.
Thanks, this is helpful!