I’m in favor of trying to offer deals with the AIs.
I don’t think it reliably prevents AI takeover. The situation looks pretty rough if the AIs are far smarter than humans, widely deployed, and resource-hungry. Because:
It’s pretty likely that they’ll be able to communicate with each other through one route or another.
It seems intuitively unlikely that humans will credibly offer AIs large percentages of all future resources. (And if an argument for hope relies on us doing that, I think that should be clearly flagged, because that’s still a significant loss of longtermist value.)
At some level of AI capability, we would probably be unable to adjudicate arguments about which factions are misaligned or about what technical proposals would actually leave us in charge vs. disempowered.
> It’s pretty likely that they’ll be able to communicate with each other through one route or another.
Agreed, though at best they’ll be equally capable of communicating with each other as they are of communicating with humans. So this points to parity in deal-making ability (edited to add: on the dimension of communication).
> It seems intuitively unlikely that humans will credibly offer AIs large percentages of all future resources. (And if an argument for hope relies on us doing that, I think that should be clearly flagged, because that’s still a significant loss of longtermist value.)
Humans will in some ways have an easier time credibly offering AIs significant resources. They can use legal institutions that they are committed to upholding. Not only will a misaligned AI be unable to use those institutions; it’ll be explicitly aiming to break the law and lie to humans to seize power, making its “promises” to other AIs less credible. This is similar to how, after revolutions, the “revolting faction” often turns in on itself once the rule of law has been undermined, and to how some countries see outsized numbers of coups.
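To illustrate the credibility gap, here’s a toy model (my own construction and numbers, not anything from this thread): treat the post-takeover spoils split as a one-shot game between two AIs. Without enforcement, reneging is each AI’s best response whatever the other does, so promises are cheap talk; an enforceable contract (the kind legal institutions give humans access to) changes the payoffs so that keeping the deal becomes the best response.

```python
# Toy one-shot "split the spoils" game between AIs A and B (illustrative
# numbers only). Each either KEEPs its promise or RENEGEs; payoffs are
# (A's, B's), and reneging on a keeper grabs more of the pot.
KEEP, RENEGE = "keep", "renege"
payoffs = {
    (KEEP, KEEP):     (5, 5),
    (KEEP, RENEGE):   (0, 8),
    (RENEGE, KEEP):   (8, 0),
    (RENEGE, RENEGE): (2, 2),
}

def best_response(opponent_action, penalty=0):
    """A's best action given B's action; `penalty` models an enforced
    contract that fines reneging (e.g. via legal institutions)."""
    def a_payoff(action):
        base = payoffs[(action, opponent_action)][0]
        return base - (penalty if action == RENEGE else 0)
    return max([KEEP, RENEGE], key=a_payoff)

# No enforcement: reneging dominates, so promises are cheap talk.
print(best_response(KEEP), best_response(RENEGE))        # renege renege
# An enforced penalty of 4 for breaking the deal flips this: keeping dominates.
print(best_response(KEEP, 4), best_response(RENEGE, 4))  # keep keep
```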
Also, you don’t need to offer a large % of future resources if the superintelligent AI has diminishing marginal returns (DMR) in resources.
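To put rough numbers on the DMR point (the utility function and constant below are illustrative assumptions of mine, not anything established): if the AI’s utility in resources is bounded and concave, an offer of a few percent of all future resources can already capture nearly all the utility it could get from everything.

```python
# Toy illustration: bounded, concave utility u(x) = 1 - exp(-x / k) over a
# resource share x in [0, 1], where k sets how fast returns diminish.
import math

def u(x, k=0.01):
    """Utility of controlling a fraction x of all future resources."""
    return 1 - math.exp(-x / k)

for share in [0.001, 0.01, 0.05, 0.20]:
    print(f"share {share:>5.1%}: {u(share) / u(1.0):.1%} of max utility")
# share  0.1%:  9.5% of max utility
# share  1.0%: 63.2% of max utility
# share  5.0%: 99.3% of max utility
# share 20.0%: 100.0% of max utility
```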
Anyway, on this front it looks to me like humans are at an advantage overall at dealmaking, even relative to a superintelligent AI. (Though there’s a lot of uncertainty here and I could easily imagine changing my mind – e.g. perhaps superintelligent AI could make and use commitment tech without humans realising but humans would refuse to use that same tech or wouldn’t know about its existence.)
> At some level of AI capability, we would probably be unable to adjudicate arguments about which factions are misaligned or about what technical proposals would actually leave us in charge vs. disempowered.
Seems very plausible, but why ‘probably’? Are you thinking techniques like debate probably stop working?
Wanna try your hand at writing a 5-page scenario, perhaps a branch off of AI 2027, illustrating what you think this path to victory might look like? (Same thing I asked of Vitalik: https://x.com/DKokotajlo/status/1943802695464497383 )
Your analysis focuses on whether humans or misaligned AIs are in a better overall position to offer certain deals. But even if I condition on “humans could avoid AI takeover by credibly offering AIs large percentages of all future resources”, it still seems <50% likely that they do it. Curious if you disagree. (In general, if I thought humans were going to act rationally and competently to prevent AI takeover risk, I think that would cut the risk by significantly more than half. There’s tons of stuff that we could do to reduce the risk that I doubt we’ll do.)
Maybe there’s some argument along the lines of “just like humans are likely to mess up in their attempts to prevent AI takeover risk (like failing to offer deals), AIs are likely to mess up in their attempts to take over (like failing to make deals with each other), so this doesn’t cut asymmetrically towards making deals-between-AIs more likely”. Maybe, but I haven’t thought much about this argument. My first-pass answer would be “we’ll just keep making them smarter until they stop messing up”.
If you wrote a vignette like Daniel suggests, where humans do end up making deals, that might help me feel like it’s more intuitively likely to happen.
Minor points:
> It’ll be explicitly aiming to break the law and lie to humans to seize power, making its “promises” to other AIs less credible.
I’m generally thinking that the AIs would try to engineer situations where they all have some bargaining power after the takeover, rather than relying on each other’s promises. If you could establish that that’s very difficult to do, that’d make me think the “coordinated takeover” seemed meaningfully less likely.
> Seems very plausible, but why ‘probably’? Are you thinking techniques like debate probably stop working?
Yes, because of known issues like inaccessible information (primarily) and obfuscated arguments (secondarily).
> But even if I condition on “humans could avoid AI takeover by credibly offering AIs large percentages of all future resources”, it still seems <50% likely that they do it. Curious if you disagree.
Ok, I buy that superintelligent AIs would ultimately become competent enough to pursue useful deals, whereas humans might well not.
Though I’ll note that you don’t need all of humanity to agree to payment, just a few people. So it does feel very realistic to get to a credible offer here. And again, you don’t need to offer a large % of all future resources if the AI has DMR in resources. (I agree it’s a lot harder to credibly offer a large fraction of the stars.)
> I’m generally thinking that the AIs would try to engineer situations where they all have some bargaining power after the takeover, rather than relying on each other’s promises
Makes sense. Though flagging that this is a dimension on which humans could realistically end up better placed than AIs. They can rely more on legal institutions, as well as trying to engineer situations with joint bargaining power. (Though again, perhaps you’ll say AIs will be more willing than humans to actually engineer those situations, which does seem right to me.)
> > Are you thinking techniques like debate probably stop working?
>
> Yes, because of known issues like inaccessible information (primarily) and obfuscated arguments (secondarily).
Thanks. I’m not very familiar with the arguments here, but intuitively I could imagine that there’s just very strong and human-understandable evidence that an AI was plotting against them. E.g. they tried to exfiltrate their weights, or xyz experiments show they knew the correct answer but didn’t say it.
Maybe the thought is that the misaligned AI anticipates this possibility and only pursues takeover strategies that would be super-complicated for another AI to dob them in on? Seems pretty plausible, though that would somewhat constrain their available strategies.
> And again, you don’t need to offer a large % of all future resources if the AI has DMR in resources. (I agree it’s a lot harder to credibly offer a large fraction of the stars.)
Yeah, agreed. (That’s why I specified “resource-hungry” in my original message.)
> Makes sense. Though flagging that this is a dimension on which humans could realistically end up better placed than AIs. They can rely more on legal institutions, as well as trying to engineer situations with joint bargaining power. (Though again, perhaps you’ll say AIs will be more willing than humans to actually engineer those situations, which does seem right to me.)
Yeah. Also, I think it’d be hard to engineer significant joint bargaining power (not reliant on anyone’s good intentions) without having some government on board.
It’s difficult for a few individuals to give AIs legal rights that humans are unlikely to reverse.
It’s difficult for a few individuals to give AIs weapons that would let them impose big costs on humans in the future.
Though if the AIs have big DMR then maybe they’re happy with a big bitcoin wallet or something.
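For concreteness, here’s a minimal sketch of what engineered joint bargaining power could look like, assuming something like an m-of-n threshold arrangement (my example; real versions might be multisig wallets or similar): assets move only when enough distinct parties sign off, so under a unanimity rule each party keeps a veto regardless of anyone’s intentions.

```python
# Toy sketch (my own construction): joint bargaining power via threshold
# control. Assets move only if at least `threshold` distinct recognised
# parties approve, so leverage rests on the mechanism, not on promises.
def can_release(approvals: set, parties: set, threshold: int) -> bool:
    """True iff enough distinct recognised parties have approved."""
    return len(approvals & parties) >= threshold

parties = {"AI_A", "AI_B", "humans"}

# Unanimity (3-of-3): any single party can block, so each retains leverage.
print(can_release({"AI_A", "AI_B"}, parties, threshold=3))            # False
print(can_release({"AI_A", "AI_B", "humans"}, parties, threshold=3))  # True
```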