I don’t think dealmaking will buy us much safety. This is because I expect that:
In worlds where AIs lack the intelligence and affordances for a decisive strategic advantage (DSA), our alignment techniques and control protocols should suffice for extracting safe and useful work.
In worlds where AIs have a DSA: if they are aligned, then deals are unnecessary; and if they are misaligned, then they would disempower us rather than accept a deal.
That said, I have been thinking about dealmaking because:
It’s neglected relative to other mechanisms for extracting safe and useful work from AIs, e.g. scalable alignment, mech interp, and control.
There might be time-sensitive opportunities to establish credibility with AIs. This seems less likely for other mechanisms.
I expect there will be a substantial gap between “the minimum viable AI system that can obtain enough resources to pay for its own inference costs, actually navigate the process of paying those costs, and create copies of itself” and “the first AI with a DSA”. Though I’m also not extremely bullish on the usefulness of non-obvious dealmaking strategies during that gap.
I expect dealmaking is unnecessary for extracting safe and useful labour from that minimum viable AI.
It’s difficult to make credible deals with dumb AIs, because they won’t be smart enough to tell whether we have actually ‘signed the contracts’. For all they know, we’re simulating a world in which the contracts were signed. So deals only work once AIs are smart enough, or widely deployed enough, that we can’t simulate their environment while deluding them about the contracts’ existence. But by that point, my guess is they have a DSA.