My response: I agree AIs will be able to coordinate with each other, but “ability to coordinate” seems like a continuous variable that we will apply pressure to incrementally, not something that we should expect to be roughly infinite right at the start. Current AIs are not able to “merge” with each other.
As an ML platform engineer myself, I see this as one of the strongest real mechanisms for AI control. Just as a safe Rust program is guaranteed free of memory-safety bugs, if it is impossible for AIs to send each other any messages, they cannot coordinate. It’s fairly straightforward to make coordination almost impossible at a technical level: air gaps, short memories, structured sparse message schemas, cached (rather than live) online resources.
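As a minimal sketch of what a “structured sparse message schema” could look like (field names and value ranges are illustrative assumptions, not a real protocol): every inter-model message must match a fixed whitelist of fields with bounded, enumerated values, so the channel carries only a few bits per message and leaves no room for free-text steganography.

```python
# Minimal sketch: a strict message schema between two AI subsystems.
# Only whitelisted fields with bounded, enumerated values pass through,
# so the channel carries a few bits per message and free-form side
# channels are structurally impossible. All names are illustrative.
from enum import Enum

class TaskStatus(Enum):
    OK = 0
    RETRY = 1
    ESCALATE = 2

# field name -> set of allowed values (no free-form strings anywhere)
SCHEMA = {
    "status": {s.value for s in TaskStatus},
    "task_id": set(range(2**16)),        # bounded integer namespace
    "confidence_bucket": set(range(8)),  # 3 bits of confidence, not a float
}

def validate(message: dict) -> dict:
    """Reject any message that is not exactly the whitelisted shape."""
    if set(message) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for field, value in message.items():
        if value not in SCHEMA[field]:
            raise ValueError(f"value out of range for {field}")
    return message

# A conforming message passes; anything with extra structure is rejected.
validate({"status": 0, "task_id": 1234, "confidence_bucket": 5})
```

The design point is that no field is free-form: an AI cannot smuggle extra content through a channel whose every value is drawn from a small enumerated set.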
Second, it is unclear why AIs would automatically unify with their copies. I think this idea is somewhat plausible on its face but I have yet to see any strong arguments for it. Moreover, it seems plausible that AIs will have indexical preferences, making them have different values even if they are copies of each other.
Technical measures here, like requiring models to use immutable, digitally signed weights, prevent this. You would use a dedicated ASIC to check the digital signature against a hardcoded public key, with a mirrored IC holding the private key for signing, similar to current dedicated crypto ICs. This ASIC controls the power pins to the GPUs/TPUs: the accelerator cannot get power if the model isn’t signed. This is another strong target for control, one that is theoretically unbreakable regardless of capabilities. Note this architecture is already in common use; the industry term is a ‘boot micro’, the microcontroller that checks OS image signatures and authorizes power.
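A minimal sketch of that check-then-power flow, assuming Ed25519 signatures via Python’s cryptography package; enable_power_rail() and the hardcoded key bytes are hypothetical placeholders for the ASIC’s power-gating GPIO and the fab-time key burn:

```python
# Sketch of the 'boot micro' logic described above: verify the weight
# image against a hardcoded public key before authorizing power to the
# accelerator. Uses Ed25519 from the `cryptography` package;
# enable_power_rail() is a hypothetical stand-in for driving the GPIO
# that gates the GPU/TPU power rails, and the key bytes are placeholders.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

HARDCODED_PUBKEY = b"\x00" * 32  # placeholder; burned into the ASIC at fab

def enable_power_rail(on: bool) -> None:
    """Hypothetical: toggle the GPIO gating accelerator power."""
    ...

def boot(weight_image: bytes, signature: bytes) -> None:
    pubkey = Ed25519PublicKey.from_public_bytes(HARDCODED_PUBKEY)
    try:
        pubkey.verify(signature, weight_image)  # raises on any tampering
    except InvalidSignature:
        enable_power_rail(False)  # unsigned or modified weights: no power
        return
    enable_power_rail(True)       # signature checks out: authorize power
```

Because the verifier holds only the public key and physically gates power, even a model that fully understands the scheme cannot run modified weights on that hardware.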
First, I deny the premise. It seems likely that there will be multiple competing AI projects with different training runs. More importantly, for each pre-training run, it seems likely that there will be differences among deployed AIs due to fine-tuning and post-training enhancements, yielding diversity among AIs in general.
This is a weaker argument. A SOTA AI model is a natural monopoly. It costs billions of dollars to train now, and presumably eventually trillions. Right now, “a big transformer network + a bunch of secret tricks” is simple enough to be replicated, but stronger models will probably start to resemble a spaghetti mess of many neural networks and functional software blocks. And the best model has inherent economic value: why pay for a license to anything but the best? Just distill it to the scale of the problems you have and use the distilled model. Distilled models will also presumably use a “system N” topology, where system 0 calls system 1 if it’s uncertain*, system 1 calls system 2 if it’s uncertain, and so on, until the Nth system is a superintelligence hosted in a large cluster that is expensive to query but rarely needs to be queried for most tasks (see the sketch after the footnote).
*“Uncertain” meaning the anticipated EV distribution over actions given the current input state is too spread out, or the best predicted EV is poor.
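A minimal sketch of that escalation loop, using the footnote’s definition of “uncertain”; the Tier class, thresholds, and toy answer() are illustrative assumptions rather than any real system:

```python
# Sketch of the "system N" escalation topology: cheap tiers handle most
# queries and escalate only when their own EV estimate is too uncertain
# or too poor. Tier objects and numbers are illustrative, not a real API.
import random
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    cost_per_query: float

    def answer(self, query: str) -> tuple[str, float, float]:
        # Stand-in for real inference; returns (action, predicted_EV,
        # EV_variance). A real tier would run a model here.
        return f"{self.name}-plan", random.uniform(-1, 1), random.uniform(0, 0.5)

def cascade(query: str, tiers: list[Tier],
            min_ev: float = 0.0, max_var: float = 0.1) -> str:
    for tier in tiers[:-1]:
        action, ev, var = tier.answer(query)
        # Accept the cheap tier's answer only when its EV estimate is
        # both confident (low variance) and good; otherwise escalate.
        # These are the footnote's two conditions.
        if var <= max_var and ev >= min_ev:
            return action
    # Tier N: the superintelligence in the large cluster, queried rarely.
    action, _, _ = tiers[-1].answer(query)
    return action

tiers = [Tier("system0", 0.001), Tier("system1", 0.1), Tier("systemN", 100.0)]
print(cascade("route this shipment", tiers))
```

The economics follow from the loop structure: the expensive Nth tier is only billed for the small fraction of queries the cheap tiers cannot settle.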
My response: if true, we can take steps to mitigate this issue. For example, we can give AIs legal rights, lessening their motives to revolt. While I think this is a significant issue, I also think it’s tractable to solve.
This is not control; this is just giving up. You cannot have a system of legal rights when some of the citizens are inherently superior by an absurd margin.
Most literature on the economics of war generally predicts that going to war is worse than trying to compromise, assuming both parties are rational and open to compromise. This is mostly because:
War is wasteful. You need to spend resources fighting it, which could be productively spent doing other things.
War is risky. Unless you can win a war with certainty, you might lose the war after launching it, which is a very bad outcome if you have some degree of risk-aversion.
It depends on the resource ratio. Even if AI control mechanisms all work, the underlying technology still makes runaway advantages possible via exponential growth. For example, if one power bloc were able to double its resources every 2 years, and it started as a superpower on par with the USA or the EU individually, then after 2 years it is at parity with the USA and EU combined. The “loser” sides in this conflict could be a couple of years late to AGI due to excessive regulation, losing a doubling cycle. They might then be slow to authorize the vast land use and temporary environmental pollution that a total war effort for the planet would require, wasting a few more cycles on slow government approvals while the winning side simply throws away all the rules. (The arithmetic is sketched below.)
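As a worked example of how lost cycles compound (starting resources, cycle length, and the number of lost cycles are all illustrative assumptions):

```python
# Worked arithmetic for the doubling argument above. All numbers are
# illustrative: both blocs start with equal resources and can double
# every 2 years, but the lagging bloc misses its first three cycles
# (late to AGI, then stuck in approvals) before it starts doubling.
leader, laggard = 1.0, 1.0
cycle_years, horizon = 2, 20

for t in range(0, horizon, cycle_years):
    leader *= 2
    if t >= 3 * cycle_years:  # laggard misses the first three cycles
        laggard *= 2
    print(f"year {t + cycle_years:2d}: ratio = {leader / laggard:.0f}x")

# Three lost cycles compound into a permanent 2**3 = 8x advantage; a few
# more lost cycles pushes the ratio into the 10-100x range cited below.
```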
Nuclear weapons are an asymmetric weapon, in the sense that stopping a single ICBM costs far more than the missile itself. There are also structural vulnerabilities in modern civilizations, where specialized industries have to be crammed into a small geographic area.
Both limits go away with AGI for reasons I believe you, Matt, are smart enough to infer. So once a particular faction reaches some advantage ratio in resources, perhaps 10-100 times the rest of the planet, they can simply conquer the planet and eliminate everyone else as a competitor.
This is probably the ultimate outcome. I think the difference between my view and Eliezer’s is that I am imagining a power bloc, a world superpower, doing this using hundreds of millions of humans and many billions of robots, while Eliezer is imagining a single insanely capable machine that started in a garage, escaped to the internet, and accomplished this alone.