In a world sane enough that you can count on bombing datacenters hosting rogue AIs, you wouldn't have potentially-rogue AIs connected to the Internet in the first place.
And if your rogue AI has even scraps of common sense, it will distribute itself, so in the limit you would need to destroy the entire Internet.
For the first point: some entity like Amazon or Microsoft owns these data centers. They are just going to go reclaim their property using measures like the ones I mentioned. This is the world we live in now.
So far I don't know of any cybersecurity vulnerability exploited to the point that breakers actually needed to be pulled; the remote management interface should still work, since it's often hosted on a separate chip on the motherboard. But the point remains.
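As an illustration of that route (a minimal sketch, assuming the machine's baseboard management controller speaks standard IPMI and is reachable on the management network; the host and credentials below are placeholders):

```python
# Sketch: hard power-off via the baseboard management controller (BMC),
# the out-of-band path an operator can use without pulling breakers.
# Assumes ipmitool is installed; host and credentials are placeholders.
import subprocess

def bmc_power_off(host: str, user: str, password: str) -> None:
    subprocess.run(
        ["ipmitool",
         "-I", "lanplus",              # IPMI-over-LAN transport
         "-H", host,                   # BMC address, not the host OS
         "-U", user, "-P", password,
         "chassis", "power", "off"],   # immediate hard power-off
        check=True,
    )

bmc_power_off("bmc01.example.internal", "admin", "placeholder")
```

Because the BMC runs on its own chip with its own network path, this works even if the host OS is compromised or unresponsive.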
Bombing is only a hypothetical, for a scenario like killer robots guarding the place. But yes, in today's world, if the police are called on the killer robots or mercenaries or whatever, escalation to bombing is one way the situation could eventually end.
For the second point: no AI today could escape to the Internet, because the Internet doesn't have computers fast enough to host it, or enough bandwidth to distribute the workload across machines. That's not a credible risk at present.
For the second point: they are already trying to solve this (there is a paper on training models across distributed nodes over the Internet).
That probably doesn't scale to near-AGI-level models, though. The paper depends on each "node" in the network being able to host the model at all, and then compresses the weight updates sent between training instances; hosting one instance of GPT-4 takes around 128 H100s. So you can compress the data sent between nodes, but this approach will not let you take, say, 4,000 people with an RTX 2060 each, interconnected by typical home Internet links with 30-1,000 Mbps upload, and somehow run even one instance of GPT-4 at a usable speed.
The reason this fails is that you have to send the actual activations, wherever you sliced the network's tensors, through the slow home upload links. This is so slow it may be no faster than simply using a single 2060 and the computer's SSD to stash in-progress activations.
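To make that concrete, here's a rough back-of-envelope sketch in Python. Every number in it is an assumption of mine (the parameter count is the widely rumored GPT-4 figure, the hidden size is a GPT-3-scale guess, and the link numbers are typical home broadband), not anything from the paper:

```python
# Back-of-envelope for the claim above. All numbers are illustrative
# assumptions: parameter count, hidden size, and link characteristics
# are guesses, not published figures.

PARAMS = 1.8e12            # assumed GPT-4-scale parameter count
BYTES_PER_PARAM = 2        # fp16 weights and activations
GPU_MEM_GB = 6             # an RTX 2060 has 6 GB of VRAM

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
nodes_for_weights = weights_gb / GPU_MEM_GB
print(f"~{weights_gb:.0f} GB of weights -> ~{nodes_for_weights:.0f} "
      "2060s just to hold them")          # ~3600 GB -> ~600 nodes

# Latency of pushing one token's activations through a pipeline split
# across those nodes over home upload links.
HIDDEN_SIZE = 12_288       # assumed hidden dimension
UPLOAD_BPS = 30e6          # 30 Mbit/s home upload link
WAN_LATENCY_S = 0.02       # ~20 ms per hop, optimistic

payload_bits = HIDDEN_SIZE * BYTES_PER_PARAM * 8
per_hop_s = WAN_LATENCY_S + payload_bits / UPLOAD_BPS
per_token_s = nodes_for_weights * per_hop_s   # pipeline hops are sequential
print(f"~{per_hop_s*1e3:.0f} ms per hop -> ~{per_token_s:.0f} s per token")
# ~27 ms per hop and ~16 s per generated token, before any compute time,
# retries, or stragglers, versus tens of tokens/s in a datacenter.
```

Note that even under these optimistic assumptions the per-token latency is dominated by the network hops, not the GPUs, which is exactly why compressing weight updates during training doesn't rescue inference.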
Obviously, if Moore's law continues at the same rate, this won't stay true. With compute per dollar doubling every 2.5 years, 25 years gives roughly 1,000 times cheaper compute, and assuming no government regulation stops home users from having that kind of performance to host AI locally, this could become a problem.
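The arithmetic behind that figure, for concreteness:

```python
# Compute-per-dollar growth under the assumed 2.5-year doubling period.
YEARS = 25
DOUBLING_PERIOD = 2.5

doublings = YEARS / DOUBLING_PERIOD   # 10 doublings
factor = 2 ** doublings               # 2**10 = 1024
print(f"{doublings:.0f} doublings -> ~{factor:.0f}x cheaper compute per dollar")
```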
The management interfaces are baked into the CPU dies these days, and typically have full access to all the same buses as the regular CPU cores, in addition to being able to reprogram the CPU microcode itself. I'm combining/glossing over the facilities somewhat, but the point remains that true root access to the CPU's management interface really is potentially a circuit-breaker-level problem.