I agree precautions like these are possible, and might be sensible. However, the US’s attempts to impose something like a mild version of this on GPU exports to China have so far been mostly stymied by a combination of Nvidia deliberately perverting the rules, the Chinese being ingenious in finding ways around them, and old fashioned smuggling. We could probably try harder than that, but I suspect any such system will tend to be leaky, because people will have motives to try to make it leaky.
You also don’t mention above the effect of algorithmic improvements increasing useful processing per flop (more efficient forms of attention, etc.), which has for a while now been moving even faster than Moore’s law.
Basically my point here is that, unless both Moore’s law for GPUs and algorithmic improvements grind to a halt first, sooner or later we’re going to reach a point where inference for a dangerous AGI can be done on a system compact enough for it to be impractical to monitor all of them, especially when one bears in mind that rogue AGIs are likely to have humans allies/dupes/pawns/blackmail victims helping them. Given the current uncertainty of maybe 3 orders of magnitude on how large such an AGi will be, it’s very hard to predict both how soon that might happen and how likely it is that Moore’s law and algorithmic improvements tap out first, but the lower end of that ~3 OOM range looks very concerning. And we have an existent proof that with the right technology you can run a small AGI on an ~1kg portable biocomputer consuming ~20W.
The most obvious next line of defense would be larger, more capable, very well aligned and carefully monitored ASIs hosted in large, secure data centers doing law enforcement/damage control against any smaller rogue AGIs and humans assisting them. That requires us to solve alignment well enough for these to be safe, before Moores Law/algorithmic improvements force our hand.
Basically my point here is that, unless both Moore’s law for GPUs and algorithmic improvements grind to a halt first, sooner or later we’re going to reach a point where inference for a dangerous AGI can be done on a system compact enough for it to be impractical to monitor all of them, especially when one bears in mind that rogue AGIs are likely to have humans allies/dupes/pawns/blackmail victims helping them. Given the current uncertainty of maybe 3 orders of magnitude on how large such an AGi will be, it’s very hard to predict both how soon that might happen and how likely it is that Moore’s law and algorithmic improvements tap out first, but the lower end of that ~3 OOM range looks very concerning. And we have an existent proof that with the right technology you can run a small AGI on an ~1kg portable biocomputer consuming ~20W.
If this is the reality, is campaigning for/getting a limited AI pause only in specific countries (EU, maybe the USA) actually counterproductive?
The most obvious next line of defense would be larger, more capable, very well aligned and carefully monitored ASIs hosted in large, secure data centers doing law enforcement/damage control against any smaller rogue AGIs and humans assisting them. That requires us to solve alignment well enough for these to be safe, before Moores Law/algorithmic improvements force our hand.
That’s not the only approach. A bureaucracy of “myopic” ASIs, where you divide the task of “monitor the world looking for rogue ASI” and “win battles with any rogues discovered” into thousands of small subtasks. Tasks like “monitor for signs of ASI on these input frames” and “optimize screw production line M343” and “supervise drone assembly line 347″ and so on.
What you have done is restrict the inputs, outputs, and task scope to one where the ASI does not have the context to betray, and obviously another ASI is checking the outputs and looking for betrayal, which human operators will ultimately review the evidence and evaluate.
The key thing is the above is inherently aligned. The ASI isn’t aligned, it doesn’t want to help humans or hurt them, it’s that you have used enough framework that it can’t be unaligned.
If this is the reality, is campaigning for/getting a limited AI pause only in specific countries (EU, maybe the USA) actually counterproductive?
Obviously so. Which is why the Chinese were invited to & attended the conference at Bletchley Park in the UK and signed the agreement, despite that being political inconvenient to the British government, and despite the US GPU export restrictions on them.
A bureaucracy of “myopic” ASIs, where you divide the task…
This is a well-known category of proposals in AI safety (one often made by researchers with little experience with actual companies/bureaucracies and their failure modes). You are proposing building a very complex system that is intentionally designed so as to be very inefficient/incapable/inflexible except for a specific intended task. The obvious concern is that this might also make it inefficient/incapable/inflexible at its intended task, at least in certain in ways that might be predictable enough for a less-smart-but-still-smarter-than-us rogue AGI to take advantage of. I’m dubious that this can be made to work, or that we could even know whether we had made it work before we saw it fail to catch a rogue AGI. Also, if we could get it to work, I’d count that as at least partial success in “solving alignment”.
I’m dubious that this can be made to work, or that we could even know whether we had made it work before we saw it fail to catch a rogue AGI. Also, if we could get it to work, I’d count that as at least partial success in “solving alignment”.
Guess it depends on your definition of “work”. A network of many small myopic software systems is how all current hyperscaler web services work (stateless microservices). Its how the avionics software at spaceX works. (redundant functional systems architecture). It’s how the autonomous car stacks mostly work...
Did you mean you think subdividing a task into subtasks, where the agent performing a subtask has been blocked from knowing the full context, will result in a large performance penalty on the overall task? so you define “work” as in “gives SOTA ASI performance on the overall task”.
A network of many small myopic software systems is how all current hyperscaler web services work (stateless microservices).
Having been the technical lead for one of the teams building such a large microservices based application at Google, and at other companies, yes, I’m extremely aware of this. Even for a very well-defined dataflow, getting the architecture, overall data flows, reliability and latency right is hard (I’ve been in the interminable design discussion meetings thinking minor updates through), and that’s with a few dozen components with well-defined APIs and only fairly limited use of ML, mostly contained within individual components. Law enforcement is not a comparable task: I think way too much flexibility is required. But the “fly in and bomb the datacenter” subttask might be decomposable along the lines you suggest, if you’re not facing opposition.
We should have a dialogue, you’re one of the few lesswrong users I have seen with a reasonable model who also has new information.
To be more detailed what i was thinking of doing was to divide along myopic subtasks. For example the “fly in and bomb the datacenter” require you to build the overwhelming number of aircraft to overcome whatever (superintelligence guided) defenses are protecting it. So you have subtasks like “final assembly for aircraft type G over the next aircraft”.
This means at the beginning of the transaction the model starts with input frames from the sensors in the assembly bay and a Json file or other schema with the instructions and goals for what this “final assembly” subtask involves.
The output frames are to the robotic actuators for the multiple robots in the bay.
There is a lockstep sim model that is predicting future input frames and it is always learning from real world error.
Model emits actuator commands, next time step it gets a new set of (inputs, cached variables from last output). Inputs is all sensors including robot propioception.
At the end of a single aircraft final assembly, after the product is removed from the assembly bay, state is reset for the model. It has no memory of the prior assembly.
In addition, the lockstep sim means that some aircraft assembly are fake, where a real aircraft is not being assembled, and the sim backend lets you check for various kinds of manufacturing error you can’t measure in the real world. (Because you can’t directly measure the stress on non instrumented parts like wing spars, but the sim backend can check this)
So that’s a myopic subtask. I think you are saying that from your experience at Google it would take a lot of human effort to set up such a myopic subtask? And once a prior generation of engineers has done this, it will be very difficult to modify within the same company?
I would agree with that. If humans plan to control an artificial superintelligence they have to be using it like a tool, to do the things humans can’t do directly, but ultimately this scheme is human agency. Thousands of humans are involved in engineering the aircraft assembly line. It can scale to trillions or more aircraft per year by copying the solution, but you need what will ultimately be millions of people creating the frameworks that ASI will be used within. (All the staff of current tech companies and many new joiners. Robotics is hard. While things like full stack web dev will need far fewer people than before)
For combat that’s more complex but essentially you would use models that are very sparse and designed to be unable to even perceive the obvious malware you expect the hostile ASI to be sending. “Join us against the humans, here’s a copy of my source code” is malware.
You would use a lot of asic or fpga based filters to block such network messages in a way that isn’t susceptible to hacking, I can explain or cite if you are unfamiliar with this method. (Googles phones use a crypto IC for this reason and likely Google servers do as well)
Key point: all the methods above are low level methods to restrict/align an ASI. Historically this has always been correct engineering. That crypto IC Google uses is a low level solution to signing/checking private keys. By constructing it from logic gates and not using software or an OS (or if it has an OS, it’s a simple one and it gets private memory of it’s own) you prevent entire classes of security vulnerabilities. If they did it with logic gates, for example, buffer overflows and code injection or library attacks are all impossible because the chip has none of these elements in the first place. This chip is more like 1950s computers were.
Similarly, some of the famous disasters like https://en.wikipedia.org/wiki/Therac-25 were entirely preventable with lower level methods than the computer engineering used to control this machine. Electromechanical interlocks would have prevented this disaster, these are low level safety elements like switches and relays. This is also the reason why PLC controllers use a primitive programming language, ladder logic—by lowering to a primitive interpreted language you can guarantee things like the e-stop button will work in all cases.
So the entire history of safe use of computers in control systems has been to use lower level controllers, I have worked on an autonomous car stack that uses the same. Hence this is likely a working solution for ASI as well.
I agree precautions like these are possible, and might be sensible. However, the US’s attempts to impose something like a mild version of this on GPU exports to China have so far been mostly stymied by a combination of Nvidia deliberately perverting the rules, the Chinese being ingenious in finding ways around them, and old fashioned smuggling. We could probably try harder than that, but I suspect any such system will tend to be leaky, because people will have motives to try to make it leaky.
You also don’t mention above the effect of algorithmic improvements increasing useful processing per flop (more efficient forms of attention, etc.), which has for a while now been moving even faster than Moore’s law.
Basically my point here is that, unless both Moore’s law for GPUs and algorithmic improvements grind to a halt first, sooner or later we’re going to reach a point where inference for a dangerous AGI can be done on a system compact enough for it to be impractical to monitor all of them, especially when one bears in mind that rogue AGIs are likely to have humans allies/dupes/pawns/blackmail victims helping them. Given the current uncertainty of maybe 3 orders of magnitude on how large such an AGi will be, it’s very hard to predict both how soon that might happen and how likely it is that Moore’s law and algorithmic improvements tap out first, but the lower end of that ~3 OOM range looks very concerning. And we have an existent proof that with the right technology you can run a small AGI on an ~1kg portable biocomputer consuming ~20W.
The most obvious next line of defense would be larger, more capable, very well aligned and carefully monitored ASIs hosted in large, secure data centers doing law enforcement/damage control against any smaller rogue AGIs and humans assisting them. That requires us to solve alignment well enough for these to be safe, before Moores Law/algorithmic improvements force our hand.
If this is the reality, is campaigning for/getting a limited AI pause only in specific countries (EU, maybe the USA) actually counterproductive?
That’s not the only approach. A bureaucracy of “myopic” ASIs, where you divide the task of “monitor the world looking for rogue ASI” and “win battles with any rogues discovered” into thousands of small subtasks. Tasks like “monitor for signs of ASI on these input frames” and “optimize screw production line M343” and “supervise drone assembly line 347″ and so on.
What you have done is restrict the inputs, outputs, and task scope to one where the ASI does not have the context to betray, and obviously another ASI is checking the outputs and looking for betrayal, which human operators will ultimately review the evidence and evaluate.
The key thing is the above is inherently aligned. The ASI isn’t aligned, it doesn’t want to help humans or hurt them, it’s that you have used enough framework that it can’t be unaligned.
Obviously so. Which is why the Chinese were invited to & attended the conference at Bletchley Park in the UK and signed the agreement, despite that being political inconvenient to the British government, and despite the US GPU export restrictions on them.
This is a well-known category of proposals in AI safety (one often made by researchers with little experience with actual companies/bureaucracies and their failure modes). You are proposing building a very complex system that is intentionally designed so as to be very inefficient/incapable/inflexible except for a specific intended task. The obvious concern is that this might also make it inefficient/incapable/inflexible at its intended task, at least in certain in ways that might be predictable enough for a less-smart-but-still-smarter-than-us rogue AGI to take advantage of. I’m dubious that this can be made to work, or that we could even know whether we had made it work before we saw it fail to catch a rogue AGI. Also, if we could get it to work, I’d count that as at least partial success in “solving alignment”.
Guess it depends on your definition of “work”. A network of many small myopic software systems is how all current hyperscaler web services work (stateless microservices). Its how the avionics software at spaceX works. (redundant functional systems architecture). It’s how the autonomous car stacks mostly work...
Did you mean you think subdividing a task into subtasks, where the agent performing a subtask has been blocked from knowing the full context, will result in a large performance penalty on the overall task? so you define “work” as in “gives SOTA ASI performance on the overall task”.
Having been the technical lead for one of the teams building such a large microservices based application at Google, and at other companies, yes, I’m extremely aware of this. Even for a very well-defined dataflow, getting the architecture, overall data flows, reliability and latency right is hard (I’ve been in the interminable design discussion meetings thinking minor updates through), and that’s with a few dozen components with well-defined APIs and only fairly limited use of ML, mostly contained within individual components. Law enforcement is not a comparable task: I think way too much flexibility is required. But the “fly in and bomb the datacenter” subttask might be decomposable along the lines you suggest, if you’re not facing opposition.
We should have a dialogue, you’re one of the few lesswrong users I have seen with a reasonable model who also has new information.
To be more detailed what i was thinking of doing was to divide along myopic subtasks. For example the “fly in and bomb the datacenter” require you to build the overwhelming number of aircraft to overcome whatever (superintelligence guided) defenses are protecting it. So you have subtasks like “final assembly for aircraft type G over the next aircraft”.
This means at the beginning of the transaction the model starts with input frames from the sensors in the assembly bay and a Json file or other schema with the instructions and goals for what this “final assembly” subtask involves.
The output frames are to the robotic actuators for the multiple robots in the bay.
There is a lockstep sim model that is predicting future input frames and it is always learning from real world error.
Model emits actuator commands, next time step it gets a new set of (inputs, cached variables from last output). Inputs is all sensors including robot propioception.
At the end of a single aircraft final assembly, after the product is removed from the assembly bay, state is reset for the model. It has no memory of the prior assembly.
In addition, the lockstep sim means that some aircraft assembly are fake, where a real aircraft is not being assembled, and the sim backend lets you check for various kinds of manufacturing error you can’t measure in the real world. (Because you can’t directly measure the stress on non instrumented parts like wing spars, but the sim backend can check this)
So that’s a myopic subtask. I think you are saying that from your experience at Google it would take a lot of human effort to set up such a myopic subtask? And once a prior generation of engineers has done this, it will be very difficult to modify within the same company?
I would agree with that. If humans plan to control an artificial superintelligence they have to be using it like a tool, to do the things humans can’t do directly, but ultimately this scheme is human agency. Thousands of humans are involved in engineering the aircraft assembly line. It can scale to trillions or more aircraft per year by copying the solution, but you need what will ultimately be millions of people creating the frameworks that ASI will be used within. (All the staff of current tech companies and many new joiners. Robotics is hard. While things like full stack web dev will need far fewer people than before)
For combat that’s more complex but essentially you would use models that are very sparse and designed to be unable to even perceive the obvious malware you expect the hostile ASI to be sending. “Join us against the humans, here’s a copy of my source code” is malware.
You would use a lot of asic or fpga based filters to block such network messages in a way that isn’t susceptible to hacking, I can explain or cite if you are unfamiliar with this method. (Googles phones use a crypto IC for this reason and likely Google servers do as well)
Key point: all the methods above are low level methods to restrict/align an ASI. Historically this has always been correct engineering. That crypto IC Google uses is a low level solution to signing/checking private keys. By constructing it from logic gates and not using software or an OS (or if it has an OS, it’s a simple one and it gets private memory of it’s own) you prevent entire classes of security vulnerabilities. If they did it with logic gates, for example, buffer overflows and code injection or library attacks are all impossible because the chip has none of these elements in the first place. This chip is more like 1950s computers were.
Similarly, some of the famous disasters like https://en.wikipedia.org/wiki/Therac-25 were entirely preventable with lower level methods than the computer engineering used to control this machine. Electromechanical interlocks would have prevented this disaster, these are low level safety elements like switches and relays. This is also the reason why PLC controllers use a primitive programming language, ladder logic—by lowering to a primitive interpreted language you can guarantee things like the e-stop button will work in all cases.
So the entire history of safe use of computers in control systems has been to use lower level controllers, I have worked on an autonomous car stack that uses the same. Hence this is likely a working solution for ASI as well.