We should have a dialogue; you’re one of the few LessWrong users I have seen with a reasonable model who also has new information.
To be more detailed, what I was thinking of doing was to divide along myopic subtasks. For example, “fly in and bomb the datacenter” requires you to build an overwhelming number of aircraft to overcome whatever (superintelligence-guided) defenses are protecting it. So you have subtasks like “final assembly for aircraft type G, for the next aircraft only”.
This means at the beginning of the transaction the model starts with input frames from the sensors in the assembly bay and a JSON file (or other schema) with the instructions and goals for what this “final assembly” subtask involves.
The output frames go to the robotic actuators for the multiple robots in the bay.
There is a lockstep sim model that predicts future input frames and is continuously learning from real-world error.
The model emits actuator commands; at the next time step it receives a new tuple of (inputs, cached variables from the last output). The inputs cover all sensors, including robot proprioception.
At the end of a single aircraft’s final assembly, after the product is removed from the assembly bay, the model’s state is reset. It has no memory of the prior assembly.
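The transaction loop above can be sketched roughly as follows. This is a toy illustration, not a real control stack: `ToyBay`, `toy_model_step`, and the field names are all hypothetical stand-ins. The key property it demonstrates is that the only state carried between ticks is the explicit cache, and nothing survives the end of an episode.

```python
# A minimal, hypothetical sketch of the myopic-subtask loop described
# above; class and field names are illustrative, not a real API.

class ToyBay:
    """Stands in for the assembly bay's sensors and actuators."""
    def __init__(self, steps_to_finish):
        self.remaining = steps_to_finish
        self.commands_log = []

    def product_removed(self):
        return self.remaining <= 0

    def read_sensors(self):
        # Stand-in for camera frames and robot proprioception.
        return {"frame": self.remaining}

    def apply(self, commands):
        self.commands_log.append(commands)
        self.remaining -= 1

def toy_model_step(task_spec, sensors, cache):
    """Pure function: (spec, inputs, cached vars) -> (commands, new cache)."""
    tick = cache.get("tick", 0)
    return {"move": tick}, {"tick": tick + 1}

def run_assembly_episode(bay, task_spec):
    """One 'final assembly' transaction: runs until the product leaves
    the bay, then all model state is discarded (memoryless reset)."""
    cache = {}                       # the ONLY state carried between ticks
    while not bay.product_removed():
        commands, cache = toy_model_step(task_spec, bay.read_sensors(), cache)
        bay.apply(commands)
    return bay.commands_log          # episode ends; no memory persists
```

A second call to `run_assembly_episode` with a fresh bay starts the tick counter from zero again, which is the “no memory of the prior assembly” property in miniature.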
In addition, the lockstep sim means that some aircraft assemblies are fake, where a real aircraft is not being assembled, and the sim backend lets you check for various kinds of manufacturing error you can’t measure in the real world. (You can’t directly measure the stress on non-instrumented parts like wing spars, but the sim backend can.)
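The lockstep check itself is conceptually simple: the sim predicts the next sensor frame, and a large prediction error flags that tick for review. A hypothetical sketch, with the tolerance value and scalar frames as illustrative assumptions:

```python
# Hypothetical sketch of the lockstep-sim check: the sim predicts the
# next sensor frame, and large relative error flags the tick for review.
# The tolerance and scalar-valued frames are illustrative assumptions.

def lockstep_check(predicted_frames, real_frames, tolerance=0.05):
    """Return tick indices where sim prediction and reality diverge
    beyond the relative tolerance."""
    divergences = []
    for t, (pred, real) in enumerate(zip(predicted_frames, real_frames)):
        error = abs(pred - real) / max(abs(real), 1e-9)
        if error > tolerance:
            divergences.append(t)
    return divergences
```

In a fake assembly there is no real frame stream at all, so the model is driven entirely by the sim, and quantities like spar stress can be read directly out of the sim state.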
So that’s a myopic subtask. I think you are saying that, from your experience at Google, it would take a lot of human effort to set up such a myopic subtask, and that once a prior generation of engineers has done this, it will be very difficult to modify within the same company?
I would agree with that. If humans plan to control an artificial superintelligence, they have to use it like a tool, to do the things humans can’t do directly, but ultimately this scheme preserves human agency. Thousands of humans are involved in engineering the aircraft assembly line. It can scale to trillions or more aircraft per year by copying the solution, but you need what will ultimately be millions of people creating the frameworks the ASI will be used within. (All the staff of current tech companies and many new joiners; robotics is hard, while things like full-stack web dev will need far fewer people than before.)
For combat it’s more complex, but essentially you would use models that are very sparse and designed to be unable to even perceive the obvious malware you expect the hostile ASI to be sending. “Join us against the humans, here’s a copy of my source code” is malware.
You would use a lot of ASIC- or FPGA-based filters to block such network messages in a way that isn’t susceptible to hacking; I can explain or cite if you are unfamiliar with this method. (Google’s phones use a crypto IC for this reason, and Google servers likely do as well.)
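The idea of such a filter can be mocked in software, with the caveat that the whole point is that the real device would implement the check in fixed logic with no software attack surface. The message types, size limit, and schema below are illustrative assumptions, not any real protocol:

```python
# Hypothetical software mock of a hardware (ASIC/FPGA) message filter.
# The real device would implement this fixed-schema check in logic gates;
# the allowed types, payload limit, and schema here are made up.

ALLOWED_TYPES = {"telemetry", "target_update", "status_ack"}
MAX_PAYLOAD_BYTES = 256

def passes_filter(msg: dict) -> bool:
    """Accept only messages matching a rigid whitelist schema; everything
    else (including anything resembling code or free text) is dropped."""
    if set(msg) != {"type", "payload"}:
        return False                     # no extra or missing fields
    if msg["type"] not in ALLOWED_TYPES:
        return False                     # unknown message class
    payload = msg["payload"]
    # Bounded binary payload only: no strings, no code, no source dumps.
    return isinstance(payload, bytes) and len(payload) <= MAX_PAYLOAD_BYTES
```

The design choice is default-deny: the filter enumerates what may pass rather than trying to recognize malware, so a “here’s a copy of my source code” message simply has no representation that fits through.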
Key point: all the methods above are low-level methods to restrict/align an ASI. Historically this has always been correct engineering. That crypto IC Google uses is a low-level solution to signing and checking private keys. By constructing it from logic gates and not using software or an OS (or, if it has an OS, it’s a simple one with private memory of its own) you prevent entire classes of security vulnerabilities. If it’s built purely from logic gates, for example, buffer overflows, code injection, and library attacks are all impossible because the chip has none of these elements in the first place. The chip is more like a 1950s computer.
Similarly, some of the famous disasters, like the https://en.wikipedia.org/wiki/Therac-25, were entirely preventable with lower-level methods than the computer engineering used to control the machine. Electromechanical interlocks would have prevented that disaster; these are low-level safety elements like switches and relays. This is also the reason PLC controllers use a primitive programming language, ladder logic: by lowering to a primitive interpreted language you can guarantee things like the e-stop button working in all cases.
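The guarantee ladder logic gives you can be shown in a toy scan loop. This is a hypothetical Python rendering of a PLC scan cycle, not real PLC code; the rung names and I/O fields are invented. The point is structural: the e-stop rung is evaluated last on every cycle, so it unconditionally overrides everything else.

```python
# Hypothetical ladder-logic-style scan loop. In a real PLC the rungs
# would be ladder diagram, not Python; rung and signal names are made up.

def scan_cycle(inputs: dict) -> dict:
    """One scan: read inputs, evaluate rungs in order, write outputs."""
    outputs = {"motor": False, "beam": False}
    # Rung 1: normal operation requests energize the outputs.
    outputs["motor"] = inputs.get("run_request", False)
    outputs["beam"] = inputs.get("fire_request", False)
    # Rung 2 (evaluated last, so it always wins): e-stop forces all off.
    if inputs.get("estop", False):
        outputs = {key: False for key in outputs}
    return outputs
```

Because the stop path is a fixed rung rather than a branch buried in application software, no combination of other inputs can keep an output energized once the e-stop asserts, which is the Therac-25 lesson in miniature.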
So the entire history of safe use of computers in control systems has been to use lower-level controllers; I have worked on an autonomous car stack that does the same. Hence this is likely a workable approach for ASI as well.