Suppose we have a hypercomputer that knows the exact state of the rest of the universe. We set it to the task of simulating everything except itself (while assuming its own output is 00…0). It then outputs some information about this simulated world.
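The counterfactual setup above can be sketched as a toy program. Everything here is hypothetical: `simulate_world` stands in for the hypercomputer's exact physical simulation, and the 64-bit zero string is an arbitrary stand-in for the assumed all-zeros output.

```python
# Toy sketch of a counterfactual oracle (all names are illustrative).
ZERO_OUTPUT = "0" * 64  # the oracle assumes its own output is all zeros


def simulate_world(assumed_oracle_output):
    # Stand-in for the hypercomputer: simulate everything except the
    # oracle, under the given assumption about the oracle's output.
    # Here the "world" merely records what it observed the oracle emit.
    return {
        "observed_oracle_output": assumed_oracle_output,
        "events": ["medical_trial_completed"],
    }


def counterfactual_oracle(query):
    # Simulate the world under the counterfactual that the oracle
    # emitted only zeros, then answer the query about that world.
    world = simulate_world(ZERO_OUTPUT)
    return query(world)
```

The key point the sketch captures is that the simulated world never sees the oracle's real answer, only the fixed counterfactual one.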
We could build some real-world UFAI-detecting hardware, and add a rule to the simulation that if the virtual version of the UFAI detector is triggered, the oracle outputs a "UFAI error". But what we really want is a program that can detect a UFAI given an atomically precise description of the world, while also having some control over which world you get a description of, via physically manufactured devices being copied into the virtual world. For instance, suppose you know that only molecular nanotech can break through your physical defenses, and you have a program that can reliably detect molecular nanotech given an atomically precise description of the world. You can combine these into a program that returns "UFAI error" any time the virtual copy of a physical UFAI detector is triggered or nanotech is made in the simulation. Note that humans making nanotech within the prediction window would also give you a "UFAI error".
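The combined rule is just a disjunction over the two checks. A minimal sketch, assuming hypothetical predicates over an atomically precise world description:

```python
def physical_detector_triggered(world):
    # Virtual copy of the manufactured UFAI-detecting hardware,
    # as it appears inside the simulation (hypothetical field).
    return world.get("detector_triggered", False)


def nanotech_present(world):
    # Hypothetical reliable molecular-nanotech recognizer over an
    # atomically precise description of the world.
    return world.get("nanotech_made", False)


def guarded_oracle_answer(world, query):
    # Combined rule: either trigger aborts with "UFAI error".
    # Note this fires even if humans (not a UFAI) build nanotech
    # inside the prediction window.
    if physical_detector_triggered(world) or nanotech_present(world):
        return "UFAI error"
    return query(world)
```

The false-positive noted in the text shows up directly: any world with `nanotech_made` set yields the error, regardless of who built it.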
Perhaps we could have some program that searches for all computational processes that seem to be doing counterfactual reasoning about the oracle. Any process that is trying to acausally optimise the oracle's output in any way whatsoever causes the oracle to output an error message. Don't make this too sensitive to humans. Suppose the prediction oracle's output was whatever was typed into this keyboard here, and we ran a medical trial in the virtual world and then typed the results into the keyboard. The keyboard doesn't need to be plugged into anything; the only reason to type that data in is to influence the world outside the simulation. You might solve this by allowing any optimizer that runs on neurons to try manipulating the output (and hope that any UFAI isn't built out of neurons, and can't make a neuron-based copy of itself to dodge the detector without setting off the detector in the process).
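The scan-plus-whitelist idea can be sketched the same way. Both predicates below are hypothetical stand-ins: recognising "counterfactual reasoning about the oracle" in an arbitrary physical process is the hard, unsolved part.

```python
def reasons_about_oracle(process):
    # Hypothetical check: does this computational process do
    # counterfactual reasoning about the oracle's output?
    return process.get("reasons_about_oracle", False)


def runs_on_neurons(process):
    # Whitelist condition: optimizers implemented in biological
    # neurons (e.g. humans typing trial results) are permitted.
    return process.get("substrate") == "neurons"


def scan_for_manipulators(world):
    # Error out on any non-neuron process trying to acausally
    # optimise the oracle's output; neuron-based ones pass.
    for process in world["processes"]:
        if reasons_about_oracle(process) and not runs_on_neurons(process):
            return "manipulation error"
    return None
```

A silicon UFAI reasoning about the oracle trips the scan; a human doing the same passes, which is exactly the sensitivity trade-off the paragraph describes.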