Side-channels: input versus output

This is a brief post arguing that, although “side-channels are inevitable” is pretty good common advice, you actually can prevent attackers inside a computation from learning about what’s outside.

We can prevent a task-specific AI from learning any particular facts about, say, human psychology, virology, or biochemistry—if:

  1. we are careful to only provide the training process with inputs that would be just as likely in, say, an alternate universe where AI was built by octopus minds made of organosilicon where atoms obey the Bohr model

  2. we use relatively elementary sandboxing (no clock access, no networking APIs, no randomness, no other such sources of nondeterminism, error-correcting RAM, and that’s about it; see the sketch after this list)
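
To make item 2 concrete, here is a minimal sketch, in Python, of the determinism requirement: the computation sees only its explicit inputs plus a fixed-seed PRNG, and reproducibility is checked by hashing the result of two independent runs. The `train_step` name is a hypothetical placeholder, not any particular library’s API, and this illustrates the property we want rather than being a real sandbox (which would also need OS-level isolation from clocks, networking, and the like).

```python
import hashlib
import pickle
import random


def deterministic_run(train_step, inputs, seed=0):
    """Run `train_step` as a pure function of its explicit inputs.

    The only "randomness" available is a PRNG with a fixed seed; no clock,
    no network handles, and no ambient state are passed in.
    """
    rng = random.Random(seed)
    return train_step(inputs, rng)


def fingerprint(result):
    """Hash the result so two runs can be compared bit-for-bit."""
    return hashlib.sha256(pickle.dumps(result)).hexdigest()


if __name__ == "__main__":
    # Toy stand-in for a training step: a sum plus some seeded noise.
    def train_step(inputs, rng):
        return sum(inputs) + rng.random()

    xs = [1.0, 2.0, 3.0]
    first = fingerprint(deterministic_run(train_step, xs))
    second = fingerprint(deterministic_run(train_step, xs))
    assert first == second, "some source of nondeterminism leaked in"
    print("reproducible:", first[:16])
```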

I don’t think either of these happens by default, and if you are in an AGI lab, I suggest you advocate for either (or both if you can, but one at a time is good too).

Regarding item 1, self-play in Go is an example par excellence, and this may be one reason why people tend to have a strong intuition that arbitrarily strong AlphaZero fails to kill you. An example that trades off more potential risk for engineering applicability would be inputs from a Newtonian physics simulator with semi-rigid objects of randomly generated shapes.
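
As a toy illustration of that second kind of input source (a sketch under my own simplifying assumptions, with illustrative helper names, not a proposal for an actual training distribution): every training example is generated by a self-contained simulator, so the data carries no facts about our particular world beyond the physics we chose to encode, and even the constants are drawn at random.

```python
import math
import random


def random_polygon(rng, n_vertices=6):
    """A randomly generated shape: jittered points around a circle."""
    angles = sorted(rng.uniform(0, 2 * math.pi) for _ in range(n_vertices))
    return [(rng.uniform(0.5, 1.0) * math.cos(a),
             rng.uniform(0.5, 1.0) * math.sin(a)) for a in angles]


def simulate(polygon, gravity, steps=100, dt=0.01):
    """Euler-integrate the shape's centroid as a point mass under gravity."""
    x = sum(p[0] for p in polygon) / len(polygon)
    y = sum(p[1] for p in polygon) / len(polygon)
    vx, vy = 1.0, 2.0  # arbitrary initial velocity
    trajectory = []
    for _ in range(steps):
        vy -= gravity * dt
        x += vx * dt
        y += vy * dt
        trajectory.append((x, y))
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed: the dataset itself is deterministic
    dataset = [simulate(random_polygon(rng), gravity=rng.uniform(1.0, 20.0))
               for _ in range(1000)]
    print(len(dataset), "trajectories of", len(dataset[0]), "points each")
```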

I think it is good to be aware that

  • this is a real tradeoff; we don’t need to just assume that superintelligent models will be able to deduce whatever physical facts they want through pure logical induction

  • taking advantage of the argument here would require being careful about data

    • datasets like CommonCrawl, MassiveText, and YouTube-8M are, uh, not careful

Arguments

If an attacker on the outside of a computation is trying to manipulate events inside the computation, they get to use any and all physical mechanisms: they can open up the computer, stick some electrodes on the memory bus, and have at it.

If an attacker is inside a pure mathematical function and trying to manipulate events outside the computer, well, they can influence the output of the function (or even, potentially, whether the function is tractable to compute). If the output of the computation is propagated into human minds, this influence could have cascading effects that are hard to avoid or even notice.

If an attacker is trying to exfiltrate information from a computer, they get to use any and all physical mechanisms: they can run a very accurate clock, use a sensitive microphone, measure magnetic fields, etc. For better or worse, the nature of our physics is that everything that happens tends to radiate information in all directions, and this is very difficult to fully neutralise; physical distance from attackers is the best defence, and even that is far from perfect (for example, timing information still propagates as far away as outputs do).
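
The “very accurate clock” mechanism is the most familiar of these in software. As a standard illustration (not tied to any particular system discussed here), a comparison routine that exits early on the first mismatch leaks, through running time alone, how long a matching prefix the attacker has guessed:

```python
import hmac
import time


def leaky_equals(secret: bytes, guess: bytes) -> bool:
    """Early-exit comparison: running time grows with the matching prefix."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False  # returns sooner the earlier the mismatch is
    return True


def timed(f, *args, reps=200_000):
    start = time.perf_counter()
    for _ in range(reps):
        f(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    secret = b"hunter2!"
    print("mismatch at first byte:", timed(leaky_equals, secret, b"xunter2!"))
    print("mismatch at last byte: ", timed(leaky_equals, secret, b"hunter2?"))
    # The second call typically takes measurably longer; a constant-time
    # comparison such as hmac.compare_digest removes this particular leak.
    print("constant-time result:", hmac.compare_digest(secret, b"hunter2!"))
```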

If an attacker is inside a pure mathematical function and trying to exfiltrate information from outside the computer, they do not get to use any physics. That same mathematical function is, presumably, also being computed in various places all over the multiverse. Only the explicitly given inputs—and what can be inferred from them on a Solomonoff-style prior—narrow it down. If a deterministic function is computed correctly, its result cannot depend further on the specific physical properties of the computation or facts about the environment in which it was running. All the attacker can learn is that, if they are in a simulation at all, the simulator has at least a certain amount of computational resources.[1]
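
One idealised way to state that last claim information-theoretically (treating the explicit input, the output, and the environment as random variables $X$, $Y$, $E$): if the attacker’s entire view is the output $Y = f(X)$ of a deterministic function, then for any fact $E$ about the environment the computation physically runs in,

$$I(Y; E \mid X) \le H(Y \mid X) = 0,$$

i.e. conditional on the inputs, the output carries zero additional information about the outside world.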

  1. ^

    In fact, maybe they cannot be very sure of this either, since who knows what kind of fancy compiler-optimisation and static-analysis techniques are out there in the universe that’s simulating them. They could try doing some cryptographic operations that are supposed to be irreducibly expensive, but we tend to have a hard time proving those sorts of computational hardness assumptions. Or maybe the simulators can just fudge those parts of the computation. I’m not confident either way about how much the attacker can learn about the available computational resources.