One other thing I’ll mention: to fully exploit the valid non-determinism as a covert communication channel, the attacker needs two very strong affordances which may not be realistic:
1. An exact copy of the model weights outside the datacenter, which may not be available in the closed-weight setting.
2. The ability to replay the inference process more deterministically than the verifier can during verification. It’s unclear why an attacker outside the datacenter would have a cleaner view of the inference process than a verifier with access to all internal activations, logits, statistics, etc. from inside the datacenter. To fully exploit the channel, the attacker needs the deviations to be deterministic and predictable from their own perspective while still appearing non-deterministic to the verifier.
Given all of this, I believe this goal (“Output re-computation that reduces covert channel capacity below 0.01%.”) from your intro may be achievable with simple re-computation approaches. Simple DiFR (Divergence From Reference) techniques may work with non-deterministic inference, as the verifier likely has more affordances than the attacker.
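To make the "simple re-computation" idea concrete, here is a minimal sketch of one way a verifier could score claimed tokens against its own recomputed logits. It assumes the sampler uses Gumbel-max sampling with a seed shared between the inference server and the verifier; that assumption, the function names, and the tolerance value are all hypothetical illustrations, not a description of how DiFR is actually defined.

```python
import math
import random

def gumbel_noise(seed, position, vocab_size):
    """Reproduce the per-position Gumbel noise from a shared seed (assumed setup)."""
    rng = random.Random(seed * 1_000_003 + position)
    # Clamp away from 0 so log() is always defined.
    return [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in range(vocab_size)]

def divergence_score(verifier_logits, claimed_token, noise):
    """Gap between the verifier's best perturbed score and the claimed token's.

    A score of 0 means the verifier would have sampled the same token. A small
    positive score is consistent with benign numerical non-determinism.
    """
    perturbed = [logit + g for logit, g in zip(verifier_logits, noise)]
    return max(perturbed) - perturbed[claimed_token]

def flag_suspicious(logits_per_position, claimed_tokens, seed, tolerance):
    """Return the positions whose claimed token diverges by more than tolerance."""
    flagged = []
    for pos, (logits, token) in enumerate(zip(logits_per_position, claimed_tokens)):
        noise = gumbel_noise(seed, pos, len(logits))
        if divergence_score(logits, token, noise) > tolerance:
            flagged.append(pos)
    return flagged
```

Under this sketch, the attacker can only encode information at positions where two tokens are nearly tied within the tolerance, so tightening the tolerance directly shrinks the usable channel. How small the tolerance can go without flagging honest inference depends on how noisy the verifier's replay is.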
If that doesn’t work, DeepSeek demonstrated deterministic inference with “minimal performance overhead” in their DeepSeek-V4 technical report.
Anecdotally, when I talk to normal people about AI, the main questions are “Will AI take my job?” and “Will AI take over the world / become Skynet / etc?”, although job concerns are taken more seriously than loss of control.