Wouldn’t someone notice if 100% of requests to OpenAI failed for 24 hours straight though?
Yes, definitely. A few responses:
Egress-limiting sets up an information-theoretic backstop: there is simply no way to steal more than X GB in a day across a channel that allows X GB per day. The more time this backstop buys us, the more leeway we have with our monitoring and incident-response systems.
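The convenient thing about a hard cap is that the minimum exfiltration time is just division. A quick sketch with hypothetical numbers (the weight size and cap below are illustrative, not figures from the post):

```python
# Back-of-envelope: with a hard egress cap, the minimum time to
# exfiltrate the full weights is weights_size / daily_cap, even if
# the attacker uses 100% of the channel. Numbers are hypothetical.

weights_size_gb = 1_000      # assume ~1 TB of model weights (made up)
egress_cap_gb_per_day = 10   # assume a 10 GB/day hard cap (made up)

min_days = weights_size_gb / egress_cap_gb_per_day
print(min_days)  # → 100.0
```

The point is that this floor holds regardless of how clever the attacker is, which is what makes it a backstop rather than a detection mechanism.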
I don’t think it’s likely that 100% of the traffic would be corrupted, but I don’t know what the most likely world is, and it’s easy to scale the timelines linearly based on what security people think is most likely (e.g., an attacker controlling only 10% of the datacenter’s egress → a 10x increase in exfiltration timelines).
There’s a cognitive-load benefit for security engineering—egress-limiting is extremely convenient to reason about because it’s an information-theoretic bound. You should be able to easily confirm that the egress-limiting is implemented properly, because it should be <200 lines of code (or is literally just the width of a wire). And, as Bruce Schneier has written, “complexity is the worst enemy of security.”
To give a somewhat long-winded tangent: I find it useful to think about exfiltration in three settings:
Setting 1: Fast exfiltration. An extremely rapid exfiltration occurs before human intervention is possible. For example, an adversary compromises 10,000 servers and leaks 1⁄10,000th of the weights from each of them.
Setting 2: Intermediate exfiltration. A leak occurs over days or weeks—that is, a human-operable amount of time.
Setting 3: Slow exfiltration. There is a length of time (e.g., 2 years) where even if the adversary could steal the model weights in that amount of time, it wouldn’t matter by the end, because new models would be meaningfully different.
In this framing, egress-limiting is tasked with moving us from setting 1 to setting 2. And then we can have monitoring + response schemes that move us from setting 2 to setting 3. (Relatedly, you may find it interesting to take a look at my other work on inference verification to detect steganographic exfiltration attempts.)
It seems like the amount of egress an attacker could use without getting caught is a function of buffers.
Again, yes I agree. 2 responses:
Yes, we would certainly need to build buffers into these bandwidths. The calculations in the post are back-of-the-envelope, and I imagine the required buffer is still within the same order of magnitude. (Plus, I don’t work at a lab, nor would I be allowed to share such numbers if I did.)
More algorithmically, there are rate-limiting methods that permit bursts above a baseline rate while enforcing a hard cap on the total across a longer period (e.g., 1 day). One such method is token bucket rate limiting.
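A token bucket is simple enough to sketch in a few lines, which is part of the appeal (it fits well under the “<200 lines of code” bar). A minimal sketch, with illustrative parameters—a real deployment would enforce this at the network layer, not in application code:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: permits short bursts up to `capacity`
    bytes while enforcing a long-run average of `rate` bytes/second.
    Parameter values are up to the deployment; nothing here is tuned."""

    def __init__(self, rate_bytes_per_sec: float, capacity_bytes: float):
        self.rate = rate_bytes_per_sec
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self, nbytes: float) -> bool:
        """Return True if sending `nbytes` right now is within the limit."""
        now = time.monotonic()
        # Refill tokens accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

The hard cap over a longer window follows from the refill arithmetic: over any period T, at most `capacity + rate * T` bytes can pass, no matter how the bursts are arranged.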
Last, and most speculative, point: consider the most extreme world, where a powerful adversary (e.g., a nation state) physically takes over a datacenter. Egress-limiting (if combined with other defenses, like tamper-proof enclosures that destroy the GPU when opened) gives us time to reacquire the GPUs/model weights, because even with sudo access to the GPU, they still can’t exfiltrate the weights in less than X amount of time. Though this is probably wrong in some way: either nation states may be able to remove the enclosures and side-channel into the GPU through some other means, or, once the weights are stolen, it’s unlikely we steal them back.