Most of those signs are Anthropic specific, and OpenAI has several signs pointing in the opposite direction. They have really been leaning into a message of “efficient frontier AI for everyone”, they reset rate limits multiple times per month as a meme, and Codex rate limits are already very generous.
Anecdotally I’ve never heard of anyone hitting a Codex rate limit on a Max plan, and from what I hear most people struggle to hit 20% of their weekly rate limit with heavy usage.
One other thing I’ll mention: to fully exploit the valid non-determinism as a covert communication channel, the attacker needs two very strong affordances which may not be realistic:
The attacker would need an exact copy of the model weights outside the data center, which may fail in the closed weight setting
The ability to replay the inference process more deterministically than the verifier can replay inference during verification. However, it’s unclear why an attacker outside the datacenter would have a cleaner view of the inference process than a verifier with access to all internal activations, logits, statistics, etc from inside the datacenter. To fully exploit the channel, the attacker needs deviations to be deterministic and predictable from their perspective while still appearing nondeterministic to the verifier.
Given all of this, I believe this goal (“Output re-computation that reduces covert channel capacity below 0.01%.”) from your intro may be achievable with simple re-computation approaches. Simple DiFR (Divergence From Reference) techniques may work with non-deterministic inference, as the verifier likely has more affordances than the attacker.
If that doesn’t work, Deepseek demonstrated deterministic inference with “minimal performance overhead” in their Deepseek-V4 technical report.