Roy Rinberg
I’d be curious to see what comes from your experiments.
I also think that there’s a difference in the control setting in what the trade-offs are between having the small model vs. the large model “initiate”. In the compression setting the small model asks a lot of questions, and this lets you avoid sending the questions “over the wire”; with control, you’re likely fine with bidirectional flow: the small model asks for help when it needs it, or the large model interjects when it wants.
I haven’t thought too carefully on that.

Interestingly, even if every question from T is a yes-or-no question, the above method would let U answer them in fewer bits on average! For example, if T already thinks something is 90% likely, and U confirms that the answer is indeed “Yes”, that provides T with only -log2(0.90) = 0.152 bits of information.
Yeah—definitely! You can use arithmetic coding to encode those bits, according to the small model’s priors. And if it turns out to be less helpful than sending the bits directly, the large model would be able to assess that. So you never need to send more than N bits for N questions, but could be able to do much better. I like this idea a good deal; we do mention this in the paper, but it’s not a major point.
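To make the arithmetic-coding point concrete, here is a minimal sketch (float-based, fine for short sequences; a production coder would use integer renormalization). It encodes a sequence of yes/no answers against the small model’s priors, and the interval width shows the bit cost: confident priors make “yes” nearly free.

```python
import math

def encode(answers, priors):
    # Narrow [low, high) once per answer; "yes" takes the top
    # p-fraction of the interval, "no" the bottom (1-p)-fraction.
    low, high = 0.0, 1.0
    for ans, p in zip(answers, priors):
        split = low + (high - low) * (1 - p)
        if ans:
            low = split    # "yes": upper sub-interval of width p
        else:
            high = split   # "no": lower sub-interval of width 1-p
    # Any number in [low, high) identifies the sequence;
    # about -log2(high - low) bits suffice to name one.
    return (low + high) / 2, -math.log2(high - low)

def decode(code, priors):
    low, high = 0.0, 1.0
    answers = []
    for p in priors:
        split = low + (high - low) * (1 - p)
        if code >= split:
            answers.append(True)
            low = split
        else:
            answers.append(False)
            high = split
    return answers

priors = [0.9, 0.9, 0.5, 0.99]
answers = [True, True, False, True]
code, bits = encode(answers, priors)
assert decode(code, priors) == answers
# bits = 0.152 + 0.152 + 1.0 + 0.0145 ≈ 1.32, vs. 4 raw bits
```

When the priors are bad (e.g. a confident prior turns out wrong), the cost per answer exceeds 1 bit, which is exactly the case where the large model could notice and fall back to sending the raw bits.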
I think there are a lot of specifics to explore in the implementation of the interaction, and our work explores relatively constrained space (we did try a few other LLM-SLM interactions, and mostly got less impressive results).
Very helpful data points and BOTEC. I largely agree with your estimates. Thank you
Yeah—I agree.
I claim it’s extremely hard for U to make T insert a backdoor just by answering its yes-or-no questions.
Seems very likely. I think you don’t need to be as restrictive, though: I think forcing U and T to communicate through any kind of information bottleneck (not just binary QA) should work similarly.
For example, the dual-LLM setting is used in CAMEL to defend against prompt injection. There, they have a second LLM that monitors the actions of the first LLM; the second LLM only sees metadata that summarizes the interaction the first LLM is trying to do (e.g. “agent X is requesting to send data type Y to location Z”). This restriction on the types of things the second LLM sees makes it difficult to prompt-inject the second LLM.
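A loose sketch of that bottleneck pattern (this is illustrative, not CAMEL’s actual API): the monitor only ever receives a string rendered from a few structured fields, so the raw, possibly prompt-injected interaction never reaches it.

```python
from dataclasses import dataclass

@dataclass
class ActionMetadata:
    agent: str        # which agent is acting
    data_type: str    # what kind of data it wants to move
    destination: str  # where it wants to send it

def describe(action: ActionMetadata) -> str:
    # The only string the monitor LLM ever sees; it is rendered
    # from structured fields, not from the raw interaction content.
    return (f"agent {action.agent} is requesting to send "
            f"data type {action.data_type} to location {action.destination}")
```

Of course, the structured fields must themselves be populated by trusted code (or sanitized), or the bottleneck leaks.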
I think safety through information-bottlenecks is a good direction. While I don’t personally know people working in this direction, I think this is something people are exploring. I’d be curious to hear what you or others find.
Wouldn’t someone notice if 100% of requests to OpenAI failed for 24 hours straight though?
Yes, definitely. A few responses:
Egress-limiting sets up an information-theoretic backstop: there is simply no way to steal more than X GB in a day across a channel that allows for X GB per day. The more time this backstop buys us, the more leeway we have with our monitoring + incident-response systems.
I don’t think it’s likely that 100% of the traffic would be corrupted, but I don’t know what the most likely world is, and it’s easy to linearly scale based on what security people think is the most likely outcome (e.g. 10% of the datacenter → 10x increase in timelines).
There’s a cognitive-load benefit for security engineering—it’s extremely convenient to think about egress-limiting because it’s an information-theoretic bound. You should be able to easily confirm that the egress-limiting is implemented properly because it should be <200 lines of code (or is literally just the width of a wire). And “complexity is the worst enemy of security” (post by Bruce Schneier).
To give a somewhat long-winded tangent: I find it useful to think about exfiltration in three settings:
Setting 1: Fast exfiltration. An extremely rapid exfiltration occurs before human intervention is possible. For example, an adversary compromises 10,000 servers and leaks 1⁄10,000th of the weights from each of them.
Setting 2: Intermediate exfiltration. A leak occurs over days or weeks—that is, a human-operable amount of time.
Setting 3: Slow exfiltration. There is a length of time (e.g., 2 years) where even if the adversary could steal the model weights in that amount of time, it wouldn’t matter by the end, because new models would be meaningfully different.
In this framing, egress-limiting is tasked with moving us from setting 1 to setting 2. And then we can have monitoring + response schemes that move us from setting 2 to setting 3. (Relatedly, you may find it interesting to take a look at my other work on inference verification to detect steganographic exfiltration attempts.)
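The backstop arithmetic is simple enough to write down. The numbers below are hypothetical placeholders (not from the post or any lab): a hard cap of X GB/day means weights of size W GB take at least W/X days to exfiltrate, no matter how the attacker schedules the transfer.

```python
# Hypothetical numbers, for illustration only.
weights_gb = 1_000           # assumed: ~1 TB of model weights
egress_cap_gb_per_day = 10   # assumed: hard daily egress cap

# Information-theoretic floor on exfiltration time: no scheduling
# or compression of the channel itself can beat this.
floor_days = weights_gb / egress_cap_gb_per_day
print(floor_days)  # 100.0
```

If that floor exceeds the model’s useful lifetime, we’ve moved from setting 2 to setting 3 by construction.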
It seems like the amount of egress an attacker could use without getting caught is a function of buffers
Again, yes I agree. 2 responses:
Yes, we would certainly need to include buffers on these bandwidths. The calculations in the post are back-of-the-envelope, and I imagine the amount of buffer required is still in the same order of magnitude. (Plus, I don’t work at a lab, nor would I be allowed to share such numbers.)
More algorithmically, there are rate-limiting methods that permit bursts above a baseline rate while enforcing a hard cap on the total across a longer period (e.g., 1 day). For one, there’s a method called Token Bucket Rate Limiting.
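A minimal token-bucket sketch (one common formulation; real deployments would persist state and use byte-accurate accounting): tokens refill at a fixed rate up to a capacity, so short bursts up to the capacity are allowed while long-run throughput can never exceed the refill rate.

```python
import time

class TokenBucket:
    """Permit bursts while enforcing a hard long-run egress cap.

    `rate` is tokens (e.g. bytes) refilled per second; `capacity`
    is the maximum burst size. Average throughput over any long
    window is bounded by `rate` regardless of burst pattern.
    """
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, nbytes: float) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, clamped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

For a daily cap, you’d set `rate = daily_cap / 86_400` and pick `capacity` to be the largest burst you’re willing to tolerate; the whole enforcement logic stays well under the ~200-line budget mentioned above.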
Last, most speculative, point: in the most extreme world where a powerful adversary (e.g. a nation state) physically takes over a datacenter, egress-limiting (if connected to other defenses, like tamper-proof enclosures that destroy the GPU when opened) gives us time to reacquire the GPUs/model-weights, because even with sudo access to the GPU, they still can’t exfiltrate weights in less than X amount of time. Though, this is probably wrong in some way: either nation-states may be able to remove the enclosures and side-channel into the GPU by some other means, or once the weights are stolen it’s unlikely we steal them back.
Text Compression Can Help Secure Model Weights
I mean—I hear what you’re saying, and I don’t agree. It’s not that important of a post, so it’s fine, but I think it doesn’t really hook people.
I don’t think “We Added Typos to a Benchmark. Then Haiku Saturated It.” communicates what the post is about.
I don’t think it’s about saturation at all—as evidence of this, I ran “ctrl-f” for “satur” and there are 0 mentions of it in the body. Less critically, “we added typos to a benchmark” is more the action than the reason, which kind of just leaves someone asking “why is this relevant?”
I will say, now that I’ve fully read this, I don’t fully get why this is the title that you prefer.
It seems like we should say something about (1) what we are measuring or (2) our conclusion. This title leaves me unsure what this post will be about, except for the pace of benchmark saturation, which isn’t even the point of the blog.
Scores are lower bounds
This is true in this context, but it’s worth saying that these models seem to be trained to do well on these benchmarks, so it’s not entirely true that scores are lower bounds.
The size of the “lower bound”-ness is hard to comment on, but if you can provide some input on the amount of improvement you think there is from tuning your harness, that is a meaningful conclusion. That could be what we build the blog around.
When Anthropic dropped Opus 4.6, we asked it to figure it out. From our eval logs, Opus 4.6 observed that for a single BigCodeBench prompt, Haiku and Opus often generated multiple code blocks. Furthermore, as typo rate increases, Haiku shifts its behavior for ~20% of its responses from generating multiple code blocks to generating just a single code block.
I also have to say, I don’t know what you mean by “code block”: does this mean response?
When Anthropic dropped Opus 4.6, we asked it to figure it out.
I think this is a bit informal.
Also, it’s odd to say this because it sounds like you’re saying “this isn’t what we think” or “if it’s wrong, judge it, not us”; you have to claim responsibility for any findings that AI generates.
I would not say this
The Anomaly is Benchmark-Specific
We then tested whether this “impossible typo effect” holds for Haiku and Opus on other benchmarks. We chose BBH and GPQA since Haiku struggles reasonably without introducing typos. Here, we no longer observed the impossible typo effect, and Haiku’s capabilities decreased with typo rates.
This should be one figure, not two, and I think this could be combined with the previous section as well.
We chose BBH and GPQA
Should link to the benchmarks; I’m actually not immediately clear what BBH is (but I get it upon further inspection).
“impossible typo effect”
I don’t think this is a term you’ve introduced before
The Anomaly is Model-Specific
We then tested if other small models have this “impossible typo effect”. We found that, unlike Haiku, the capabilities of GPT-4.1-mini slightly decreased as typos increased.
This plot should have multiple models on the same plot; otherwise it’s a somewhat confusing section header.
the hypothesis is refuted.
I really am fairly opposed to sentences like this
research
observed result in humans (or cite something that isn’t a BBC article)
thinks
think
Squint
It isn’t clear what “squint” means; I think maybe “think” would be better.
We were curious if LLMs are robust under various typo rates.
Hmm, a couple of thoughts. I don’t think my write-up is perfect, but how about something more like this:
> We were curious if LLMs produce the same response when there are typos in the prompt. To test this, we injected typos into the prompts from BigCodeBench and ran different Claude models. We found that while the accuracy for Opus gradually declined with typo rate, for Haiku, accuracy actually increased as the typo rate increased. This blog investigates this counter-intuitive phenomenon.

(Points I care about:
1. “robust under various typo rates” doesn’t sound like a linear increase in typos, which is what we actually do.
2. “We double-checked our code and asked Claude to do so too, but we couldn’t find any bugs”—this should be assumed for all your work.
3. “The mystery begins.”—this format does sound quite AI generated.)
One of the follow-up works needed to make something like this valuable is more analysis on what makes two summaries “equivalent”.
For example, this work, Asking and Answering Questions to Evaluate the Factual Consistency of Summaries, does the following:
LLM 1 generates a response R and a summary S
LLM 2 comes up with a series of (binary) questions about that response (Q)
LLM 3 is given response R, and then asked to answer questions Q (call this A)
LLM 3 is given summary S, and then asked to answer questions Q (call this A’)
Compare A and A’, and if they are the same, the summary S contains all the meaningful info from response R.
I think this is a good direction for more exploration
(note, LLM1/2/3 can all be the same LLM, just with fresh contexts)
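The LLM-3 steps above can be sketched as follows. `ask_llm` is a hypothetical stand-in for any LLM call with a fresh context; this is not the paper’s actual implementation, just the shape of the check.

```python
def summary_is_faithful(response, summary, questions, ask_llm):
    """Score how much of response R's question-answerable content
    survives in summary S: answer the same binary questions Q from
    each context, and count how often the answers (A vs. A') agree."""
    def answer(context, question):
        # Fresh prompt per call, so nothing leaks between contexts.
        return ask_llm(f"Context:\n{context}\n\nAnswer yes or no: {question}")

    a = [answer(response, q) for q in questions]        # A: answers from R
    a_prime = [answer(summary, q) for q in questions]   # A': answers from S
    return sum(x == y for x, y in zip(a, a_prime)) / len(questions)
```

With a trivial keyword-matching stub in place of a real LLM, a summary that drops the key fact disagrees on the probe questions and scores low, while a faithful one scores 1.0; the interesting open question is what question-generation strategy (LLM 2) makes this score track “meaningful info” rather than surface overlap.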