As a very simple proof-of-concept that this does happen: we queried GPT-4o-mini with the same prompt "Once upon a time," with a fixed seed, and got roughly 5 different generations.
There are lots of reasons for non-determinism, but one important one is floating-point non-associativity ((a + b) + c != a + (b + c)), combined with the fact that ML models don't specify the order in which the reductions inside their matrix multiplications are performed. So if you run the same model on two different machines, you can get different logits (and thus different sampled tokens), because different GPUs prefer different matmul orderings.
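Floating-point non-associativity is easy to see in a few lines of Python. The values below are chosen for illustration: adding a small number to a large one loses the small number's contribution, so the result depends on grouping.

```python
# Demonstrate that floating-point addition is not associative.
# 0.1 is far below the precision (ulp) of 1e16, so when it is added
# to 1e16 first, it is rounded away entirely.
a, b, c = 0.1, 1e16, -1e16

left = (a + b) + c   # 0.1 is absorbed into 1e16, then cancelled -> 0.0
right = a + (b + c)  # b and c cancel exactly first, leaving 0.1

print(left, right)   # 0.0 0.1 -- same terms, different groupings
```

The same effect, accumulated across millions of additions inside a matrix multiplication, is why two reduction orders can produce slightly different logits.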
(We give a more formal discussion of the sources of non-determinism in section 4.4 of the full paper (link).)