I agree that individual control increases policy variance, which was sort of my point. Whether that’s good or not seems to me to depend on what the default course of events is. If you think things are headed in a good direction, then low variance is good. But if the default course is likely to be disastrous, high variance at least provides a chance.
I don’t understand your point about asymmetry. Doesn’t that tend to make the default course bad?
This is an interesting experiment, but I think there are some technical issues.
First, even at temperature zero, LLMs may not be deterministic in practice. The round-off error in the matrix computations can depend on things like how many processor cores are available (hence, how the task is split up) or what other requests are being processed at the same time (since batched operations are merged, affecting round-off error). It is possible to implement LLM inference in a way that’s deterministic at temperature zero, but I think it’s not typically done by commercial LLM providers, since it is somewhat more costly.
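The underlying issue is just that floating-point addition isn’t associative, so splitting the same sum differently across cores can change the last bits of the result. A minimal toy illustration (not an LLM, just plain Python floats):

```python
# Floating-point addition is not associative: grouping the same
# three numbers differently gives different results in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False
```

The same effect, compounded over billions of operations in a matrix product whose work is partitioned differently from one request to the next, is what makes temperature-zero outputs drift.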
Second, temperature zero is not how an LLM is “supposed” to be run. They are trained at temperature one, and running them at any other temperature introduces bias to an unknown degree, perhaps producing atypical results.
If the general non-determinism problem is avoided (using a slower, deterministic implementation), one could run at temperature one by just setting the same random number seed each time. That would be a better experiment.
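A sketch of what I mean, assuming deterministic logits: with a fixed seed, temperature-one sampling is exactly reproducible (the function name and interface here are my own invention, not any provider’s API):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, seed=0):
    """Toy seeded sampler: softmax over logits at the given
    temperature, then one draw from the resulting distribution."""
    rng = np.random.default_rng(seed)
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                     # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()  # softmax probabilities
    return int(rng.choice(len(p), p=p))

logits = [2.0, 1.0, 0.5, -1.0]
# Same seed, same logits -> same token every run:
print(sample_next_token(logits, seed=42) == sample_next_token(logits, seed=42))  # True
```

In a real inference loop you would seed one generator at the start and draw from it at each step, so the whole completion is reproducible rather than just one token.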