The people with the best calibration scores will not be those with the most skill at calibration. They will be those who “don’t guess” on the trivia questions: they either know the answer or they don’t (a 100% or 0% chance of getting it right). This is because if you guess and have (e.g.) a 50% chance of getting it right, then even if you are perfectly calibrated about that 50%, you will still get a Brier score of 0.25, as opposed to a score of 0 for someone who “doesn’t guess”.
Consequently, I don’t really see this game as being very useful at measuring calibration.
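To make the arithmetic concrete, here is a minimal sketch of the expected Brier score on a single question for a perfectly calibrated forecaster. The function name `expected_brier` is my own, and it assumes the per-question Brier score is the squared difference between the stated probability and the 0/1 outcome.

```python
def expected_brier(p: float) -> float:
    """Expected Brier score on one question when the forecaster states
    probability p and the statement really is true with probability p
    (i.e. the forecaster is perfectly calibrated)."""
    score_if_true = (p - 1.0) ** 2   # outcome = 1, happens with probability p
    score_if_false = (p - 0.0) ** 2  # outcome = 0, happens with probability 1 - p
    return p * score_if_true + (1.0 - p) * score_if_false


# A calibrated guesser at 50% versus someone who only answers what they know.
print(expected_brier(0.5))  # guesser
print(expected_brier(1.0))  # "doesn't guess", knows it is true
print(expected_brier(0.0))  # "doesn't guess", knows it is false
```

The expression simplifies to p(1 - p), which is zero only at p = 0 or p = 1 and largest at p = 0.5. So under this scoring, two equally well-calibrated people can get different scores purely because one of them knows more trivia.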
Sure, that patch wouldn’t have the problem I described.
Anyway, do whatever works for you. If you find this exercise helps people train their calibration, then I suppose that’s a good thing. My main point would be not to take too seriously what this method tells us about who is “best” at calibration. I take it you’re saying people already don’t take it seriously in the case of someone who is doing badly at the trivia portion, but I think the failure mode is a bit more general than that. Anyway, I guess it doesn’t matter too much.