Do you have a more specific purpose in mind? I’m curious what spurred your question.
A prof doing an experiment gave me a bunch of data from calibration tests with demographic identifiers, and I’d like to be able to analyze it to say things like “Old people have better calibration than young people” or “Training in finance improves your calibration”.
Oh, excellent. I do love data. What is the format (i.e., what is the maximum amount of information you have about each individual)?
Given that you already have the data (and you presumably have reason to believe that individuals were not trying to game the test?), I suspect the best approach is to graph both accuracy and anticipated accuracy against the chosen demographic. Then, for all your readers who want numbers, compute either the ratio or the difference of those two and publish the PMCC of that against the demographic (it’s Frequentist, but it’s also standard practice, and I’ve had papers rejected that don’t follow it...).
Leaving readers with two separate metrics would allow you to make interesting statements like “financial training increased accuracy, but it also decreased calibration: subjects overestimated their ability.”
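A minimal sketch of the analysis described above, assuming the demographic is numeric (age here) and that each subject's mean accuracy and mean anticipated accuracy have already been computed; the variable names and data are hypothetical, and PMCC is taken to mean the Pearson product-moment correlation coefficient:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-subject data: a numeric demographic (age), the fraction
# of questions each subject got right, and the fraction they predicted
# they would get right on the calibration test.
age = np.array([22, 25, 31, 40, 47, 53, 60, 68])
accuracy = np.array([0.55, 0.60, 0.58, 0.65, 0.70, 0.66, 0.72, 0.71])
anticipated = np.array([0.70, 0.68, 0.60, 0.66, 0.72, 0.65, 0.73, 0.70])

# Miscalibration as the difference of the two metrics (the ratio would
# also work); values near zero mean the subject was well calibrated,
# positive values mean overconfidence.
miscalibration = anticipated - accuracy

# Pearson product-moment correlation of miscalibration against the
# demographic, with the usual Frequentist p-value.
r, p = pearsonr(age, miscalibration)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Keeping `accuracy` and `anticipated` as separate columns until the last step is what makes statements like "training increased accuracy but decreased calibration" possible; collapsing to a single score up front would throw that information away.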
I’m not sure what the Pacific Mennonite Children’s Choir has to do with it… oh wait, nevermind.