Yvain, I rechecked the calibration survey results, and encourage someone to recheck my recheck further:
First, these strata overlap… is 5 in 0-5 or 5-15? The N doesn’t match either interpretation when I recheck.
Secondly, I am not sure what program you used to calculate the statistics, but when I checked in Excel, some people answered with percentages that got pulled in as numbers less than one. I tried to clean that up for these figures. (I also removed someone who answered 150.)
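For what it’s worth, the cleaning step described above could be sketched like this (a rough sketch only — the actual survey columns and my exact cleaning choices aren’t reproduced here):

```python
# Sketch of the cleaning described above: answers entered as fractions
# (e.g. 0.85 instead of 85) are rescaled to percentages, and impossible
# values outside 0-100 (like the 150 answer) are dropped.

def clean_confidences(raw):
    """Rescale fraction-style answers and drop out-of-range values."""
    cleaned = []
    for v in raw:
        if 0 < v < 1:        # treated as a fraction, e.g. 0.85 -> 85
            v *= 100
        if 0 <= v <= 100:    # drop answers like 150
            cleaned.append(v)
    return cleaned

print(clean_confidences([0.85, 85, 150, 42]))  # -> [85.0, 85, 42]
```

Note the ambiguity this can’t resolve: someone who genuinely meant “0.85% confident” would be silently rescaled to 85%, so a value-by-value eyeball is still worthwhile.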
Thirdly, there are 20 people in this N. You can be 60% correct (12 correct) or 65% correct (13 correct), but the 60.2% correct in this line seems weird:
85-95: 60.2% [n = 20]
Here was my attempt at recalculating those figures: N after data cleaning was 998.
0-<5: 9.1% [n = 2⁄22]
5-<15: 13.7% [n = 25⁄183]
15-<25: 9.3% [n = 21⁄226]
25-<35: 10.0% [n = 20⁄200]
35-<45: 11.1% [n = 10⁄90]
45-<55: 17.3% [n = 19⁄110]
55-<65: 20.8% [n = 11⁄53]
65-<75: 22.6% [n = 7⁄31]
75-<85: 36.7% [n = 11⁄30]
85-<95: 63.2% [n = 12⁄19]
95-100: 88.2% [n = 30⁄34]
I express low confidence in these remarks because I haven’t rechecked this or gone into detail about data cleaning, but my brief take is:
1: Yes, there were some errors that made it look a bit worse than it was.
2: It still shows overconfidence. (Edit: see possible caveat below)
Question: Do we have enough data to determine whether that hump near 10% confidence is significant?
Edit: I’m not a statistician, but I do notice that substantially more people answered in the lower confidence ranges. Yes, on average, the people who answered in the high 55-<85 ranges were quite far off, but more people answered in the 15-<25 range than in all three of those groups put together.
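One rough way to start on the significance question: an exact one-sided binomial test of the 5-<15 stratum (25 correct out of 183, from the table above) against a 10% midpoint. The 10% midpoint is my assumption, and this single-bin test is only a sanity check, not a proper analysis of the whole curve:

```python
# Rough sanity check: is 25/183 correct surprisingly high if the true
# accuracy in the 5-<15 stratum were 10%? Exact binomial upper tail.

from math import comb

def binom_upper_p(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_upper_p(25, 183, 0.10)
print(round(p_value, 3))
```

A proper answer would want the raw per-answer data rather than bin summaries, and should account for testing many bins at once.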
I think the calibration data needs additional cleaning. Eyeballing it, I see % signs, decimals, and English comments in the responses.