I am 95% confident that calibration tests are good tests for a very important aspect of rationality, and would encourage everyone to try a few.
Yes, calibration tests are rationality tests, but they are better tests on subjects where you are less likely to be rational. So what are the best subjects on which to test your calibration?
I suspect I should also be writing down calibrated probability estimates for my project completion dates. This calibration test is easy to do oneself, without infrastructure, but I’d still be interested in a website tabulating my and others’ early predictions and then our actual performance (perhaps a page within LW?). It might be especially useful within a group of coworkers, who could then know how much to adjust one another’s timeline estimates when planning or dividing complex projects.
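For keeping score without any infrastructure, a few lines of code would do. A minimal sketch, with invented numbers: log the probability you assigned to finishing each project by its deadline and whether you actually did, then summarize with the Brier score (the mean squared error of the probabilities), where 0 is perfect and 0.25 is what always answering 50% would score.

```python
def brier_score(predictions):
    """predictions: list of (stated_probability, outcome) pairs, where
    outcome is 1 if the project finished on time, else 0.
    Lower is better; always answering 0.5 would score 0.25."""
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

# Hypothetical log: confident deadlines, mostly missed.
log = [(0.9, 0), (0.8, 1), (0.9, 0), (0.7, 1), (0.8, 0)]
print(round(brier_score(log), 3))  # 0.478 -- much worse than chance
```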
Wouldn’t making a probability estimate for your project completion dates influence your date of completion? Predicting your completion times successfully won’t prove your rationality.
This is a good point. Still, it would provide evidence of rationality, especially in the likely majority of cases where people didn’t try to game the system by, e.g., deliberately picking dates far in advance of their actual completions and then doing the last steps right at that date. My calibration scores on trivia have been fine for a while now, but my calibration at predicting my own project completions is terrible.
I wonder to what degree this is a problem of poor calibration vs. poor motivation. Maybe commitment mechanisms like stickK.com would have a greater marginal benefit than better calibration. I don’t know about you, but that seems to be the case with similar issues on my end.
Perhaps we could design a procedure for asking your friends, coworkers, and other acquaintances (all mixed together) to rate you on various traits, anonymizing who submitted which rating to encourage honesty? You could then submit calibrated probability estimates as to which ratings were given.
I’d find this a harder context in which to be rational than I’d find trivia.
Actually, there’s probably some website out there already that lets one solicit anonymous feedback. (Which would be a rationality boost for some of us in itself, even apart from calibration—though I’d like to try calibration on it, too.)
Does anybody know of such a site? I spent an hour looking on Google—perhaps not with the right keywords—and found only What Others Think, Kumquat, and a couple Facebook/Myspace apps.
Both look potentially worth using, but neither is ideal. Are there other competitors?
I don’t associate rationality with Trivial Pursuit, which does rather seem to dominate the test questions.
Marshall, you don’t need to be good at the question subjects, as long as you don’t think you’re good when you’re not. Calibration tests aren’t about how many of the questions you can get right; they test whether you’re over- (or under-) confident about your answers. They tend to use obscure questions whose answers few people are likely to know for sure.
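That over/underconfidence check can be made concrete in a few lines. A sketch, with made-up quiz results: group your answers by the confidence you stated, then compare each stated confidence with the fraction you actually got right. Stating 90% but scoring 60% is overconfidence; the reverse would be underconfidence.

```python
from collections import defaultdict

def calibration_report(answers):
    """answers: list of (stated_confidence, was_correct) pairs.
    Returns {stated confidence: observed accuracy} per confidence level."""
    buckets = defaultdict(list)
    for confidence, correct in answers:
        buckets[confidence].append(correct)
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}

# Hypothetical quiz: right 60% of the time at both confidence levels,
# so the 90% answers show classic overconfidence.
results = [(0.9, True), (0.9, False), (0.9, True), (0.9, False), (0.9, True),
           (0.6, True), (0.6, False), (0.6, True), (0.6, True), (0.6, False)]
report = calibration_report(results)
print(report)  # {0.6: 0.6, 0.9: 0.6}
```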
Thanks, Michael. I just don’t think calibrating on useless information is evidence of my rationality; my answer is “I am 95% sure that I don’t know”, all the time. Calibrating on whether I book my next dental appointment on time seems a better clue.
Interesting point. Does anyone know of any evidence about how well calibration test results match overconfidence in important real-life decisions? I’d expect it would give a good indication, but has anyone actually tested it?
There are a lot of tests that look plausibly useful but would be much more trustworthy if we could find a sufficiently good gold standard to validate against.
If we have enough tests that look plausibly useful, each somewhat independent-looking, we could see how well they correlate. Test items on which good performance predicts high scores on other test items would seem more credible as indicators of actual rationality.
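As a sketch of what checking the correlations might look like (the scores below are invented), one could compute pairwise Pearson correlations between people’s scores on each pair of candidate tests; items that correlate with many of the others would be the more credible indicators.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for five people on two candidate rationality tests.
calibration = [0.9, 0.7, 0.8, 0.5, 0.6]
predictions = [0.8, 0.6, 0.9, 0.4, 0.5]
print(pearson(calibration, predictions))  # strongly positive here
```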
We could include behavioral measures of success, such as those suggested by Marshall, in our list of test items: income, self-reported happiness, having stable, positive relationships, managing to exercise regularly or to keep resolutions, and probabilistic predictions for the situation you’ll be in next year (future accomplishments, self-reported happiness at future times, etc., coupled with actual reports next year on your situation). If we can find pen-and-paper test items that correlate both with behavioral measures of success and with other plausibly rationality-related pen-and-paper test items, after controlling for IQ, I’ll say we’ve won.