Participants scoring in the bottom quartile on our humor test (...) overestimated their percentile ranking
A less well-known finding of Dunning and Kruger is that the best performers systematically underestimate how good they are, by about 15 percentile points.
Isn’t this exactly what you’d expect if people were good Bayesians receiving scarce evidence? Everyone starts out assuming that they’re in the middle, and as they find something easy or hard, they gradually update away from that prior. If they don’t have good information about how good other people are, they won’t update very far.
If you then look only at the extremes, the very best and the very worst people, of course it will look as though they should have held more extreme beliefs. But if everyone followed that advice, you’d ruin the accuracy of the people closer to the middle, since they haven’t received enough evidence to distinguish themselves from the extremes.
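To make this concrete, here’s a toy simulation (my own sketch with made-up numbers, not anything from the paper): give everyone a prior of being at the 50th percentile, a noisy signal of their true rank, and a fixed weight on that signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_pct = rng.uniform(0, 100, n)         # true percentile rank
signal = true_pct + rng.normal(0, 30, n)  # noisy evidence about one's own rank
w = 0.5                                   # weight on the evidence (assumed)
estimate = (1 - w) * 50 + w * signal      # shrink the evidence toward the prior of 50

bottom, top = true_pct < 25, true_pct > 75
print(f"bottom quartile: true {true_pct[bottom].mean():.0f}, estimated {estimate[bottom].mean():.0f}")
print(f"top quartile:    true {true_pct[top].mean():.0f}, estimated {estimate[top].mean():.0f}")
```

With these (arbitrary) numbers the bottom quartile overestimates itself by roughly 20 points and the top quartile underestimates itself by roughly the same amount, even though every individual is updating sensibly on the evidence they have.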
(Similarly, I’ve heard that people often overestimate their ability on easy tasks and underestimate their ability on difficult tasks, which is exactly what you’d expect if they had good epistemics but limited evidence. If task performance is a function of task difficulty and talent for a task, and the only thing you can observe is your performance, then believing that you’re good at tasks you do well at and bad at tasks you fail at is the correct thing to do. As a consequence, saying that people overestimate their driving ability doesn’t tell you that much about the quality of their epistemics in isolation, because they might be following a strategy that optimises performance across all tasks.)
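A toy version of that point, again with invented numbers: if performance is talent minus difficulty plus noise, and difficulty is unobserved, then reading your (shrunken) performance as talent overrates you on easy tasks and underrates you on hard ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
talent = rng.normal(0, 1, n)
difficulty = rng.choice([-1.0, 1.0], n)  # easy (-1) or hard (+1), unobserved
performance = talent - difficulty + rng.normal(0, 1, n)

self_estimate = 0.5 * performance        # shrink performance toward 0 and read it as talent (assumed weight)

easy = difficulty < 0
print(f"easy tasks: mean talent {talent[easy].mean():.2f}, mean self-estimate {self_estimate[easy].mean():.2f}")
print(f"hard tasks: mean talent {talent[~easy].mean():.2f}, mean self-estimate {self_estimate[~easy].mean():.2f}")
```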
The finding that people at the bottom overestimate their position by 46 percentile points is somewhat more extreme than this naïve model would suggest. As you say, however, it’s easily explained once you take into account that your ability to judge your performance on a task is correlated with your performance on that task. The people at the bottom are mostly receiving noise, so on average they stick with their prior and judge that they’re about average.
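The same toy model produces that too, if you let the noise in the self-assessment grow as skill drops, so that a rational updater puts correspondingly less weight on it (the numbers are again mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_pct = rng.uniform(0, 100, n)
noise_sd = 60 - 0.5 * true_pct               # worse performers get noisier evidence (assumed)
signal = true_pct + noise_sd * rng.normal(0, 1, n)

prior_var = 100**2 / 12                      # variance of a uniform prior over 0..100
w = prior_var / (prior_var + noise_sd**2)    # rough precision weighting of the evidence
estimate = (1 - w) * 50 + w * signal

bottom = true_pct < 25
print(f"bottom quartile: true {true_pct[bottom].mean():.0f}, estimated {estimate[bottom].mean():.0f}")
```

The bottom quartile’s evidence is close to pure noise, so their estimates sit near the prior of “about average” and overshoot their true rank by a lot, without any motivated reasoning in the model.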
Of course, just because some of the evidence is consistent with people having good epistemics doesn’t mean that they actually do have good epistemics. I haven’t read the original paper, but it seems like people at the bottom actually think that they’re a bit above average, which seems like a genuine failure, and I wouldn’t be surprised if there are more examples of such failures which we can learn to correct. Impostor syndrome also seems like a case where people predictably fail in fixable ways (since they’d do better by estimating that they’re of average ability in their group than by trying to update on evidence).
But I do think that people are often too quick to draw conclusions from watching a specific subset of people estimate their performance on a specific task, without taking into account how well their strategy would do if they were better or worse, or were doing a different task. This post fixes some of those problems by reminding us that everyone lowering their estimate of their performance would hurt the people at the top, but I’m not sure it correctly takes into account how the people in the middle of the distribution would be affected.
(The counter-argument might be that people who know about Dunning-Kruger are likely to be at the top of any distribution they find themselves in, but this seems false to me. I’d expect a lot of people to know about Dunning-Kruger (though I may be in a bubble), and there are lots of tasks where ability doesn’t correlate much with knowing about Dunning-Kruger. Perhaps humor is an example of this.)
In other words, regression to the mean. The predictions form a line with a positive slope. The slope is less than 1, but only perfect predictions would have a slope of 1. The intercept is high, which is overconfidence, but the intercept is a statement about the whole population, not about the lowest bin.
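As a quick illustration (my own, not anything DK did), fitting a line to a shrinkage-style toy model like the one sketched above gives exactly that picture:

```python
import numpy as np

rng = np.random.default_rng(3)
true_pct = rng.uniform(0, 100, 100_000)
estimate = 0.5 * 50 + 0.5 * (true_pct + rng.normal(0, 30, true_pct.size))

slope, intercept = np.polyfit(true_pct, estimate, 1)  # slope first, then intercept
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
```

Roughly slope 0.5 and intercept 25: a single population-level line that already makes the lowest bin look wildly overconfident and the top bin look modest, with nothing special going on in the lowest bin. (Matching DK’s overall overconfidence would additionally require a prior a bit above the 50th percentile.)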
Here are the graphs. A lot of information has been destroyed by binning them, but it doesn’t sound like DK thought that information was relevant or made use of it:
The second graph is different. It better matches the cartoons one finds in an image search for Dunning-Kruger, but I’m not sure it matches this post. (The third could be described as yet another shape, but I’d classify it as a line with a very low positive slope.)
The fourth graph is from a more complicated intervention. It seems to have the opposite message from this post: it finds that the 4th quartile is better calibrated than the 3rd.