That’s not an explanation, just a symptom of the problem. People of mediocre talent and high talent both get an A; that’s part of the reason we have to use standardized tests with a higher ceiling.
My intuition is that the top few notches are satisficing, whereas all lower ratings are varying degrees of non-satisficing. The degree to which everything tends to cluster at the top represents the degree to which everything is satisfactory for practical purposes. In situations where the majority of the rated things are not satisfactory (like the Putnam—nothing less than a correct proof is truly satisfactory), the ratings will cluster near the bottom.
For example, compare motels to hotels. Motels always have fewer stars, because motels in general are worse. Whereas, say, video games will tend to cluster at the top because video games in general are satisfactorily fun.
Or, think Humanities vs. Engineering grades. Humanities students, in general, satisfy the requirements for becoming historians, writers, or liberal-arts-educated white-collar workers more than Engineering students satisfy the requirements for becoming engineers.
That’s not an explanation, just a symptom of the problem.
This is what I was trying to convey when I said it might be another example of the problem.
I think it’s reasonable, in many contexts, to say that achieving 75% of the highest possible score on an exam should earn you what most people think of as a C grade (that is, good enough to proceed with the next part of your education, but not good enough to be competitive).
I would say that games are different. There is not, as far as I know, a quantitative rubric for scoring a game. A 6⁄10 rating on a game does not indicate that the game meets 60% of the requirements for a perfect game. It really just means that it’s similar in quality to other games that have received the same score, and usually a 6⁄10 game is pretty lousy. I found a histogram of scores on Metacritic:
http://www.giantbomb.com/profile/dry_carton/blog/metacritic-score-distribution-graphs/82409/
The peak of the distribution seems to be around 80%, while I’d eyeball the median at around 70-75%. There is a long tail of bad games. You may be right that this distribution does, in some sense, reflect the actual distribution of game quality. My complaint is that this scoring system is good at resolving bad games from truly awful games from comically terrible games, but bad at resolving a good game from a mediocre game.
I think it should instead be a percentile-based score, as Lumifer describes:
Consider this example: I come up to you and ask “So, how was the movie?”. You answer “I give it a 6 out of 10”. Fine. I have some vague idea of what you mean. Now we wave a magic wand and bifurcate reality.
In branch 1 you then add “The distribution of my ratings follows the distribution of movie quality, savvy?” and let’s say I’m sufficiently statistically savvy to understand that. But… does it help me? I don’t know the distribution of movie quality. It’s probably bell-shaped, maybe, but not quite normal if only because it has to be bounded; I have no idea if it’s skewed, etc.
In branch 2 you then add “The rating of 6 means I rate the movie to be in the sixth decile”. Ah, that’s much better. I now know that out of 10 movies that you’ve seen, five were probably worse and three were probably better. That, to me, is a more useful piece of information.
Then again, maybe it’s difficult to discern a difference in quality between a 60th percentile game and an 80th percentile game.
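To make the percentile idea concrete, here is a minimal Python sketch of one possible convention for the decile-style rating Lumifer describes; the function name and the list of past scores are made up purely for illustration:

def decile_rating(past_scores, new_score):
    # Map new_score onto a 1-10 rating based on how many past scores it beats.
    worse = sum(1 for s in past_scores if s < new_score)
    n = len(past_scores)
    # A score that beats k of n past ratings lands in decile ceil(10*k/n),
    # with a floor of 1 so the worst item still gets the lowest decile.
    decile = (10 * worse + n - 1) // n  # ceiling division using integers
    return max(1, decile)

past = [55, 62, 68, 70, 72, 75, 78, 80, 83, 90]  # hypothetical raw ratings, 0-100 scale
print(decile_rating(past, 76))  # beats 6 of 10 past ratings -> 6
print(decile_rating(past, 85))  # beats 9 of 10 past ratings -> 9

Under a convention like this the reported number carries its own interpretation: a 6 tells you directly that the item beat roughly six out of every ten things the reviewer has rated.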
Oh right, I didn’t read carefully, sorry.