Your idea about getting closer to 50% probability of an upvote in order to get more information identifies a weakness in the voting system. It doesn’t matter as much for comments, but I think it is inadequate for articles.
Much better than having to put every article into one of three categories—up, down, or neither—would be to have a slider that starts at 0 and can take values between −100 and +100. What we have now is equivalent to something like having −100 to −33.3 all mapped to ‘down’, −33.3 to +33.3 all mapped to neither, and +33.3 to +100 all mapped to ‘up’. Obviously, lots of information is being discarded by design.
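To make the information loss concrete, here is a minimal sketch of the mapping described above. It assumes the cut-offs at roughly ±33.3 mentioned in the comment; the function name `to_three_way` is made up for illustration and is not anything LW actually runs.

```python
def to_three_way(slider: float) -> str:
    """Collapse a slider value in [-100, 100] into the current three-way vote.

    The thresholds at +/-100/3 are the illustrative ones from the comment,
    not a description of any real implementation.
    """
    if not -100 <= slider <= 100:
        raise ValueError("slider must be between -100 and 100")
    if slider > 100 / 3:
        return "up"
    if slider < -100 / 3:
        return "down"
    return "neither"


# Quite different opinions become indistinguishable after the mapping.
print(to_three_way(40), to_three_way(95))   # up up
print(to_three_way(-10), to_three_way(30))  # neither neither
```

A mild +40 and an enthusiastic +95 both end up as a plain "up", which is the discarded information the comment is pointing at.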
Another problem is that votes aren’t normalized with respect to the user that cast the vote. An up vote from a user who rarely votes up should be worth more than one from someone who votes everything up.
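One way to sketch this kind of per-user normalization is to express each vote as a z-score against the same user's voting history. This is only an illustration under assumptions of my own: votes are coded as +1 (up), 0 (no vote) and −1 (down), the history includes items the user saw but did not vote on, and `normalized_vote` is a hypothetical name.

```python
from statistics import mean, pstdev


def normalized_vote(vote: float, history: list[float]) -> float:
    """Express a vote as a z-score relative to the same user's past votes."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # The user always votes the same way, so a vote carries no signal.
        return 0.0
    return (vote - mu) / sigma


rare_upvoter = [0] * 9 + [1]      # upvotes 1 item in 10
frequent_upvoter = [1] * 9 + [0]  # upvotes 9 items in 10

# The same raw upvote counts for much more from the rare upvoter.
print(normalized_vote(1, rare_upvoter) > normalized_vote(1, frequent_upvoter))  # True
```

With these numbers the rare upvoter's vote comes out at about 3 standard deviations above their norm, and the frequent upvoter's at about 0.33.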
Also, there could be distorting effects due to different subsets of readers preferentially reading different subsets of articles. If readers coming to LW without having read OB tend to vote differently (which is plausible since OB folks have not voted for years and may think of not voting up or down as the default, with a vote being for special emphasis), and they tend to read different sorts of articles (simpler articles on easier topics), the articles they read will appear to be wildly more popular.
The slider is an interesting notion. It adds user-interface complexity, and may have incentive problems for users who desire to exert control, but potentially garners a substantially more useful form of information.
At the moment the current score is a strong influence on how I vote on comments: I vote to move the score to the value I’d like it to have. This is somewhat unstable; directly specifying a personal score and taking a median would be less problematic.
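A minimal sketch of the median idea, assuming each user directly submits the score they think the comment deserves (the name `displayed_score` and the sample numbers are invented for illustration):

```python
from statistics import median


def displayed_score(personal_scores: list[float]) -> float:
    """Show the median of the scores that individual users assigned."""
    return median(personal_scores)


# Four users think the comment deserves about 3; one tries to drag it to 50.
print(displayed_score([2, 3, 3, 4, 50]))  # 3
```

Because the median ignores how extreme the outlying value is, nobody gains anything by overshooting, and the order in which people vote no longer matters.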
The problem of the desire to exert control makes me think that a better option is giving a limited number of double/super/special votes that users can ration out as they see fit. Extra votes that actually mean something.
That’s a good idea. Though I didn’t say it originally, when I mentioned normalizing a vote with respect to the user who cast it, I meant that it should be normalized not only against that user’s average rating but also against how much the user votes in general, so that users who rate everything would have less influence per vote than users who vote less frequently. If that were the case, people who prefer to ration their votes and use them only for things they feel very strongly about (or have thought carefully about) would no longer have much less influence than everyone else on what is popular and on the direction of the site, as they currently do.
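As a rough sketch of frequency normalization, the version below gives every user one unit of total influence and splits it evenly across the votes they cast. Dividing by the raw vote count is just the simplest possible choice, and an assumption of mine; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def weighted_scores(votes: list[tuple[str, str, int]]) -> dict[str, float]:
    """Score articles from (user, article, +1 or -1) votes.

    Each user's votes share a total weight of 1, so heavy voters
    have less influence per vote than sparing voters.
    """
    votes_per_user: dict[str, int] = defaultdict(int)
    for user, _article, _value in votes:
        votes_per_user[user] += 1

    scores: dict[str, float] = defaultdict(float)
    for user, article, value in votes:
        scores[article] += value / votes_per_user[user]
    return dict(scores)


votes = [
    ("alice", "a1", 1),  # alice votes once
    ("bob", "a1", 1),    # bob votes on everything
    ("bob", "a2", 1),
    ("bob", "a3", 1),
    ("bob", "a4", 1),
]
print(weighted_scores(votes))  # {'a1': 1.25, 'a2': 0.25, 'a3': 0.25, 'a4': 0.25}
```

A gentler weighting (say, dividing by the square root of the vote count) would fit the same description; which curve is right is an open design question.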
Having a slider requires more sophisticated data analysis, because different people use different rating scales. Psychologists typically use a multi-point scale and then apply Rasch analysis (a form of item response theory) to the data.
I would say from my experience that a 5-point scale is not big enough; almost everything gets 3 or 4 points, except from the people (about 2% of raters) who binarize the scale by giving everything either a 1 or a 5. Also, people will not use negative ratings, so don’t try to center them on zero. People (or at least Americans) just can’t say “zero is average”.
My instinct would be to have the numbers not be visible to the user. You just have a rectangle with two colors, initially red on the right side and green on the left side. Clicking anywhere inside the rectangle changes the dividing line to be at that location. So clicking 90% of the way towards the right would make the left 90% be green and the right 10% be red. The backend would know that it corresponds to whatever number it corresponds to (+80 according to the scheme I gave earlier), but the user just has a qualitative feel for how much of the mass they’ve allocated to the good (green) color and how much to the bad (red) color.
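The backend conversion could be as simple as the sketch below, which maps the click position linearly onto the −100 to +100 scale from the earlier comment. The parameter names `click_x` and `width` (in pixels) are assumptions for illustration.

```python
def click_to_score(click_x: float, width: float) -> float:
    """Map a click inside the rectangle to a score in [-100, 100].

    click_x is the horizontal distance from the left edge and width is the
    rectangle's total width; clicks outside the rectangle are clamped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    green_fraction = min(max(click_x / width, 0.0), 1.0)
    return 200.0 * green_fraction - 100.0


# Clicking 90% of the way to the right on a 300-pixel rectangle.
print(click_to_score(270, 300))  # 80.0
```

This reproduces the example in the comment: 90% green corresponds to +80, and a click in the exact middle corresponds to 0.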
Two things you could do about that:
As you hover over the rating button, the text below changes to indicate what that rating would mean. Zero stars means “don’t bother”, one star means “good enough to stay visible”, two stars means “above-average”, and so on.
Allow half stars for more information.
We would use percentile score to make the best use of the votes of binarizing voters without giving them more influence than high-information voters.
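Here is one way the percentile idea could look, as a sketch only: each rating is replaced by its mid-rank percentile within that user's own ratings, so ties share the middle of the band they occupy. The function name and the sample ratings are made up.

```python
def percentile_scores(ratings: list[float]) -> list[float]:
    """Convert one user's ratings to mid-rank percentiles in (0, 1)."""
    n = len(ratings)
    result = []
    for r in ratings:
        below = sum(1 for x in ratings if x < r)
        equal = sum(1 for x in ratings if x == r)  # includes r itself
        result.append((below + 0.5 * equal) / n)
    return result


binarizer = [1, 1, 5, 5]            # only ever gives 1 or 5 stars
fine_grained = [1, 2, 3, 4, 5]      # uses the whole scale

print(percentile_scores(binarizer))     # [0.25, 0.25, 0.75, 0.75]
print(percentile_scores(fine_grained))  # [0.1, 0.3, 0.5, 0.7, 0.9]
```

The binarizer's votes still carry a usable like/dislike signal, but they land at 0.25 and 0.75 rather than at the extremes, so they do not outweigh the voter who discriminates between items.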
Amazon ranks stuff between ★☆☆☆☆ and ★★★★★ with a simple JavaScript mouse hover / mouse click to set the value. LW could copy that pretty easily. I suggest that 5 categories would be enough.
See PhilGoetz’s point below: “almost everything gets 3 or 4 points”.