Can you give us some details on how the votes are stored in the server? This may be difficult/impossible to do in an offline fashion if the right sort of data isn’t available.
I suspect that, as with site modifications, those of us suggesting ways to find downvote stalkers would do best to figure out how LW works and do as much of the work as possible ourselves. In this case, that’d probably mean downloading the LW source code, figuring out the database structure, thinking of approaches to finding downvote stalkers, formalising them as database queries, and then getting someone with database access to security-check and run those queries. I suspect this because, from what I gather, Eliezer and those with database access (presumably Trike) tend to be busy enough, or doing important enough other things, that they are unwilling to do all this themselves or it is not worth their time, so we should do as much of it as possible to make things quicker for them.
Small amount of money to mouth: I did read through some of the webpages surrounding LW’s source code, downloaded it, and spent a little time trying to figure out how the site and database work. But by the time I got to the point of looking at the code, I had little temporary motivation left, and between the way the scripts relate to each other and the difficulty of figuring out where to start, I didn’t get very far before I burned out for that night, and I haven’t looked again since. :z
A guide to (learning) LW’s code and database (even if just a few paragraphs along the lines of ‘Start by looking at the main article display script, then move on to...’ or commenting the scripts or something) might be higher leverage at this point with respect to improving the site than submitting small code improvements, since it might encourage several others to submit improvements. On the other hand, part of me suspects that the set of people held back just by that might actually be quite small (polarisation of would-be contributors into hardcore and indifferent with few in the middle—‘if they were going to do it, they would have done it by now’).
Given the distribution of coding ability here, it certainly seems ridiculous how slowly stuff like this gets done, and I think it’s due to trivial inconveniences, ugh fields, etc., of which figuring out the site and how to submit code is possibly a large part.
Since Eliezer’s response, I have slightly lowered my estimate of the level of downvote stalking, but there is still way too much evidence for me to honestly believe that there aren’t any downvote stalkers; it would take at least an explanation of exactly what had been tried, and possibly significant knowledge of the database structure, to convince me it’s not happening at this point. So at present I defy the data.
The server has to explicitly remember every vote from every user; otherwise the interface, where anybody can change or retract any of their past votes, would not be possible.
Right, but if it doesn’t have a timestamp, then it’s difficult to determine whether or not one user downvoted another user many times in a few minutes, which is a more reliable sign of the karmassassination problem than just how many times one user has downvoted another user.
You could go off comment timestamp (“has user X downvoted a contiguous block of comments from user Y, or are there holes, i.e. comments user X did not downvote?”), but that’s less useful, and more likely to catch the false positives of norm-enforcing users downvoting a repeated norm-breaker.
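Both signals mentioned here can be sketched in a few lines. This is only an illustration under assumptions: nothing in this thread says whether LW actually stores vote timestamps, and the function names, the five-minute window, and the input shapes are all made up for the example.

```python
from datetime import datetime, timedelta


def max_votes_in_window(vote_times, window=timedelta(minutes=5)):
    """Largest number of votes that fall inside any single time window.

    vote_times: datetimes of the downvotes one user cast on another
    (hypothetical; only usable if the server stores vote timestamps).
    """
    times = sorted(vote_times)
    best = 0
    start = 0
    for end in range(len(times)):
        # Advance the left edge until the span is at most `window` long.
        while times[end] - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def longest_contiguous_block(comment_ids_in_order, downvoted_ids):
    """Longest run of consecutive comments by Y that X downvoted.

    comment_ids_in_order: Y's comments sorted by comment timestamp.
    downvoted_ids: set of Y's comment ids that X downvoted.
    """
    best = run = 0
    for cid in comment_ids_in_order:
        run = run + 1 if cid in downvoted_ids else 0  # a hole resets the run
        best = max(best, run)
    return best
```

The first function needs vote timestamps; the second needs only comment timestamps, which is why it is the fallback, and it shares the false-positive problem described above.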
The past 80+ comments from me have all had at least one downvote. There is no reasonable way to interpret this other than as having a stalker.
And the solution to how not to catch false positives is to use some common sense. You’re never going to have an automated algorithm that can detect every instance of abuse, but even an instance that is not detectable by automatic means can be detectable if someone with sufficient database access takes a look when it is pointed out to them.
There is no reasonable way to interpret this other than as having a stalker.
Suppose we find the list of users who downvoted your recent comments, and there are fifteen users on that list, each of whom is an active poster in their own right. What conclusion would you draw from that?
(It may be that, when we actually find that list, there is one account, or a handful of mostly inactive accounts, that represent almost all of the downvotes, in which case ‘stalker’ is a reasonable conclusion. But it’s not the only way the data could turn out.)
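One rough way to tell those outcomes apart is to look at how concentrated the downvotes are. The sketch below assumes we already have one voter name per downvote received; the function name and input format are hypothetical, not anything from LW’s code.

```python
from collections import Counter


def downvote_shares(voters):
    """Return (voter, share) pairs, largest share first.

    voters: one entry per downvote received, naming who cast it
    (hypothetical input; the real data would come from the vote records).
    """
    counts = Counter(voters)
    total = sum(counts.values())
    return [(name, n / total) for name, n in counts.most_common()]
```

If the top entry holds most of the share, ‘stalker’ looks plausible; if fifteen accounts each hold a few percent, it does not. Where to draw the line is still a judgment call.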
And the solution to how not to catch false positives is to use some common sense.
Common sense is costly. The point of doing this algorithmically is that you get a query result that says “these are the twenty cases that might be karmassassination” instead of “these are the twenty thousand cases that might be karmassassination” or “these are the zero cases that might be karmassassination.”
It’s also not particularly wise to run this check just on people who complain. Part of the point of this is to prevent karmassassins from driving users away, which (somewhat) hasn’t happened to the people who stuck around to complain. And at least a few users have a habit of downvoting any comments complaining about karma loss because they don’t like such comments, and so they’ll be extra likely to show up on that list.
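A site-wide scan of that kind might look something like the following. It is a sketch, not a proposal for the actual check: the record format is assumed, and the two thresholds are arbitrary placeholders that would need tuning to get “twenty cases” rather than zero or twenty thousand.

```python
from collections import defaultdict


def suspicious_pairs(votes, comment_counts, min_votes=20, min_fraction=0.8):
    """Shortlist (voter, author) pairs for a human to review.

    votes: iterable of (voter, author, comment_id) downvote records.
    comment_counts: dict mapping author -> number of comments they wrote.
    The thresholds are arbitrary placeholders, not tuned values.
    """
    seen = defaultdict(set)
    for voter, author, comment_id in votes:
        if voter != author:
            seen[(voter, author)].add(comment_id)  # de-duplicate per comment

    flagged = []
    for (voter, author), ids in seen.items():
        fraction = len(ids) / comment_counts[author]
        if len(ids) >= min_votes and fraction >= min_fraction:
            flagged.append((voter, author, len(ids), fraction))
    # Highest coverage first, so the reviewer sees the strongest cases first.
    return sorted(flagged, key=lambda row: row[3], reverse=True)
```

Because it runs over every pair rather than only over complainants, it can also surface cases where the target has already left. It keys on the fraction of one author’s comments, not on a voter’s total downvote count, so a heavy but evenly spread downvoter would not be flagged by this alone.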
Suppose we find the list of users who downvoted your recent comments, and there are fifteen users on that list, each of whom is an active poster in their own right. What conclusion would you draw from that?
I’d conclude that this is an extremely weird statistical anomaly which is not one user moderating down comments, but looks almost exactly like it is. One user doing a lot of downmods has to apply the downmods to separate comments, so his downmods are spread out. 15 users producing the same total number of downmods independently of each other would produce something a lot closer to a Poisson distribution with an expected value of 1, and there should be a number of comments that have zero downmods just by chance.
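The Poisson argument can be put into numbers. The sketch below takes the comment’s own assumptions at face value (downvotes land on comments independently, with a mean of 1 per comment); real voting is unlikely to be that clean, so treat the output as a rough illustration only.

```python
import math


def prob_all_hit(num_comments, mean_downvotes=1.0):
    """P(every comment has >= 1 downvote) if counts are independent Poisson."""
    p_zero = math.exp(-mean_downvotes)  # chance a single comment gets none
    return (1.0 - p_zero) ** num_comments


def expected_untouched(num_comments, mean_downvotes=1.0):
    """Expected number of comments with zero downvotes under the same model."""
    return num_comments * math.exp(-mean_downvotes)
```

With 80 comments and a mean of 1, this model expects roughly 29 comments to have no downvotes at all, so a run of 80 with none untouched would be very unlikely if the votes really were independent.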
And the solution to how not to catch false positives is to use some common sense. You’re never going to have an automated algorithm that can detect every instance of abuse, but even an instance that is not detectable by automatic means can be detectable if someone with sufficient database access takes a look when it is pointed out to them.
Right on. The solution to karma abuse isn’t some sophisticated algorithm. It’s extremely simple database queries, in plain English along the lines of “return list of downvotes by user A, and who was downvoted,” “return downvotes on posts/comments by user B, and who cast the vote,” and “return list of downvotes by user A on user B.”
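For concreteness, those three queries could look roughly like this. The `votes` table and its columns are a hypothetical schema invented for the example; nobody in this thread has described LW’s real database structure, so the actual queries would have to be rewritten against it.

```python
import sqlite3

# Hypothetical schema, for illustration only; LW's real tables may differ.
SCHEMA = """
CREATE TABLE votes (
    voter     TEXT NOT NULL,   -- account that cast the vote
    author    TEXT NOT NULL,   -- account that wrote the comment or post
    item_id   INTEGER NOT NULL,
    direction INTEGER NOT NULL -- +1 for an upvote, -1 for a downvote
);
"""


def downvotes_cast_by(conn, voter):
    """Downvotes by user A, and who was downvoted."""
    return conn.execute(
        "SELECT author, COUNT(*) FROM votes "
        "WHERE voter = ? AND direction = -1 "
        "GROUP BY author ORDER BY COUNT(*) DESC, author",
        (voter,),
    ).fetchall()


def downvotes_received_by(conn, author):
    """Downvotes on items by user B, and who cast them."""
    return conn.execute(
        "SELECT voter, COUNT(*) FROM votes "
        "WHERE author = ? AND direction = -1 "
        "GROUP BY voter ORDER BY COUNT(*) DESC, voter",
        (author,),
    ).fetchall()


def downvotes_between(conn, voter, author):
    """Items by user B that user A downvoted."""
    rows = conn.execute(
        "SELECT item_id FROM votes "
        "WHERE voter = ? AND author = ? AND direction = -1 "
        "ORDER BY item_id",
        (voter, author),
    ).fetchall()
    return [item_id for (item_id,) in rows]
```

The queries use bound parameters rather than string formatting, which should make the security check by whoever has database access a little easier.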
It’s extremely simple database queries, in plain English along the lines of “return list of downvotes by user A, and who was downvoted,” “return downvotes on posts/comments by user B, and who cast the vote,” and “return list of downvotes by user A on user B.”
And then what will you do with that data? If you find that GrumpyCat666 cast most of the downvotes, does that mean that GrumpyCat666 is a karmassassin, or that GrumpyCat666 is one of the gardeners?
(I can’t find the link now, but early on there was a coded rule that prevented anyone from casting more downvotes than their total karma. This stopped a user whose name I don’t recall, who had downvoted some massive fraction of all the comments the site had received, from downvoting any more comments, but this was seen as not helpful for the site, since that person was making the junk less visible.)