Therefore, I now announce that Eugine_Nier is permanently banned from posting on LessWrong. This decision is final and will not be changed in response to possible follow-up objections.
Unfortunately, it looks like while a ban prevents posting, it does not actually block a user from casting votes. I have asked jackk to look into the matter and find a way to actually stop the downvoting. Jack indicated earlier on that it would be technically straightforward to apply a negative karma modifier to Eugine’s account, and wiping out Eugine’s karma balance would prevent him from casting future downvotes. Whatever the easiest solution is, it will be applied as soon as possible.
Questions:
How are you going to deal with socks?
Are you going to be implementing a more systematic process for detecting karma abuses?
Can those who have been negatively affected by this receive an adjustment?
3a. If you are considering karma adjustments, could you please do them in a way that restores percentages rather than points? I, for one, don’t care about my “fake internet points” very much, but the ratio of upvotes to downvotes is VERY useful to me as a barometer for the overall integrity of my thought processes. (If others who have been affected by this disagree, please speak up.)
Does the system keep track of individual downvotes (who downvoted what)? If yes, then it could be possible to simply revert all votes ever cast by Eugine, which should solve all the problems: everyone would have the same total karma and comment karma as if this whole thing had never happened.
It has to—otherwise you wouldn’t be able to see what YOU upvoted/downvoted.
Also, otherwise you would be able to upvote or downvote something multiple times.
So clearly, it has to track somewhere.
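A minimal sketch of the "revert all votes by one account" idea, assuming a hypothetical relational layout in which karma is derived from a votes table. The table names, column names, and user ids here are invented for illustration and are not LW's actual storage:

```python
import sqlite3

def karma(conn, author_id):
    """Sum the values of all votes on comments written by author_id."""
    row = conn.execute(
        "SELECT COALESCE(SUM(v.value), 0) FROM votes v "
        "JOIN comments c ON c.id = v.comment_id WHERE c.author_id = ?",
        (author_id,),
    ).fetchone()
    return row[0]

def revert_votes(conn, voter_id):
    """Delete every vote cast by voter_id and return how many were removed."""
    cur = conn.execute("DELETE FROM votes WHERE voter_id = ?", (voter_id,))
    conn.commit()
    return cur.rowcount

# Hypothetical data: author 10 wrote two comments; voter 99 downvoted both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    CREATE TABLE votes (
        comment_id INTEGER NOT NULL REFERENCES comments(id),
        voter_id   INTEGER NOT NULL,
        value      INTEGER NOT NULL CHECK (value IN (-1, 1)),
        PRIMARY KEY (comment_id, voter_id)  -- one vote per comment per voter
    );
    INSERT INTO comments VALUES (1, 10), (2, 10);
    INSERT INTO votes VALUES (1, 20, 1), (1, 99, -1), (2, 99, -1);
""")

before = karma(conn, 10)
removed = revert_votes(conn, 99)
after = karma(conn, 10)
print(before, removed, after)
```

Because karma is computed from the surviving votes in this sketch, deleting the rows is enough; a system that caches karma totals would also need to recompute them.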
If you guys need a SQL guy to help do some development work to make meta-moderation easier, let me know; I’ll happily volunteer a few hours a week.
EDIT: AAAUUUGH REDDIT’S DB USES KEY-VALUE PAIRS AIIEEEE IT ONLY HAS TWO TABLES OH GOD WHY WHY SAVE ME YOG-SOTHOTH I HAVE GAZED INTO THE ABYSS AAAAAAAIIIIGH okay. I’ll still do it. whimper
Maybe that’s why volunteer dev work for LW is so hard to come by. Everybody takes one look at the DB and decides they would prefer a very long vacation in Sarlacc, Tatooine.
Didn’t even get to the point of getting the DB up and running when I looked into it before I ran out of motivation (at that time). LW-hacking is not particularly accessible, though it’s not clear how high making it more accessible is as a priority.
The Reddit guys really, really dislike doing schema updates at their scale. They were getting very slow, and their replication setup was not happy about being told to, say, index a new column while people are doing lots of reads and writes at the same time. So they eventually said “to hell with it; we’ll just make a document database, with no schema, and handle consistency problems by not handling them. Man, do not even ask us about joins.” This seems to have made them much happier than the ‘better’ database design they used to use, which is important when you’re a too-small team dealing with terrifying scaling issues, and you know that a lot of people are watching you because they are the ones causing the scaling issues.
This design sure does make writing SQL queries a pain, though, and it’s less than ideal for a site like Less Wrong, whose code doesn’t change much.
Structured tables. One for posts, one for comments, one or more for karma and so on, with appropriately typed columns for each attribute such things have. Alternatively if the data really is unstructured then I’d use a key-value store like Cassandra or something.
(For the record, many modern key-value stores didn’t exist when the Reddit code was originally written.)
Seconding this. A proper relational database would look something like this:
-- T-SQL (SQL Server) syntax throughout: IDENTITY, VARCHAR(MAX), GETDATE()
CREATE TABLE Users
(
    id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    username VARCHAR(250),
    passwordHash VARCHAR(250),
    firstname VARCHAR(250),
    lastname VARCHAR(250),
    description VARCHAR(MAX),
    dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
    dateLoggedIn DATETIME NOT NULL DEFAULT GETDATE(),
    active CHAR(1)
);
CREATE TABLE Themes
(
    id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    name VARCHAR(250),
    description VARCHAR(MAX),
    css VARCHAR(MAX),
    dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
    dateEdited DATETIME NOT NULL DEFAULT GETDATE(),
    active CHAR(1)
);
CREATE TABLE Forums
(
    id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    name VARCHAR(250),
    description VARCHAR(MAX),
    users_id_owner INT NOT NULL FOREIGN KEY REFERENCES Users(id),
    themes_id INT NOT NULL FOREIGN KEY REFERENCES Themes(id),
    dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
    dateEdited DATETIME NOT NULL DEFAULT GETDATE(),
    active CHAR(1)
);
CREATE TABLE Posts
(
    id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    forums_id INT NOT NULL FOREIGN KEY REFERENCES Forums(id),
    -- NULL for a top-level post; otherwise the post being replied to
    posts_id_parent INT NULL FOREIGN KEY REFERENCES Posts(id),
    users_id_poster INT NOT NULL FOREIGN KEY REFERENCES Users(id),
    title VARCHAR(250) NOT NULL,
    text VARCHAR(MAX) NOT NULL,
    dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
    dateEdited DATETIME NOT NULL DEFAULT GETDATE(),
    active CHAR(1)
);
CREATE TABLE Votes
(
    value INT NOT NULL,
    posts_id INT NOT NULL FOREIGN KEY REFERENCES Posts(id),
    users_id_voter INT NOT NULL FOREIGN KEY REFERENCES Users(id),
    dateCreated DATETIME NOT NULL DEFAULT GETDATE()
);
-- constraint: only one vote per post per user
ALTER TABLE Votes ADD CONSTRAINT pk_Votes PRIMARY KEY (posts_id, users_id_voter);
With that schema, all you’d have to do to see someone’s effect on another person’s karma is:
SELECT SUM(VALUE) FROM Votes
WHERE users_id_voter = @Voter
AND posts_id IN
(SELECT id FROM Posts WHERE users_id_poster = @User)
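For anyone who wants to try that query without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. It keeps only the columns the query touches, uses SQLite types rather than the T-SQL ones above, and the sample rows and the helper name net_effect are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down versions of the Posts and Votes tables sketched above
    CREATE TABLE Posts (id INTEGER PRIMARY KEY, users_id_poster INTEGER NOT NULL);
    CREATE TABLE Votes (
        value          INTEGER NOT NULL,
        posts_id       INTEGER NOT NULL REFERENCES Posts(id),
        users_id_voter INTEGER NOT NULL,
        PRIMARY KEY (posts_id, users_id_voter)
    );
    -- Hypothetical data: user 7 wrote posts 1 and 2, user 8 wrote post 3
    INSERT INTO Posts VALUES (1, 7), (2, 7), (3, 8);
    INSERT INTO Votes VALUES (-1, 1, 42), (-1, 2, 42), (1, 3, 42), (1, 1, 5);
""")

def net_effect(conn, voter, user):
    """Net karma that `voter` has contributed to posts written by `user`."""
    row = conn.execute(
        """SELECT COALESCE(SUM(value), 0) FROM Votes
           WHERE users_id_voter = :voter
             AND posts_id IN (SELECT id FROM Posts WHERE users_id_poster = :user)""",
        {"voter": voter, "user": user},
    ).fetchone()
    return row[0]

print(net_effect(conn, 42, 7))  # voter 42 downvoted both of user 7's posts
print(net_effect(conn, 42, 8))  # voter 42 upvoted user 8's only post
```

COALESCE is added so that a voter who never touched the user's posts yields 0 instead of NULL.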
I’m hoping that the fact that your total karma restricts the amount of downvoting that you can do would limit the usefulness of socks for this purpose. Of course there are ways to get around that, but it’s an inconvenience for the downvoters. If there looks to be a problem anyway, we’ll try to figure something out.
Are you going to be implementing a more systematic process for detecting karma abuses?
Would need to figure out one first. Many of the proposals I’ve seen so far require code changes.
Can those who have been negatively affected by this receive an adjustment?
jackk mentioned the possibility of reversing Eugine’s votes by running a script to upvote the comments that he had downvoted. We can do that if the people who were targeted have an interest in it.
It’s better than nothing, but as mentioned before, I’d prefer something that systematically eliminates the downvotes rather than upvoting over them:
Let’s say I’ve made 1600 comments, received +2400 “legitimate” upvotes, and −400 “legitimate” downvotes.
Thus, I should have a karma of 2000 (86% positive). But along comes Eugine, and downvotes everything, giving me another −1600. This puts my karma at 400 (55% positive). You then run a script to upvote everything he downvoted, giving me +1600 karma. This puts me at 2000 (67% positive).
As you can see, I’m STILL below the 70% positive that Eliezer mentioned as his intuitive threshold for “quality contributors”, even though in reality I should be well above that threshold.
This is, in fact, what pissed me off about my karmassassination in the first place—my ‘fake internet points’ don’t matter to me, but my ratio of upvotes to downvotes DOES, because I use it to track how likely it is that I have systematic flaws in my reasoning. This breaks down when the majority of my up- and down-voting comes from one or two concentrated sources, even if one of those sources is directly countering the other.
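The arithmetic in that example can be checked with a few lines of Python. The figures are the ones from the comment above; the function name percent_positive is just a label chosen for this sketch:

```python
def percent_positive(upvotes, downvotes):
    """Share of all votes that are upvotes, as a percentage."""
    total = upvotes + downvotes
    return 100.0 * upvotes / total if total else 0.0

legit_up, legit_down, comments = 2400, 400, 1600

# No interference: 2400 up, 400 down
baseline = percent_positive(legit_up, legit_down)
# One extra downvote on every comment
attacked = percent_positive(legit_up, legit_down + comments)
# Counter-upvotes layered on top of the downvotes (karma restored, ratio not)
papered_over = percent_positive(legit_up + comments, legit_down + comments)
# Downvotes deleted outright (karma and ratio both restored)
deleted = percent_positive(legit_up, legit_down)

for label, pct in [("baseline", baseline), ("attacked", attacked),
                   ("papered over", papered_over), ("deleted", deleted)]:
    print(f"{label}: {pct:.1f}% positive")
```

Upvoting over the downvotes brings the point total back to 2000 but leaves the ratio near two thirds, while deleting the votes brings both back to the baseline.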
I don’t know how extensively this site’s source code has been modified from the reddit default, but in r2/models/vote.py we have:
class VotesByAccount(tdb_cassandra.DenormalizedRelation)
class LinkVotesByAccount(VotesByAccount)
class CommentVotesByAccount(VotesByAccount)
Python isn’t currently in my active language cache, so I’m a little rusty dragging through all the dependencies; I’ll try to spend this weekend getting up to speed with Python and see if I can help sort out a generic “wipe out a user’s full voting history” script that can be safely run.
jackk mentioned the possibility of reversing Eugine’s votes by running a script to upvote the comments that he had downvoted.
Villiam’s solution is better, I think. The system certainly keeps track of who downvoted what and allows reversions, because I can see and revert my own up or downvotes.
Further improvement: Remove only downvotes against users he specifically targeted. Whatever remains is probably still a valid signal. That might be a more complex script, though.
Find all sets of Eugine’s votes on comments by particular users, filter out any sets such that there are fewer than ten votes within that set or the ratio of upvotes to downvotes is greater than, say, 0.2, and reverse any downvotes in the remaining sets? That sounds like it should be compactly doable with SQL, although I don’t know a thing about the LW database.
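The filter described above can be sketched in plain Python without knowing anything about the LW database. The input format (a list of (comment_id, author, value) tuples for one voter) and the function name are assumptions made for this example; the thresholds of ten votes and 0.2 come from the comment:

```python
from collections import defaultdict

def targeted_downvotes(votes, min_votes=10, max_up_down_ratio=0.2):
    """Return ids of comments whose downvotes look like part of a targeted run.

    `votes` holds (comment_id, author, value) for every vote cast by ONE voter.
    """
    by_author = defaultdict(list)
    for comment_id, author, value in votes:
        by_author[author].append((comment_id, value))

    to_reverse = []
    for author, cast in by_author.items():
        ups = sum(1 for _, v in cast if v > 0)
        downs = sum(1 for _, v in cast if v < 0)
        if len(cast) < min_votes:
            continue  # too few votes on this author to call it a pattern
        if downs == 0 or ups / downs > max_up_down_ratio:
            continue  # mixed voting, probably still a valid signal
        to_reverse.extend(cid for cid, v in cast if v < 0)
    return to_reverse

# Hypothetical voting history for a single voter
votes = (
    [(i, "a", -1) for i in range(12)] + [(100, "a", 1)]          # 12 down, 1 up
    + [(200 + i, "b", -1) for i in range(3)]                     # only 3 votes
    + [(300 + i, "c", -1 if i % 2 else 1) for i in range(12)]    # 6 down, 6 up
)
print(targeted_downvotes(votes))
```

Only author "a" passes both filters here, so only the twelve downvotes on that author's comments are returned for reversal.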
Total karma won’t restrict people like Eugine at all. The vast bulk of his karma seemed to come from the monthly rationality quote threads, where ten minutes of web surfing and copy/paste can get you 10-100 positive karma. You can even loot old monthly threads if you want; people will still think it’s worth upvoting, if they even remember it’s been posted before.
IMHO the monthly quote threads (and possibly other similar types of thread) should not contribute to karma total.
Not really a problem. To gain a lot of downvote power, short of creating a bunch of circle-upvote socks, you’d need to comment or write a lot, and longtime commenters like Eugine are generally easy to spot: everyone has idiosyncratic ideas, ways of phrasing things, writing styles, references and calculations… (without even getting into stylometrics). For example, if I were banned today and surfaced under another sock a month from now, I’d be spotted quickly—just look for the new account that uses lots of hyphens, semicolons, lists, quotations and paraphrases, etc in discussing topics like statistical & experimental methodology. Similarly, Eugine has a lot of idiosyncratic interests (global warming, the fall of the west, conservative family values and so on).
This is the same reason the worst special-interest trolls on Wikipedia didn’t benefit much from socking: they had too clear a fingerprint in their arguments and writings.
True, but in my experience, Eugine’s primary karma engine was karma-mining the Rationalist Quotes page; someone could simply commit to ONLY posting there, and build a pretty substantial resource pool rather quickly.
Eugine’s primary karma engine was karma-mining the Rationalist Quotes page
Nah. The quotes make up <1/5th of his top-ranked comments, and you can see for yourself: load http://www.ibiblio.org/weidai/lesswrong_user.php?u=Eugine_Nier , wait for it to fetch all his comments, “sort by: points”, “hide parents”, copy-paste down to, say, his comments with +9 karma, and then look at the composition:
Of his comments ranked >= 9 points, 20⁄108 or <1/5 were on rationality quote pages. I suppose he could be getting much more karma from masses of lower-ranked comments on quotes pages, but that seems a bit unlikely and more work than I want to do at the moment.
Even a casual inspection of his comments page will reveal that he posted a lot in threads other than quote threads, that his comments were of reasonably good quality, and that they were frequently upvoted (and occasionally downvoted). I don’t think there could be any system that would have stopped him from mass downvoting people by manipulating what counts for karma, as he was basically a contributing member of comment society.
He did have a lot of good comments, but he also had a lot of very negative comments. Hypothetically a system could look for people with a very wide range of scores and flag them for deeper inspection.
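As a rough illustration of that heuristic, here is a sketch that flags accounts whose comment scores span a wide range. The data layout, the function names, and the threshold of 20 points are arbitrary assumptions for the example, and a wide spread is only a prompt for a human to look closer, not evidence of vote abuse:

```python
def score_spread(scores):
    """Distance between a user's best-scored and worst-scored comment."""
    return max(scores) - min(scores)

def flag_wide_range(scores_by_user, threshold=20):
    """Return the users whose comment scores span at least `threshold` points."""
    return sorted(
        user
        for user, scores in scores_by_user.items()
        if scores and score_spread(scores) >= threshold
    )

# Hypothetical comment scores per user
scores_by_user = {
    "alice": [1, 2, 3],
    "bob": [-12, 0, 15],
    "carol": [],
}
flagged = flag_wide_range(scores_by_user)
print(flagged)
```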
The serious answer is that the people who were downvoted noticed that they were downvoted. That was the whole point. At that moment, they should contact a moderator and report a suspicion. And we should make this visible somehow...
Anyway, the main damage was from knowing that someone mass-downvotes you anonymously, and you don’t know who, and you can’t defend. (And that it keeps happening to multiple people, for months.) This shouldn’t happen again, because it would be easier to fix the next time.
Using reddit’s database schema? Challenge accepted. I’m at work right now (writing SQL queries for my college, in fact), but I’ll gladly contribute something useful when I get home.
EDIT: This is a lot more difficult than anticipated. :( I’m going to have to do some serious research before I can produce something useful, given reddit’s flat kvp schema.
Interesting. I don’t think percentage is a useful metric (it can easily be a metric of controversy rather than thought quality). The really concerning thing about mass downvoting is what it can do to the perception of comments when they are first being read; retroactive adjustments will not matter as much.
Yeah, I noticed a lot of that—if Eugine was the first person to get to one of my comments, it had a lot higher chance of being downvoted further, even if it was similar to other comments that got upvoted when he didn’t get to them until later.
GIVE THAT USER UPVOTES FOR BRAVERY. Thank you.
I was scrolling through, saw this comment and reread ialdabaoth’s comment and upvoted, which I wouldn’t have without yours. upvoted.
Well, that explains a couple of things.
When did you last try? You should be able to more-or-less go
git checkout → vagrant up
and have everything pretty much ready to go. https://github.com/tricycle/lesswrong/wiki/Development-VM-Image
Being fairly ignorant of databases… how would you have laid it out better, in a general sense?
EDIT: Wow, formatting is a pain.
It’s heartwarming to see off-the-cuff SQL that includes foreign key constraints.
Heartwarming enough to offer me a job? ;)
EDIT: Downvoted? Ouch...
Ah, I see. That’s a reasonable request, I’ll ask if there’s anything that can be done about it.
Places to start looking:
I don’t know how extensively this site’s source code has been modified from the reddit default, but in r2/models/vote.py we have:
Python isn’t currently in my active language cache, so I’m a little rusty dragging through all the dependencies; I’ll try to spend this weekend getting up to speed with Python and see if I can help sort out a generic “wipe out a user’s full voting history” script that can be safely run.
That’s why I hedged a bit. I know SQL is capable of doing such a thing, but I don’t know anything about the LW database either.
gwern: Testing our hypotheses since 2009.
Thanks for the info; I was not expecting the data to show that. It does indicate that the problem will be smaller than I feared.
Now rewrite what you said as an SQL query...
:D
Herd mentality is scary.