Could auto-generated troll scores reduce harassment on Twitter and Facebook?

There’s been a lot of discussion in the last few years about the problem of hateful behaviour on social media such as Twitter and Facebook. How can this problem be solved? Twitter and Facebook could of course adopt stricter policies towards trolls and haters. They could remove more posts and tweets, and ban more users. So far, however, they have been relatively reluctant to do that. A more principled problem with this approach is that it could be seen as a restriction on freedom of speech (especially if Twitter and Facebook were ordered to do this by law).

There’s another possible solution, however. Using sentiment analysis, you could give Twitter and Facebook users a “troll score”. Users whose language is hateful, offensive, racist, etc., would get a high troll score.* This score would in effect work as a (negative) reputation/karma score. That would in itself probably incentivize trolls to improve. However, if users were also allowed to block (and make invisible the writings of) any user whose troll score is above a certain cut-off point (of their choice), that would presumably incentivize trolls to improve even more.
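To make the mechanism concrete, here is a minimal sketch in Python of how blocking by troll score might work. Everything in it (the Post structure, the 0–1 score range, the per-reader cutoff) is a hypothetical illustration, not an existing Facebook or Twitter feature.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    author_troll_score: float  # auto-generated; 0 = civil, 1 = maximally troll-like

def visible_posts(feed: list[Post], cutoff: float) -> list[Post]:
    """Hide every post whose author's troll score exceeds the reader's chosen cutoff."""
    return [post for post in feed if post.author_troll_score <= cutoff]

feed = [
    Post("alice", "Interesting article, thanks!", 0.05),
    Post("bob", "You people are all idiots.", 0.85),
]
print(visible_posts(feed, cutoff=0.5))  # only alice's post survives
```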

Could this be done? Well, it’s already been shown to be possible to infer your Big Five personality traits, with great accuracy, from what you’ve written and liked, respectively, on Facebook. The tests are constructed on the basis of correlations between data from standard personality questionnaires (more than 80,000 Facebook users filled in such tests on behalf of YouAreWhatYouLike, which constructed one of the Facebook tests) and Facebook writings or likes. Once it’s been established that, e.g., extraverted people tend to like certain kinds of posts, or use certain kinds of words, this knowledge can be used to predict the level of extraversion of Facebook users who haven’t taken the questionnaire.
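As an illustration of this questionnaire-then-predict approach, here is a hedged sketch using scikit-learn. The liked-page tokens and extraversion scores below are made up for illustration; a real system of the YouAreWhatYouLike kind would be fitted on tens of thousands of questionnaire respondents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

# Training users: pages they liked (as text tokens) plus their
# questionnaire-derived extraversion scores. All data here is invented.
liked_pages = [
    "parties dancing festivals meetups",
    "chess puzzles libraries solitude",
    "concerts travel nightlife friends",
]
extraversion = [0.9, 0.2, 0.8]  # hypothetical questionnaire scores in [0, 1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(liked_pages)
model = Ridge().fit(X, extraversion)

# Predict extraversion for a user who never filled in the questionnaire.
new_user = vectorizer.transform(["chess libraries quiet evenings"])
print(model.predict(new_user))
```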

This suggests that there is no principled reason why a reliable troll score couldn’t be constructed with today’s technology. However, a problem is that while there are agreed criteria for what is to count as an extraverted person, there are no agreed criteria for what counts as a troll. Also, it seems you couldn’t use questionnaires, since people who actually do behave like trolls online would be disinclined to admit that they do in a questionnaire.

One way to proceed could instead be this. First, you could define in rather general and vague terms what is to count as trolling: say, “racism”, “vicious attacks”, “threats of violence”, etc. You could then use two different methods to go from this vague definition to a precise score. The first is to let a number of sensible people give their own troll scores to various Facebook posts and tweets (using the general and vague definition of what is to count as trolling). You would feed these ratings into your algorithms, which would learn which combinations of words are characteristic of trolls (as judged by these people), and which aren’t. The second is to simply list a number of words or phrases that would count as characteristic of trolls, in the sense of the general and vague definition. This latter method is probably less costly (particularly if you can generate the troll lexicon automatically, say from existing dictionaries of offensive words) but also probably less accurate.
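Here are rough sketches of both methods, using scikit-learn for the first. The example posts, labels, and lexicon words are all invented for illustration; a real system would need far more labelled data and a far larger lexicon.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# --- Method 1: learn word combinations from human-rated examples ---
posts = [
    "I disagree, but that's a fair point.",
    "Go away, you worthless idiot.",
    "Thanks for sharing, very informative.",
    "People like you deserve to be hurt.",
]
labels = [0, 1, 0, 1]  # 1 = judged to be trolling by the human raters

vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
X = vectorizer.fit_transform(posts)
classifier = LogisticRegression().fit(X, labels)

def learned_troll_score(text: str) -> float:
    """Probability that the text is trolling, according to the learned model."""
    return classifier.predict_proba(vectorizer.transform([text]))[0, 1]

# --- Method 2: lexicon lookup, cheaper but cruder ---
TROLL_LEXICON = {"idiot", "worthless", "scum"}  # stand-in for a real lexicon

def lexicon_troll_score(text: str) -> float:
    """Fraction of words in the text that appear in the troll lexicon."""
    words = text.lower().split()
    return sum(w.strip(".,!?") in TROLL_LEXICON for w in words) / max(len(words), 1)

print(learned_troll_score("you are an idiot"))
print(lexicon_troll_score("you are an idiot"))
```

A user’s overall troll score could then be, for instance, a running average of these per-post scores.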

In any case, I expect this problem to be solvable. The next problem is: who would do this? Facebook and Twitter should be able to construct the troll score, and to add the option of blocking all trolls, but do they want to? The risk is that they will think that the possible downside of this is greater than the possible upside. If people dislike this rather radical plan, they might leave en masse, whereas if they like it, well, then trolls could potentially disappear, but it’s unlikely that this would affect their bottom line drastically. Thus it’s not clear that they will be more receptive to this idea than they are to conventional banning/moderating methods.

Another option is for an outside company to create a troll score using Facebook or Twitter data. I don’t know whether that’s possible at present: whether you’d need Facebook’s and Twitter’s consent, and whether they’d then be willing to give it. It seems you would definitely need it in order for the troll score to show up on your standard Facebook/Twitter account, and in order to enable users to block all trolls.

This second problem is thus much harder. A troll score could probably be constructed by Facebook and Twitter, but they may well not want to do it. Any suggestions on how to get around this problem would be appreciated.

My solution is very similar to the LessWrong solution to the troll problem. Just as you can make low-karma users invisible on LessWrong, you would be able to block (and make invisible the writings of) Facebook and Twitter users with a high troll score. A difference, though, is that whereas karma is manually generated (by voting), the troll score would be automatically generated from your writings (for more on this distinction, see here).

One advantage of this method over conventional moderation methods is that it doesn’t restrict freedom of speech in the same way. If trolls were blocked by most users, you’d achieve much the same effect as you would from bans (the trolls wouldn’t be able to speak to anyone), but in a very different way: it would result from many individual users’ blocking decisions (and individual users presumably have every right to block anyone), rather than from the actions of a central admin.

Let me finish with one last caveat. You could of course extend this scheme and construct all sorts of scores, such as a “liberal–conservative score” that would let you block anyone whose political opinions are insufficiently close to your own. That would be a very bad idea, in my view. Scores of this sort should only be used to combat harassment, threats and other forms of anti-social behaviour, and not to exclude dissenters from discussion.

* I here use “troll” in the wider sense which “equate[s] trolling with online harassment” rather than in the narrower (and original) sense according to which a troll is “a person who sows discord on the Internet by starting arguments or upsetting people, by posting inflammatory, extraneous, or off-topic messages in an online community (such as a newsgroup, forum, chat room, or blog) with the deliberate intent of provoking readers into an emotional response or otherwise disrupting normal on-topic discussion” (Wikipedia).