Estimating the consequences of device detection tech

Today I have been asked to join a European project in collaboration with the police, developing two sorts of tech:

Device identification; that is, inferring the device with which an image or video was taken from the filters applied to it, pixel defects and other information hidden in the raw pixel data (useful when, for example, a criminal modifies the metadata of a picture to try to blame their crimes on somebody else)
Detection of fake media; that is, determining if a certain video has been tampered with

To be honest, I do not fully grasp the consequences this tech could have on society, so I have resolved to write this post to lay out my informal reasoning about it.

My process of reasoning will be as follows. I will first estimate the potential impact of legitimate use cases of such tech (such as stopping human trafficking), and then estimate the potential impact of illegitimate use cases (such as stopping whistleblowers or detaining dissenters under totalitarian regimes). Then I will compare one result against the other, weighted by the applicability of this tech to each case.

Let us start.

Device identification

It is not difficult to imagine how this kind of tech can be misused, but we also need to take into account legitimate uses.

In Europe, we still have a fair dose of human trafficking, in which such a tool could potentially be used to great effect (identifying who took pictures of abused people, perhaps).

According to the link above, there were about 11k identified victims of human trafficking in the EU in 2012. If the tech in question were developed and proved successful, we could expect it to be adopted in other parts of the first world. Let’s suppose that the volume of trafficking that gets dealt with in the first world (roughly EU + North America + Eastern Asia + Australia) is about 3 to 10 times what it is in the EU, and that the number of victims identified roughly corresponds to the number of victims saved from trafficking. That means that about 30k to 100k people are saved from trafficking every year.
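As a rough sketch of the arithmetic in this step, we can carry a (low, high) pair through the multiplication. The helper name `scale_range` is mine, purely for illustration; the inputs are the figures quoted above, and the text rounds the result down to 30k to 100k.

```python
# Fermi-style range arithmetic: carry a (low, high) pair through each step.

def scale_range(value, multiplier_range):
    """Multiply a point estimate by a (low, high) range of multipliers."""
    low, high = multiplier_range
    return (value * low, value * high)

EU_VICTIMS_2012 = 11_000          # identified victims in the EU (from the text)
FIRST_WORLD_MULTIPLIER = (3, 10)  # guessed ratio of first-world to EU volume

first_world_victims = scale_range(EU_VICTIMS_2012, FIRST_WORLD_MULTIPLIER)
print(first_world_victims)  # (33000, 110000)
```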

Are there other areas of similar importance where this tech could be used?

Recorded police data in Europe points towards about 3k homicides per year in the EU, 400k assaults (excluding sexual violence and homicides), 100k cases of sexual violence, and 5M thefts (I summed the numbers by hand, so take them with a grain of salt).

Not all of these carry the same weight, as a homicide should weigh more than pickpocketing. Also, we are not really interested in spontaneous crime, but rather in the kind of premeditated crime that police investigation (and thus the detection tool) can be very effective at shutting down.

For the sake of simplicity, let’s assume that the number of preventable felonies is about 1/10 of the total, and that a theft carries 1/10 of the weight of any other felony. Then all the major categories (sexual violence, thefts, assaults) are comparable to human trafficking in magnitude and importance.
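To make the weighting concrete, here is a small sketch using the crime counts quoted above. The dictionary names and the use of integer divisors (to avoid floating-point noise) are my own choices, not part of the original estimate.

```python
# Recorded EU crimes per year (figures from the text above).
eu_crimes = {
    "homicide": 3_000,
    "assault": 400_000,
    "sexual_violence": 100_000,
    "theft": 5_000_000,
}

PREVENTABLE_DIVISOR = 10        # assume 1 in 10 felonies is preventable
WEIGHT_DIVISOR = {"theft": 10}  # a theft counts for 1/10 of any other felony

# Weighted number of preventable felonies per category.
weighted_preventable = {
    crime: count // PREVENTABLE_DIVISOR // WEIGHT_DIVISOR.get(crime, 1)
    for crime, count in eu_crimes.items()
}

EU_TRAFFICKING_VICTIMS = 11_000  # EU figure for 2012, for comparison
for crime, score in weighted_preventable.items():
    ratio = score / EU_TRAFFICKING_VICTIMS
    print(f"{crime}: {score} ({ratio:.1f}x EU trafficking)")
```

Under these assumptions the weighted total comes to roughly 100k, about nine times the EU trafficking figure.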

Let’s conclude that the potentially preventable felonies in the first world amount to between 3 and 10 times the volume of human trafficking, so between 90k and 1M people are affected by them.

Finding data about the order of magnitude of misuses is harder, but we can use as a proxy the number of journalists killed or persecuted in a year, since this tech could be used very effectively to track down whistleblowers and investigative journalists. According to this link, it seems like the number of journalists affected is on the order of 300 per year.

Of course, we do not care only about the journalists themselves, but about the work they were doing and never completed. It is much harder to estimate this, but looking over pages dedicated to freedom of the press, it seems like in most cases journalists are involved in investigations of criminal operations by minor crime syndicates. It seems reasonable to estimate that these syndicates have between 10 and 100 members, and that their operations harm maybe one order of magnitude more people than that, so between 100 and 1000 people affected.

Putting the numbers together, it seems that the affected journalists would have improved the lives of the equivalent of between 30k and 300k people per year.

This seems implausibly high, but we have to take into account that journalists do not always succeed in taking down the groups they investigate. Even then, they have to be a credible threat (or they would not be persecuted). Maybe we can assign between a 1/100 and a 1/10 chance of them succeeding in the counterfactual world where they are not imprisoned.

That means that the expected number of people who suffer from their imprisonment is between 300 and 30k.
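The chain of estimates for the median case can be sketched as follows. The constant names are mine; the numbers are the ones given above, with the success chance written as "1 in N" so that everything stays in integers.

```python
JOURNALISTS_PER_YEAR = 300           # killed or persecuted per year (from the text)
PEOPLE_PER_SYNDICATE = (100, 1_000)  # people harmed by each syndicate investigated
SUCCESS_ONE_IN = (100, 10)           # chance of success: 1 in 100 (low), 1 in 10 (high)

# People who would have benefited if every investigation had succeeded.
gross = tuple(JOURNALISTS_PER_YEAR * p for p in PEOPLE_PER_SYNDICATE)

# Expected beneficiaries once the chance of success is factored in.
expected = tuple(g // n for g, n in zip(gross, SUCCESS_ONE_IN))

print(gross)     # (30000, 300000)
print(expected)  # (300, 30000)
```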

But wait a second! Our analysis relies very heavily on the median case of investigative journalism, and it seems plausible that we should be looking instead at the most valuable interventions by journalists, which could account for most of the value of investigative journalism.

In this webpage we see an infographic showcasing the most impressive results of investigative journalism. This includes a mixture of horrifying but likely low impact interventions and big scandals which threatened entire governments.

Let’s try to estimate the number of people affected per investigation:

Secret diaries: 10M people (population of state of Parana)
Alcatel: 4M people (population of Costa Rica)
Frere’s babies: Negligible
Spirit Child: Negligible
BAE files: 500M USD ≈ 250k people
Yakunovich Leaks: 38B USD ≈ 19M people
Offshore Leaks: Unclear
Investigating Estrada: 100M people (population of Philippines)
Taxation: 190M people (population of Pakistan)

The conversion from money to lives uses GiveWell’s estimate of the number of lives that could be saved through charitable donations.

It seems really hard to estimate the impact of Offshore Leaks. Thankfully, I don’t think we need to; the investigation was done by a big media coalition, and I expect it would have happened anyway even with device identification tech.

The impact of these cases is dominated by investigations of governments, especially the last two on the Philippines and Pakistan. This is a cherry-picked collection of the most impactful cases since 2000. We could expect maybe another 10 cases like these over the same years that were not covered in these case studies, but not on the order of 100. The total impact is thus close to 300M to 3B people in 20 years, or 15M to 150M people per year.
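Here is a sketch of how the list above adds up. The rate of 2,000 USD per person is my inference from the two conversions in the list (500M USD ≈ 250k people), not a figure quoted from GiveWell, and treating the upper end as ten times the rounded listed total is my reading of the "another 10 cases" allowance.

```python
USD_PER_PERSON = 2_000  # inferred from "500M USD ≈ 250k people"; an assumption

# People affected per investigation (negligible and unclear cases left out).
people_affected = {
    "Secret diaries": 10_000_000,
    "Alcatel": 4_000_000,
    "BAE files": 500_000_000 // USD_PER_PERSON,
    "Yakunovich Leaks": 38_000_000_000 // USD_PER_PERSON,
    "Investigating Estrada": 100_000_000,
    "Taxation": 190_000_000,
}

listed_total = sum(people_affected.values())
print(listed_total)  # 323250000, rounded to 300M in the text

YEARS = 20
low_total = 300_000_000      # the listed cases alone, rounded
high_total = 10 * low_total  # allowing for cases the infographic did not cover
per_year = (low_total // YEARS, high_total // YEARS)
print(per_year)  # (15000000, 150000000)
```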

Another possible misuse is censorship, through locating and detaining people over what they thought were anonymous social media posts. For example, there is this example of detentions over social media criticism of military action in Turkey, which likely violates human rights.

How many people are imprisoned worldwide each year over censorship issues?

In the world, over 1000 people were affected by artistic censorship issues in 2016. Sadly that does not inform us about general violations of civic rights.

Looking at specific countries, during the protests in Venezuela last year, over 5k people were detained. In 2017 in Turkey about 3k people were detained over social media.

What about countries that are not in conflict? The UK does not seem to do much better, as London detained about 600 people in 2010 over social media posts under the Communications Act. Admittedly, this is more controversial than the cases above.

In any case, it seems reasonable to assume that between 100 and 10k people per country per year are affected by censorship issues. There are about 200 countries in the world, so this amounts to 20k to 2M people directly affected by censorship issues globally per year.

Even if our estimation was crude, the number feels correct, so we will roll with it.

What dominates the negative impact, censorship or impediments to investigative journalism?

It seems unfair to compare imprisonment and potential police brutality to the more subtle changes brought by the Estrada and Pakistan Taxation cases of investigative journalism.

We will apply a penalty factor of 10 to the investigative journalism case; it seems reasonable to assume that, from an individual's perspective, avoiding imprisonment is worth at least 10 times as much as a change of government. Still, the impact of investigative journalism seems to dominate civic censorship, with 1.5M to 15M equivalent people.
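To check the comparison, here is a sketch that puts the two ranges side by side. Comparing geometric midpoints is my own choice of summary for ranges that span orders of magnitude; it is not part of the original estimate.

```python
import math

COUNTRIES = 200
DETAINED_PER_COUNTRY = (100, 10_000)  # per country per year (from the text)
censorship = tuple(COUNTRIES * n for n in DETAINED_PER_COUNTRY)

JOURNALISM_PER_YEAR = (15_000_000, 150_000_000)  # before the penalty
PENALTY = 10                                     # imprisonment vs. government change
journalism = tuple(n // PENALTY for n in JOURNALISM_PER_YEAR)

def geometric_midpoint(interval):
    """Middle of a range on a log scale (assumed summary statistic)."""
    low, high = interval
    return math.sqrt(low * high)

print(censorship)  # (20000, 2000000)
print(journalism)  # (1500000, 15000000)
print(geometric_midpoint(journalism) > geometric_midpoint(censorship))  # True
```

Note that the two ranges overlap a little (the top of the censorship range, 2M, is above the bottom of the journalism range, 1.5M), so "dominates" holds for the middle of the ranges rather than for every corner.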

Now on to comparing the potential good use with the potential misuse.

We need to assess the relative efficacy of a device detection tool for law enforcement as opposed to interfering with high-profile investigative journalism / civic censorship.

It seems to me that it would be really useful for civic censorship, and less but still quite useful for law enforcement. It is unclear to me how useful it would be for obstructing investigative journalism.

We can try to make a guess. I can’t imagine that this tech would be more useful for stopping investigative journalism than for messing with civic dissenters, since the use cases are similar. On the other hand, its effect can be negligible on investigative journalism if most of the investigation remains private during investigation and key photos are only made public after the investigation is complete, without withholding attribution. From my extremely uninformed perspective, this looks like the case.

Taking into account this plausible asymmetry of usefulness, the impact of the tool on civic censorship starts dominating the impact of the tool in investigative journalism.

According to our numbers, in the best case, for every 100 people helped through law enforcement applications of the tool, 2 would suffer from the improved censorship. In the worst case, for every 9 people helped through law enforcement we would have 200 people imprisoned over censorship issues.
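These two ratios come from pairing opposite ends of the ranges derived earlier (90k to 1M helped, 20k to 2M censored). A small sketch, with constant names of my own choosing:

```python
from fractions import Fraction

HELPED = (90_000, 1_000_000)    # people helped through law enforcement per year
CENSORED = (20_000, 2_000_000)  # people affected by censorship per year

# Best case: most people helped, fewest censored.
best = Fraction(CENSORED[0], HELPED[1])
# Worst case: fewest people helped, most censored.
worst = Fraction(CENSORED[1], HELPED[0])

print(f"best case: {best * 100} harmed per 100 helped")  # 2
print(f"worst case: {worst * 9} harmed per 9 helped")    # 200
```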

If we think that one year of imprisonment over censorship is as bad as one year of being trafficked, then the net benefit of the detection tool seems dubious.

This is a fairly informal analysis, but enough to give me pause. Let’s take a moment to appreciate some of the weaknesses of this analysis:

We have not taken into account second-order effects of censorship, but they could potentially have society-shaping effects. We also have not taken into account opportunity costs, though at this stage of *my* life they are fairly small.

Our estimates are overall quite weak, and though I have tried to keep my estimate intervals wide to reflect this, I have most likely made mistakes. We also have made no effort to assess the relative likelihood of (mis)uses.

In the face of all of this, my tentative decision is not to help develop this tech, at least until I am better informed.

It is also interesting to see how the results of this analysis carry over to other areas. If we squint hard enough, we realize that essentially what we are comparing is the good use that could be done by law enforcement in the first world vs the possible misuses of better law enforcement tools over the world.

The results look quite poor, and I think it is easy to explain why: in the first world crime is fortunately quite low. If we look at causes of death in the first world, violence does not even register (and it is VERY overrepresented in media) when compared to disease and traffic accidents.

Given the potential for misuse of better law enforcement tools, I am extremely hesitant to recommend that anybody pursue that goal, even without taking into account opportunity costs.

Detection of fake media

While it was somewhat easy to imagine the potential applications of device identification, how fake media detection would affect the world is harder to imagine.

I am going to humbly accept that I do not understand the importance that fake media has and will have in the world, and leave this as a proposed exercise:

Exercise: What are the potential benefits and dangers of fake media detection? What is their relative importance?

My current, really uninformed view is that fake video detection will have a minor, positive effect in helping people identify trustworthy sources. Maybe it will settle some debates and prevent people from claiming that a video is fake when they do not want to take responsibility for something. But overall I do not think that fake videos will be widely abused.

I might be terribly wrong here, but we already live in a world with lots of misinformation due to the Internet and cherry-picked news sources, and I do not see how realistic fake videos would make things an order of magnitude worse.

It could be argued that videos are intuitively harder to fake than articles, so they carry more force with the general public. But I cannot recall being persuaded by video proof of anything important.

Maybe in investigative journalism this tech will have positive consequences, since it will prevent people from dismissing video proof as maybe fake, as I mentioned above.

More information

If you enjoyed following this kind of analysis, and want to learn how to use estimations and think with probabilities, I can recommend Think Again.

If there is anything you think I have unfairly simplified, or if you have more relevant data, please leave a comment (it will be relevant for my decision!).

Acknowledgements: Thank you to Tam Borine for her help editing the initial draft, and to Pablo Villalobos for comments and support.

And thanks to you for reading!