Neglected cause: automated fraud detection in academia through image analysis

The amyloid scandal has highlighted both how little effort goes into detecting academic fraud and how little effort fraudsters put into avoiding detection. My recent search for work on automated fraud detection turned up a few papers that attempt it through writing analysis (trying to catch automated translation or AI-written papers) or data analysis (trying to identify synthesized datasets).

However, I’ve been unable to find large-scale automated attempts at fraud detection through the examination of plots in papers, despite this being how multiple high-profile cases of fraud, like Lesné (amyloid) and Schön (organic semiconductors), were discovered. As a researcher myself, I have a few suspicions about why.

Firstly, the use of photo editing isn’t just common, it’s essentially mandatory. Different journals have different requirements for labeling and image format, and it’s much easier to use Photoshop to change your labels (“Plot A, Plot B2”) and plot arrangement than to fiddle with an opaque R package like oncoplot. So “evidence of Photoshop” isn’t much of a signal.

Secondly, many of the cases flagged this way may be honest mistakes rather than actual fraud. I’ve been tempted to use Photoshop to fix a typo in a plot’s axis labels instead of digging through code I haven’t touched in months. Sometimes plots get mixed up during submission, or a subtle bug means a test case accidentally makes it into the final paper. Writing a paper about automated fraud detection also makes your own past work a target of scrutiny, which many researchers may prefer to avoid. There may be far more noise than signal, but focusing the analysis on prestigious venues and highly influential papers would reduce this effect.

Thirdly, there aren’t many labeled cases of fraud available for machine learning. This can be partially worked around by automatically scanning for unambiguous flags: fully or partially duplicated plots, and artifacts like misaligned axes or composited plots (excluding purely aesthetic edits). In other words, the really obvious stuff that actually got people caught in the past.
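To make the duplicate-plot check concrete, here is a minimal sketch using perceptual hashing: nearly identical panels end up with nearly identical hashes, so suspiciously similar pairs can be flagged automatically. This is only an illustration, not a finished pipeline; the directory name and distance threshold are made up, and it assumes the figures have already been extracted to image files (one way to do that is sketched further below).

```python
# Sketch: flag near-duplicate figure panels with perceptual hashing.
# Assumes figures were already extracted to PNG files; the directory
# name and distance threshold below are illustrative, not tuned values.
from itertools import combinations
from pathlib import Path

from PIL import Image
import imagehash  # pip install imagehash

THRESHOLD = 5  # max Hamming distance to count as "suspiciously similar"

def hash_figures(figure_dir: str) -> dict:
    """Compute a perceptual hash for every PNG in a directory."""
    hashes = {}
    for path in Path(figure_dir).glob("*.png"):
        with Image.open(path) as img:
            hashes[path.name] = imagehash.phash(img.convert("L"))
    return hashes

def find_duplicates(hashes: dict, threshold: int = THRESHOLD) -> list:
    """Return pairs of figures whose hashes are within `threshold` bits."""
    suspects = []
    for (a, ha), (b, hb) in combinations(hashes.items(), 2):
        distance = ha - hb  # imagehash defines subtraction as Hamming distance
        if distance <= threshold:
            suspects.append((a, b, distance))
    return suspects

if __name__ == "__main__":
    pairs = find_duplicates(hash_figures("extracted_figures/"))
    for a, b, d in pairs:
        print(f"possible duplicate: {a} vs {b} (distance {d})")
```

A plain perceptual hash mostly catches whole-panel reuse; partial duplication (a cropped, flipped, or rescaled panel reused elsewhere) would need something like tile-wise hashing or keypoint matching layered on top.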

By applying this analysis to every paper ever published (or maybe just the high-impact ones), we may be able to identify large conceptual errors in science and raise the bar for rigor in future scientific papers. I would like to work on an open-source website for this as a personal project. However, I also feel like someone has done something similar already, and I just don’t know about it.
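For a sense of what “applying this analysis to every paper” might involve mechanically, here is a rough sketch that pulls the embedded raster images out of a directory of PDFs so checks like the hashing pass above can run over them. PyMuPDF is just one library that can do this; the directory layout is hypothetical, and vector figures (which many plotting packages emit) aren’t captured this way at all.

```python
# Sketch: extract embedded raster images from a directory of PDFs,
# one output folder per paper, so downstream checks (duplicate hashing,
# axis-alignment heuristics, ...) can run over them. Directory names
# are illustrative. Vector figures are not covered by this approach.
from pathlib import Path

import fitz  # PyMuPDF: pip install pymupdf

def extract_figures(pdf_path: Path, out_dir: Path) -> int:
    """Write every embedded image in the PDF to out_dir; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with fitz.open(str(pdf_path)) as doc:
        for page_index, page in enumerate(doc):
            for img in page.get_images(full=True):
                xref = img[0]
                info = doc.extract_image(xref)
                name = f"p{page_index:03d}_x{xref}.{info['ext']}"
                (out_dir / name).write_bytes(info["image"])
                count += 1
    return count

if __name__ == "__main__":
    corpus = Path("papers/")                 # one PDF per paper
    figures_root = Path("extracted_figures/")
    for pdf in corpus.glob("*.pdf"):
        n = extract_figures(pdf, figures_root / pdf.stem)
        print(f"{pdf.name}: {n} images extracted")
```

Rendering each page to a bitmap and segmenting out the figure regions would be a heavier but more robust alternative when figures are stored as vector graphics.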

Author’s Note: There’s also a case to be made for building an AI tool that evaluates the statistical rigor of papers, using signals like whether a paper corrected for multiplicity and how many of its p-values sit suspiciously close to 0.05 to estimate its likelihood of replicating. This would essentially codify and automate the bullshit-detection heuristics I sometimes see discussed in academic circles. I expect something like this to already exist and am confused why it hasn’t been integrated into tools like Semantic Scholar.
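As a toy illustration of that kind of heuristic, the sketch below scans a paper’s extracted text for reported p-values and counts how many sit just under 0.05. The regex and the 0.04–0.05 window are arbitrary choices for illustration; a real tool would build a proper p-curve and handle rounding conventions, one-sided tests, and multiplicity corrections.

```python
# Sketch: a crude "suspiciously close to 0.05" heuristic over a paper's
# extracted text. The regex and the 0.04-0.05 window are illustrative;
# a serious tool would build a proper p-curve and account for rounding,
# one-sided tests, and multiplicity corrections.
import re

P_VALUE_RE = re.compile(r"p\s*[=<]\s*(0?\.\d+)", re.IGNORECASE)

def p_value_red_flags(text: str) -> dict:
    """Count reported p-values and how many sit just under 0.05."""
    values = [float(m) for m in P_VALUE_RE.findall(text)]
    near_threshold = [p for p in values if 0.04 <= p < 0.05]
    return {
        "n_p_values": len(values),
        "n_just_under_0.05": len(near_threshold),
        "fraction_just_under": len(near_threshold) / len(values) if values else 0.0,
    }

if __name__ == "__main__":
    sample = "We found an effect (p = 0.048); the control comparison gave p = 0.31."
    print(p_value_red_flags(sample))
```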

Update: I found these articles about people trying to identify image duplication by hand as recently as 2020. Clearly there is a demand for this sort of thing.