Linkpost: Francesca v Harvard

Just when I thought the Gino case had come to a close, I started reading Francesca Gino’s defense. It paints her as a beleaguered academic unfairly penalized by misleading and incompetent investigators, as well as by a biased New Yorker reporter.

I’m reposting it here because I think it’s useful to see both sides of every case, and because it makes an interesting parallel to recent, more local events.

_____

Breaking My Silence

I have been sitting in near silence for over three months, since learning that HBS was putting me on unpaid administrative leave and banning me from campus, teaching, and research. It has been shattering to watch my career being decimated and my reputation completely destroyed. It has been hard to see how this situation has affected those around me: my family, my mentors, my collaborators, and my students.

The information that has been available to the public, and the analysis posted by critics, may sound compelling. But the information is incomplete and misleading. The record needs to be corrected. This website is my attempt to do so.

Let that correction begin with this simple and unambiguous statement: I absolutely did not commit academic fraud.

____

In an earlier post, I refuted Data Colada’s critique of the PNAS paper by demonstrating that:

Data Colada cherry-picked the data it chose to include in its analysis,

Data Colada excluded a third condition in the study,

Data Colada excluded two of the three dependent variables in its analysis, and

Data Colada completely misrepresented the Excel calcChain function to buttress its claims.

Together, these choices allowed Data Colada to create an enormously misleading impression about my work. Importantly, I also demonstrated that if one were to exclude all of the data observations that Data Colada claimed should have been considered “suspicious,” the findings of the study would still hold, suggesting there was no motivation for me to manipulate data.

In a more recent post, Data Colada doubled down on its claims, arguing that HBS’s investigation found additional evidence of data manipulation in this particular study.

What is this additional evidence?

In a nutshell, HBS claimed the following:

HBS claimed that it was able to obtain the original (unpublished) dataset for the study in question.

HBS claimed that, with the help of a forensics firm, it was able to compare this original dataset to the dataset published on OSF.

HBS claimed that this comparison yielded significant discrepancies.

HBS claimed that these discrepancies were evidence of fraud on my part.

Having enlisted the help of my own forensics team, I have now conducted my own investigation into this matter. In the analysis below, I offer my findings and refute each of the claims put forth by HBS.

[...]

Studies like this one often involve multiple dataset files.


It took a while for my team and me to unpack the evidence in this case, but our unequivocal conclusion is that HBS used the wrong dataset in its investigation. They used the July 16 HBS version, when they should have used the July 16 OG version.

Before I explain how we came to this conclusion, let me note that I do not believe Maidstone is to blame here. On the contrary, and this is a significant detail, Maidstone repeatedly expressed uncertainty about whether the July 16 HBS version was the dataset it should be relying on for its analysis. Its report included repeated caveats to this effect, noting that there was reason to believe additional datasets might exist that Maidstone was not privy to, and that it was therefore possible it was looking at the wrong file. Maidstone also explicitly noted that it was relying on “the Client’s description of provenance” (the client being HBS), in a clear attempt to distance itself from the possibility that the July 16 HBS version was not the correct dataset to use.

As I said, I believe the July 16 OG file is the version that HBS should have used in its investigation. This file has the following characteristics:

It contains all of the data from the July 13th version, except for some data that appears to have been discarded or corrected.

It contains some additional data that is not included in the July 13th version.

It is completely consistent with the data posted on OSF, with a few exceptions (analysis entries containing some summary statistics).

The following sections explore each of these characteristics.

____