Agreed that improved incentives for truth-seeking would improve things across the board, while local procedural patches would tend to be circumvented.
alternative metrics (altmetrics) such as:
- How many people downloaded it
- How much it has been discussed on Twitter
- How many websites link to it
- The caliber of the scientists who have recommended it
- How many people have saved it in a reference manager like Mendeley or Zotero
The first three metrics seem like they could encourage sexy bogus findings even more strongly, by giving the general public more of a role: the science press seems to respond strongly to press releases and unsubstantiated findings, and so do website hits (I say this based on the “most emailed” and “most read” categories in the NYTimes science section).
Reference manager data could have the same effect, despite reference managers being disproportionately used by researchers rather than laypeople.
Using myself as an example, I sometimes save interesting articles about psychology, medicine, epidemiology and such that I stumble on, even though I’m not officially in any of those fields. If a lot of researchers are like me in this respect (admittedly a big if) then sexy, bogus papers in popular generalist journals stand a good chance of bubbling to the top of Mendeley/Zotero/etc. rankings.
Come to think of it, a handful of the papers I’ve put in my Mendeley database are there because I think they’re crap, and I want to keep a record of them! This raises the comical possibility of papers scoring highly on altmetrics because scientists are doubtful of them!
(jmmcd points out that PageRank-style weighting of users might help, although even that would rely on highly weighted researchers being less prone to the behaviours I’m talking about.)
This is an issue with efforts to encourage replication and critique of dubious studies: in addition to wasting a lot of resources replicating false positives, you have to cite the paper you’re critiquing, which boosts its standing in mechanical academic merit assessments like those used in much UK science funding.
We would need a scientific equivalent of the “nofollow” attribute in HTML. A special kind of citation meaning: “this is wrong”.
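To make that concrete, here is a minimal sketch (purely hypothetical; no real citation database works this way, and the names are invented) of what a machine-readable “this is wrong” citation could look like, with critical citations excluded from the credit count the way nofollow links are excluded from link-based rankings:

```python
# Purely hypothetical sketch: citations carry an explicit "disputes" flag,
# analogous to rel="nofollow" on web links, and flagged citations are
# excluded from the credit-counting metric.
from dataclasses import dataclass

@dataclass
class Citation:
    source: str             # the citing paper
    target: str             # the cited paper
    disputes: bool = False  # True means "cited in order to say it is wrong"

citations = [
    Citation("critique-2024", "flashy-paper-2023", disputes=True),
    Citation("followup-2024", "flashy-paper-2023"),
]

def credit_count(paper, citation_list):
    """Count only non-disputing citations, so critiques add no credit."""
    return sum(1 for c in citation_list if c.target == paper and not c.disputes)

print(credit_count("flashy-paper-2023", citations))  # 1, not 2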
Fifteen years ago, the academic search engine CiteSeer was designed not just to find academic papers, identify duplicates, and count citations, but also, as its name suggests, to show the user the context of each citation, so you could see whether it was positive or negative.
I’ve occasionally wished for this myself. I look forward to semantic analysis being good enough to apply to academic papers, so computers can estimate the proportion of derogatory references to a paper instead of mechanically counting all references as positive.
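As a crude stand-in for that kind of semantic analysis, here is a deliberately naive keyword heuristic over made-up citation sentences. Real citation-sentiment classifiers are far more sophisticated; the cue list and examples are invented, but the output (an estimated share of derogatory mentions) is the quantity being asked for:

```python
# Deliberately crude stand-in for semantic analysis of citation contexts:
# flag sentences containing critical phrases and report the negative fraction.
NEGATIVE_CUES = ("fails to replicate", "could not reproduce", "contrary to",
                 "methodological flaws", "does not hold")

def negative_fraction(citation_contexts):
    """Estimated share of citation sentences that criticise the cited paper."""
    if not citation_contexts:
        return 0.0
    hits = sum(any(cue in ctx.lower() for cue in NEGATIVE_CUES)
               for ctx in citation_contexts)
    return hits / len(citation_contexts)

contexts = [
    "Our results are contrary to Smith et al. (2010).",
    "We build on the framework of Smith et al. (2010).",
    "Smith et al. (2010) fails to replicate in our larger sample.",
]
print(negative_fraction(contexts))  # 2 of 3 mentions look critical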
I don’t necessarily endorse the specific metrics cited. I have further thoughts on how to get around issues of the type that you mention, which I’ll discuss in a future post.
Yes, but the next line mentioned PageRank, which is designed to deal with exactly those kinds of issues. Lots of inward links don’t mean much unless the people (or papers, or whatever, depending on the semantics of the graph) linking to you are themselves highly ranked.
Yep, a data-driven process could be great, but if what actually gets through the inertia is the simple version, this is an avenue for backfire.
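For readers unfamiliar with the mechanics, here is a toy PageRank sketch over an invented citation graph (edges run from the citing paper to the cited one). It illustrates the point above: a single citation from a highly ranked paper can outweigh several citations from papers nobody cites.

```python
# Toy PageRank over an invented citation graph. "niche" receives two
# citations from papers that nobody cites; "landmark" receives one citation
# from a well-ranked paper and ends up ranked higher.
def pagerank(edges, nodes, damping=0.85, iterations=100):
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out_degree = {n: sum(1 for s, _ in edges if s == n) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for source, target in edges:
            new_rank[target] += damping * rank[source] / out_degree[source]
        # Spread the rank of papers that cite nothing evenly over the graph.
        dangling = sum(rank[n] for n in nodes if out_degree[n] == 0)
        for n in nodes:
            new_rank[n] += damping * dangling / len(nodes)
        rank = new_rank
    return rank

nodes = ["landmark", "review", "obscure1", "obscure2", "niche"]
edges = [("review", "landmark"), ("landmark", "review"),
         ("obscure1", "niche"), ("obscure2", "niche")]
for paper, score in sorted(pagerank(edges, nodes).items(), key=lambda kv: -kv[1]):
    print(f"{paper:9s} {score:.3f}")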
A fundamental problem seems to be that the pre-study prior for any given tested hypothesis is now lower, driven by the increased number of researchers, the use of automation, and the incentive to go hypothesis-fishing.
Wouldn’t a more direct solution be to simply increase the significance threshold required in the field (i.e., require a smaller p-value)?
That doesn’t lower the pre-study prior for hypotheses; it (in combination with reporting bias) reduces the likelihood ratio that a reported study gives you for the reported hypothesis.
Increasing the significance threshold would mean that adequately powered honest studies become much more expensive, while those willing to use questionable research practices (QRPs) could simply up the ante and apply them more aggressively. That could actually make the published research literature worse.
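Some back-of-the-envelope arithmetic for this trade-off, using the normal-approximation sample-size formula for a two-sided, two-sample comparison; the effect size (0.3), power (80%), and thresholds are illustrative assumptions, not anything from the thread:

```python
# Back-of-the-envelope numbers only; effect size, power, and thresholds are
# illustrative assumptions, using the normal-approximation sample-size formula.
from math import log
from statistics import NormalDist

def n_per_group(effect_size, alpha, power):
    """Approximate participants per group for the chosen alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

def null_analyses_for_even_odds(alpha):
    """Independent null analyses needed for ~50% odds of one spurious 'hit'."""
    return log(0.5) / log(1 - alpha)

for alpha in (0.05, 0.005, 0.001):
    print(f"alpha={alpha}: honest n per group ~{n_per_group(0.3, alpha, 0.8):.0f}, "
          f"null analyses for even odds ~{null_analyses_for_even_odds(alpha):.0f}")
```

Under these assumptions the honest per-group sample size roughly doubles going from 0.05 to 0.001, while the number of independent null analyses needed for even odds of a spurious hit rises from roughly 14 to roughly 700, which is still within reach of the kind of exhaustive parameter scans described a couple of comments below.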
Respectfully disagree. The ability to cheaply test hypotheses allows researchers to be less discriminating. They can check a correlation on a whim. Or just check every possible combination of parameters simply because they can. And they do.
That is very different from selecting a hypothesis out of the space of all possible hypotheses because it’s an intuitive extension of some mental model. And I think it absolutely reduces the pre-study priors for hypotheses, which impacts the output signal even if no QRPs are used.
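A minimal sketch of the standard positive-predictive-value arithmetic behind this disagreement, with illustrative numbers, showing how a lower pre-study prior degrades what a significant result tells you even when no QRPs are used:

```python
# Illustrative numbers only: positive predictive value for an honestly run
# study (no QRPs) at fixed alpha = 0.05 and power = 0.8.
def positive_predictive_value(prior, alpha=0.05, power=0.8):
    """P(hypothesis is true | the study reports a significant result)."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Carefully chosen hypotheses vs. whim-driven checks vs. exhaustive scans.
for prior in (0.5, 0.1, 0.01):
    print(f"pre-study prior {prior:<5}: P(true | significant) = "
          f"{positive_predictive_value(prior):.2f}")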
I’d take the favor of a handful of highly qualified specialists in the field (conventional peer review) over a million ‘likes’ on Facebook any day. And this is coming from someone who agrees the traditional system is deeply flawed.
Something like PLoS’s model is more reasonable: publish based on the quality of the research, not the impact of the findings. Don’t impose artificial ‘page limits’ on the number of papers that can be published per issue. Encourage open access for everyone. Make it mandatory to release all experimental data and software needed to reproduce the paper’s results. And at the same time, encourage a fair and balanced peer-review process.
I’ve never published in PLoS, by the way, but I will probably be sending my next papers to them.
I would hope that if there were a public web application, it would track 20-100 different statistics and allow people to choose which ones to privilege. I’m not sure whether it’s the reader’s or the website’s responsibility to choose worthwhile statistics to focus on, especially if the statistics became standardized and other agencies could focus on whichever ones they wanted.
For example, I imagine that foundations could have different combination metrics, like “we found academics with over 15 papers and at least 100 citations collectively, who average at least 60% on a publicity scale and have a cost-to-influence index of over 45”. These criteria could be highly focused on the needs and specialties of the foundation.
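A hypothetical sketch of such a combination metric, encoding the made-up criteria above literally; none of these metrics exist, and the records and field names are invented purely for illustration:

```python
# Entirely invented records and field names, used only to show how the
# quoted criteria might be encoded as a composite filter.
researchers = [
    {"name": "A", "papers": 22, "total_citations": 140,
     "publicity_score": 0.65, "cost_to_influence": 52},
    {"name": "B", "papers": 40, "total_citations": 90,
     "publicity_score": 0.80, "cost_to_influence": 70},
]

def meets_foundation_criteria(r):
    """The quoted example: >15 papers, >=100 citations, >=60% publicity, >45 index."""
    return (r["papers"] > 15
            and r["total_citations"] >= 100
            and r["publicity_score"] >= 0.60
            and r["cost_to_influence"] > 45)

print([r["name"] for r in researchers if meets_foundation_criteria(r)])  # ['A']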