Meta-analysis on cognitive effects of modafinil (my bachelor thesis)
Well, meta-analyses certainly are an area of interest to me, and I was disappointed in 2012 by “Cognition Enhancement by Modafinil: A Meta-Analysis” (Kelley et al 2012) which used only 3 studies, and so was not very informative. A new meta-analysis would be great. But… I read quickly through it, and I saw no meta-analysis. Just a literature review. What’s with the post title?
Modafinil significantly improved performance in 26 out of 102 cognitive tests, but significantly decreased performance in 3 cognitive tests.
Nitpick: I really hate this use of ‘significantly’ and I ban it from my own writing. Is this referring to effect sizes or p-values?
Notably, modafinil appears to have detrimental effects on mental flexibility. Although 4 studies employed the Intra/Extradimensional Set Shift task (ID/ED), no performance improvements could be detected. Performance was even reduced in a study by Randall et al. (2004). Furthermore, Müller et al. (2012) found that subjects on modafinil had lower flexibility scores in the Abbreviated Torrance task for adults.
Eh. Absence of improvement != damage. Randall 2004 didn’t find a statistically-significant decrease (and it’s not clear whether it should, given that it reports 25 datasets for 3 groups, so hunting for decreases incurs worries about multiplicity). And I have to point out, as far as Müller et al 2012 goes, the decrease didn’t reach p<0.05 (just 0.053), and if you’re willing to accept just trending, then you should also be accepting the increase in the GEFT/Group Embedded Figures Task (p=0.08).
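To put a rough number on the multiplicity worry, here is a minimal sketch. It assumes the 25 datasets amount to 25 independent tests each run at alpha = 0.05, which is a simplification (the real outcomes are surely correlated and the exact number of comparisons may differ); the function names are made up for illustration.

```python
# Rough family-wise error rate for k tests, each run at level alpha.
# Assumption: the k tests are independent and every null hypothesis is true.

def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among k independent null tests."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(k: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    k = 25  # assumed number of tests, taken from the "25 datasets" figure
    print(f"chance of >=1 false positive: {familywise_error_rate(k):.2f}")
    print(f"Bonferroni per-test alpha:    {bonferroni_alpha(k):.4f}")
```

Under those assumptions, a lone decrease among 25 outcomes is about what chance alone would produce, unless it survives some correction.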
How important are these observations...? Well, as you found out, it can be hard to compare or meta-analyze psychology studies since studies may cover the same topic but use different sets of tests, frustrating the most obvious approach ‘just univariate meta-analyze everything!’
Reprinted from Baranski et al. (2004) without permission.
But… I read quickly through it, and I saw no meta-analysis. Just a literature review. What’s with the post title?
You’re right. I don’t remember why I wrote “meta-analysis”. (Probably because it sounds fancy and smart). I updated the title.
Is this referring to effect sizes or p-values?
p-values.
Eh. Absence of improvement != damage.
True.
...Randall 2004 didn’t find a statistically-significant decrease…
No. In Randall et al. (2004) participants in the 200 mg modafinil condition made significantly more errors (p<0.05) in the Intra/Extradimensional Set Shift task than participants in the placebo and the 100 mg modafinil conditions. (The 200 mg group made on average around 27 errors. The 100 mg group around 14. The control group around 17 errors.)
Actually, you linked to a different study. The results can be found in the complete study I linked to. I can upload it if you want to see it yourself.
Reprinted from Baranski et al. (2004) without permission.
Every single graphic in this whole thing is reprinted without permission, to tell the truth. (Is this a problem?)
I’m not an academic, but my understanding was that “significantly” was a synonym for “p<0.05” every time in academic writing. “Significantly” referring to effect size is solely the province of non-academic writing (well, that or things like history).
I’m not an academic, but my understanding was that “significantly” was a synonym for “p<0.05” every time in academic writing.
If only it were that simple. But one of my scripts flags use of significance language, and I have seen many times ‘significant’ and variants used in scientific writing as meaning important or large.
I’m curious if you have ideas on how to deal with that.
The standard solution seems to be ‘multivariate meta-analysis’. I’ve done a little reading on the topic, but I’ve had trouble getting started with it—you need to know the correlations between the multiple outcome variables, this is typically unavailable (the data-sharing problem), and I think it only works anyway if there is at least a little bit of correlation between the multiple outcomes, while I would like to be able to collectively analyze outcomes from disjoint studies which is… less clear how to do.
This meta-analysis on meditation has an interesting approach: they basically just analyze the effect sizes in the same “class” (averaging effect sizes within a study if there are multiple different outcomes measured in the same class).
So, their methodology is, as far as I can tell, described by these parts:
The aim of our meta-analysis was to assess the effect a mindfulness meditation intervention on health status measures. We considered the concept of health to include both physical and mental health. All outcome measures were either subsumed under “physical health”, “mental health” or were excluded from the analysis. We only included data from standardized and validated scales with established internal consistency (e.g., the Global Severity Inventory of Symptom Check List-R, Hospital Anxiety and Depression Scale, Beck Depression Inventory, Profile of Mood States, McGill-Melzack Pain-Rating Scale, Short Form 36 Health Survey, and Medical Symptom Checklist; a full list is available upon request). Also a conservative procedure was chosen to exclude relatively ambiguous or unconventional measures, e.g., spiritual experience, empathy, neuropsychological performance, quality of social support, and egocentrism.
“Mental health” constructs comprised scales such as psychological wellbeing and symptomatology, depression, anxiety, sleep, psychological components of quality of life, or affective perception of pain. “Physical health” constructs were medical symptoms, physical pain, physical impairment, and physical component of quality of life questionnaires.
...We first integrated all effect sizes within a single study by the calculation of means into two effect sizes, one for mental and one for physical health. If the sample size varied between scales of one study, we weighted them for N. Effect sizes obtained in this manner were aggregated across studies by the computation of a weighted mean, where the inverse of the estimated standard deviation for each investigation served as a weight [8].
So, they just split the effect sizes, and do an average of the 2 sets. Nothing more.
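As I read the quoted passage, the procedure would look something like the sketch below. This is my reconstruction, not their code: the function names and all the numbers are made up, and I follow the quoted wording literally in weighting by the inverse of the standard deviation (inverse-variance weighting is the more usual choice).

```python
from typing import List, Tuple


def within_study_mean(effects: List[Tuple[float, int]]) -> float:
    """N-weighted mean of (effect_size, n) pairs for one study's scales in one class."""
    total_n = sum(n for _, n in effects)
    return sum(d * n for d, n in effects) / total_n


def across_study_mean(studies: List[Tuple[float, float]]) -> float:
    """Weighted mean of (effect_size, sd) pairs across studies, weight = 1/sd."""
    weights = [1.0 / sd for _, sd in studies]
    return sum(w * d for w, (d, _) in zip(weights, studies)) / sum(weights)


# Hypothetical "mental health" effect sizes: (effect size, sample size) per scale.
study_a = within_study_mean([(0.4, 20), (0.6, 20)])
study_b = within_study_mean([(0.2, 30), (0.5, 10)])

# Hypothetical per-study standard deviations of the combined effect size.
pooled = across_study_mean([(study_a, 0.25), (study_b, 0.5)])
print(study_a, study_b, pooled)
```

Note that nothing in this pipeline models the correlation between scales within a study, which is exactly what a multivariate meta-analysis would need.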
I dunno. They don’t give any references to papers or textbooks on meta-analysis to justify this procedure. It doesn’t sound very kosher to me.
From a statistical point of view, I wouldn’t expect this to work very well. I would expect a lot of heterogeneity and a very weak signal. However, they report very strong results with low heterogeneity (which I find pretty surprising). I don’t see any obvious way in which this would be “cheating”.
I don’t see any obvious way in which this would be “cheating”.
Oh, that’s easy: publication bias. If the original studies report only the measures which reached a cutoff, and the null is always true, then since their measures will generally all be on the same subjects/with the same n, their effect sizes will have to be fairly similar* and I’d expect the I^2 to be low even as the results are meaningless.
* since p is just a function of sample size & effect size, the p threshold is fixed by convention at 0.05, and sample size n is pretty much the same across all measures (why would you recruit a subject and then not collect as much data as possible, or omit lots of subjects?), only measurements with effect sizes big enough to cross the p threshold at that fixed n will be reported.
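A quick simulation of that footnote, as a sketch under loud assumptions: two groups of 20 subjects each, a true effect of exactly zero, a normal approximation in which the standard error of d is sqrt(2/n), and "reported" meaning significant in the favorable direction only. None of these numbers come from the meditation paper.

```python
import random
import statistics

n_per_group = 20                      # assumed, identical for every measure
alpha = 0.05
se = (2.0 / n_per_group) ** 0.5       # approximate SE of d when the true d is 0
z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
d_crit = z_crit * se                  # smallest d that reaches p < 0.05 at this n

rng = random.Random(0)                # fixed seed so the run is repeatable
all_d = [rng.gauss(0.0, se) for _ in range(10_000)]   # every measure, null true
reported = [d for d in all_d if d > d_crit]           # only the "hits" get published

print("critical d:", round(d_crit, 2))
print("measures reported:", len(reported), "of", len(all_d))
print("mean of reported d:", round(statistics.mean(reported), 2))
print("spread, all vs reported:",
      round(statistics.pstdev(all_d), 2),
      round(statistics.pstdev(reported), 2))
```

The reported effect sizes all sit just above the critical value, so they look both large and homogeneous, even though every true effect in the simulation is zero.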
While if each particular measure was done separately as a bunch of univariate or multivariate meta-analyses, they’d have to get access to the original data or they’d be able to see the publication bias on a measure by measure basis.
Or it might be that each measure has a weighted effect size of zero, it’s just that each study is biased towards a different measure, and so its ‘overall’ estimate is positive even though if we had combined each measure with all its siblings, every single one would net to zero.
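A toy illustration of that second scenario, with entirely made-up numbers: three hypothetical studies each measure the same three outcomes, every outcome averages to zero across studies, but each study reports only the outcome that came out best.

```python
import statistics

# Hypothetical full results: effect size per measure, per study.
full_results = {
    "study1": {"A": 0.4, "B": -0.2, "C": -0.2},
    "study2": {"A": -0.2, "B": 0.4, "C": -0.2},
    "study3": {"A": -0.2, "B": -0.2, "C": 0.4},
}

# Per-measure pooling over ALL studies (what we would want to do).
per_measure = {
    m: statistics.mean(res[m] for res in full_results.values())
    for m in ("A", "B", "C")
}

# What gets published: each study reports only its best-looking measure.
published = [max(res.values()) for res in full_results.values()]

# "Lump everything in the same class together" estimate from published data.
lumped = statistics.mean(published)

print(per_measure)   # every measure nets to roughly zero
print(lumped)        # yet the lumped estimate is clearly positive
```

With the unreported outcomes in hand, every measure nets to zero; with only the published ones, the lumped estimate is 0.4.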
Maybe I’m wrong about these speculations. But I hope you see why I feel uncomfortable with this ‘lump everything remotely similar together’ approach and would like to see what meta-analytic experts say about the approach.
Hah.
Sigh. People suck sometimes.
Maybe grouping the tests into different kinds of tests and fitting a hierarchical model inside those groups? Are there similar kinds of tests?
Right, that makes sense. People rarely report the covariance matrix of the data.
Much less provide IPD/individual-patient-data which is what one really wants. The lack of data is frustrating.
Indeed.
That sounds like a completely disgusting approach… I’m going to have to read that and see if it’s a legitimate strategy.
They seem to get pretty strong effect sizes and low heterogeneity, so I’m curious to hear your thoughts on it.
Are you worried about something else specific?
That’s a great point, I hadn’t been thinking about that. It amplifies the publication bias by a lot.