A small investigational drug trial won’t be powered to detect outliers, and you won’t be able to reliably solve that by invoking Bayesian statistics.
In large drug trials I think this is to some degree already done, but it’s limited by the extreme sketchiness of suddenly inventing new endpoints for your study after you have the data. It would probably take the form of increasing the threshold for an endpoint (for example, “No significant difference between drug and placebo was found with the planned endpoint of decreasing HAM-D ratings by 3 or more, but there were significantly more patients in the drug group who had their HAM-D ratings decrease by 10 or more”). Everyone is rightly suspicious of people who do this, because, again, changing endpoints. But if it happened enough, someone would take notice. Trust me, “not coming up with clever ways to make their drug look effective for at least some people” is not one of pharmaceutical companies’ failure modes.
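To see why the suspicion is warranted, here is a minimal Monte Carlo sketch. All numbers in it (trial size, the HAM-D improvement distribution, the set of cutoffs) are made up for illustration: even for a drug with no effect at all, testing several improvement cutoffs and reporting whichever one “works” pushes the false-positive rate well past the nominal 5%.

```python
import random

random.seed(0)

def simulate_null_trial(n=100, cutoffs=(3, 5, 10)):
    """One trial of an ineffective drug: drug and placebo arms draw
    HAM-D improvements from the same distribution. Returns True if
    ANY cutoff shows a nominally 'significant' difference (crude
    two-proportion z-test, |z| > 1.96)."""
    drug = [random.gauss(5, 4) for _ in range(n)]
    placebo = [random.gauss(5, 4) for _ in range(n)]
    for c in cutoffs:
        p1 = sum(x >= c for x in drug) / n
        p2 = sum(x >= c for x in placebo) / n
        pooled = (p1 + p2) / 2
        se = (2 * pooled * (1 - pooled) / n) ** 0.5
        if se > 0 and abs(p1 - p2) / se > 1.96:
            return True
    return False

trials = 2000
false_hits = sum(simulate_null_trial() for _ in range(trials)) / trials
print(f"null trials declared 'significant' at some cutoff: {false_hits:.1%}")
```

With three cutoffs instead of one pre-registered endpoint, the family-wise false-positive rate should come out noticeably above 5%, which is exactly why post-hoc endpoint switching is treated as a red flag.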
But keep in mind that you sort of loaded the original example by choosing something that almost never happens (someone living to 110 without any signs of aging). In a psychiatry study, what’s the most extreme example you’re going to get? Someone’s depression remits completely? Big deal. Most people’s depressive episodes remit completely after a couple of months anyway, and in 25% of people they never return (in even more people, they take many years to return, and almost no studies continue for the many years it would take to notice). In a drug trial of 10,000 people (the number you gave above), hundreds or thousands of people in each group are going to have their depression remit completely; if the drug has a superpowerful effect on one person and cures her depression forever, that will get lost in noise in a way that someone living to 110 with the body of a 30-year-old might not.
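The “lost in noise” point is just binomial arithmetic. A quick sketch, assuming an illustrative 30% spontaneous-remission rate:

```python
import math

# In a 10,000-person arm with a 30% remission rate (an assumed,
# illustrative figure), the arm-to-arm noise in remission counts
# dwarfs a single extra cure.
n, p = 10_000, 0.30

expected = n * p                 # ~3,000 remissions per arm
sd = math.sqrt(n * p * (1 - p))  # binomial standard deviation, ~45.8

print(f"expected remissions per arm: {expected:.0f}")
print(f"sd of that count:            {sd:.1f}")
print(f"one miracle cure adds {1 / sd:.3f} sd of signal: invisible")
```

By contrast, the base rate of “living to 110 with the body of a 30-year-old” is essentially zero, so even a single case stands out against the background.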
(It’s instructive to compare this to the way studies investigate side effects. If one person in a 10,000-person study has their arms fall off, the investigators will notice, because that’s sufficiently rare as to raise suspicion it was caused by the drug. The drug will then end up with a black box warning saying “may make arms fall off.”)
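Both the side-effect point and the quoted point about small trials being underpowered for outliers reduce to the same arithmetic: the chance of seeing even one rare event depends heavily on trial size. A sketch with made-up rates:

```python
# Probability of observing at least one rare event in a trial of
# size n, assuming events occur independently at per-patient rate p.
# Both rates below are made-up illustrative numbers.
def p_at_least_one(n, p):
    return 1 - (1 - p) ** n

# A hypothetical 1-in-1,000 exceptional responder: a 50-person trial
# almost certainly contains none, while a 10,000-person trial almost
# certainly contains several.
small_trial = p_at_least_one(50, 1 / 1000)      # ~0.049
large_trial = p_at_least_one(10_000, 1 / 1000)  # ~0.99995

# A rare catastrophic side effect is different: its background rate
# is ~zero, so even a single case in 10,000 patients is striking.
print(f"small trial sees a rare responder: {small_trial:.3f}")
print(f"large trial sees a rare responder: {large_trial:.5f}")
```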
Another way these sorts of outlier effects might be detected is by subgroup analyses (which are also extremely sketchy). If there is no effect in general, researchers may check whether there is an effect among men, among women, among blacks, among whites, among Latinos, among postmenopausal Burmese women who wear hats and own at least two pets and have a history of disease in their left kidney, anything that turns up a positive result. But again, this is hardly something we want to encourage.
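The multiple-comparisons cost of that kind of fishing is easy to quantify. A sketch assuming k independent subgroup tests at alpha = 0.05 (real subgroups overlap, so treat this as an upper-bound illustration, not an exact figure):

```python
# Chance of at least one spurious 'significant' subgroup when slicing
# a null result k ways, assuming independent tests at alpha = 0.05.
alpha = 0.05

for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k  # family-wise error rate
    print(f"{k:2d} subgroups -> {fwer:.0%} chance of a spurious finding")
```

Under these assumptions, twenty subgroups give roughly a 64% chance of at least one spurious “effect,” which is why unplanned subgroup findings are treated as hypothesis-generating at best.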
But all these things are for investigational drugs. If we’re talking about a drug that’s already been approved and has a strong prescription history, then your worries about individualized response would get subsumed into the good responder / bad responder distinction, which is a very big area of research; we know a lot about it, and where we don’t, it’s not for lack of trying.
For example, among bipolar patients, response to lithium can be (very inconsistently) predicted by selecting for patients who have stronger family history of disease, have fewer depressive symptoms, have slower cycles, have more euthymic periods, have less of a history of drug use, start with a manic episode, demonstrate psychomotor retardation, demonstrate premorbid mood lability, lack premorbid personality disturbance, possibly have deranged serotonin metabolism in platelets, possibly have increased calcium binding to red blood cells, possibly lack the HLA-A3 antigen, possibly have a particular variant of the gene GADL1, etc etc etc.
(In practice we don’t expend much effort checking most of these things, because their predictive power is so weak that it’s almost always a worse idea than just making a best guess based on the data you have, putting someone on lithium or on an alternative, and switching if it doesn’t work.)
As far as I know, psychiatrists cannot reliably predict that a given drug will improve a patient’s long-term diagnosis, and psychiatrists/psychologists cannot even reliably agree on what condition a patient is manifesting. Mental disorders appear to resist diagnosis and solution, unlike, say, a broken leg or a sucking chest wound.
Thanks for that last link; it was an interesting update on the effectiveness of psychiatry. I was factoring my knowledge of the prevalence of rotten corpses in psychology into my estimate of the effectiveness of psychiatric methods, which now seems to be conflating two very different things. Although it does still seem that the set of psychiatrists who are capable of ignoring the prevalent rotten corpses in psychology when prescribing drugs is small enough to tip the field toward doing your own analyses. I guess I don’t have a good set of heuristics for comparing the effects of personal bias vs. the effects of a psychiatrist trained in psychology and prone to that field’s biases.
Yes, my example was loaded. The thought experiment was ‘weird, unrecognized-by-the-system outlier, of personal interest to the reader,’ and whether/in-what-circumstances it should influence the reader to try the drug. If one of those circumstances is ‘pharma doesn’t try to make their drug look effective as a nootropic,’ I feel it sums up my perspective a bit better than ‘pharma doesn’t try to make their drug look effective for at least some people, within the set of markets they’ve established as worth aiming marketing toward during a given time period.’
A small investigational drug trial won’t be powered to detect outliers, and you won’t be able to reliably solve that by invoking Bayesian statistics.
I think in the hypothetical he meant you’ve already won the lottery, so to speak.
The whole “medical doctors can always consistently treat medical diseases, but psychiatrists are throwing darts blindfolded” story is something of a myth; see, for example, Putting the efficacy of psychiatric and general medicine medication into perspective: review of meta-analyses.
I agree; too bad for the patients who actually need help that the myth is alive and well. Psychiatry allows for this blindfolded dart-throwing more, though, since there are no simple tests, and people might be judging the whole field based on a few incompetent individuals or psychotherapy forms that have stuck around for historical reasons. I don’t think you can directly compare medications to make the point like they did in that paper, since drugs make up a smaller fraction of psychiatrists’ treatment arsenal. Correct me if it’s different in the US.
(Take psychodynamic psychotherapy for example and see how popular it is for whatever reason. I doubt you’ll find such a popular rotten corpse in medicine.) I was wrong about this one apparently, thanks Yvain. If you do, I suppose it would be some surgical technique. Both psychotherapy and surgery require training, so there are greater sunk costs involved.
Psychotherapy seems to work pretty well, and it’s not obvious that psychodynamic psychotherapy works less well than other sorts. See http://slatestarcodex.com/2013/09/19/scientific-freud/ . I prefer things more in the CBT vein myself, but the pro-psychodynamics people aren’t as helpless and discredited as one might think.
Thanks. Another myth, huh. This one is widespread even amongst medical professionals. Now I wonder what other myths I’ve accepted without questioning. If your blog contains more debunking of medical debunking, some pointers would be nice.
This OB article you linked to seems like a useful generalized explanation for why these kinds of myths happen. I agree many doctors seem to make that mistake, which is concerning because this is a really stupid one.