Counterfactual outcome state transition parameters

Today, my paper “The choice of effect measure for binary outcomes: Introducing counterfactual outcome state transition parameters” has been published in the journal Epidemiologic Methods. The version of record is behind a paywall until December 2019, but the final author manuscript is available as a preprint at arXiv.

This paper is the first publication about an ambitious idea which, if accepted by the statistical community, could have significant impact on how randomized trials are reported. Two other manuscripts from the same project are available as working papers on arXiv. This blog post is intended as a high-level overview of the idea, to explain why I think this work is important.

Q: What problem are you trying to solve?

Randomized controlled trials are often conducted in populations that differ substantially from the clinical populations in which the results will be used to guide clinical decision making. My goal is to clarify the conditions that must be met in order for the randomized trial to be informative about what will happen if the drug is given to a target population which differs from the population that was studied.

As a first step, one could attempt to construct a subgroup of the participants in the randomized trial, such that the subgroup is sufficiently similar to the patients you are interested in, in terms of some observed baseline covariates. However, this leaves open the question of how one can determine what baseline covariates need to be accounted for.

In order to determine this, it would be necessary to specify a priori biological facts which would lead to the effect in one population being equal to the effect in another population. For example, if we somehow knew that the effect of a drug is entirely determined by some gene whose prevalence differs between two countries, it is possible that when we compare people in Country A who have the gene with people in Country B who also have the gene, and people in Country A who don’t have the gene with people in Country B who don’t have the gene, the effect is equal between the relevant groups. Extending this approach, we can look for a set of baseline covariates such that the effect can be expected to be approximately equal between two populations once we make the comparisons within levels of the covariates.

Unfortunately, things are more complicated than this. Specifically, we need to be more precise about what we mean by the word “effect”. When investigators measure effects, they have several options available to them: they can use multiplicative parameters (such as the risk ratio and the odds ratio), additive parameters (such as the risk difference), or several other alternatives that have fallen out of fashion (such as the arcsine difference). If the baseline risks differ between two populations (for example, between men and women), then at most one of these parameters can be equal between the two groups. Therefore, a biological model that ensures equality of the risk ratio cannot also ensure equality of the risk difference. The logic that determines whether a set of covariates is sufficient for effect equality therefore necessarily depends on how we choose to measure the effect.
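The point that at most one effect measure can be equal across groups is easy to verify numerically. The sketch below (the numbers are purely illustrative) computes the common effect measures for two hypothetical populations that share the same risk ratio but have different baseline risks:

```python
import math

def effect_measures(risk_control, risk_treated):
    """Compute common effect measures comparing two risks."""
    odds = lambda p: p / (1 - p)
    return {
        "RR": risk_treated / risk_control,                 # risk ratio
        "OR": odds(risk_treated) / odds(risk_control),     # odds ratio
        "RD": risk_treated - risk_control,                 # risk difference
        "arcsine": math.asin(math.sqrt(risk_treated))
                   - math.asin(math.sqrt(risk_control)),   # arcsine difference
    }

# Two hypothetical populations with different baseline risks
# but the same risk ratio of 0.5:
men = effect_measures(0.40, 0.20)
women = effect_measures(0.10, 0.05)

print(men["RR"], women["RR"])  # equal: 0.5 and 0.5
print(men["RD"], women["RD"])  # unequal: -0.20 vs -0.05
print(men["OR"], women["OR"])  # unequal as well
```

Once the baseline risks differ, equality on one scale forces inequality on the others, so any claim of “effect equality” must name its scale.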

Making things even worse, the commonly used risk ratio is not invariant to the coding of the outcome variable: generalizations based on the ratio of the probability of death will give different predictions from generalizations based on the ratio of the probability of survival. In other words, when using a risk ratio model, your conclusions are not invariant to an arbitrary decision that was made when the person who constructed the dataset decided whether to encode the outcome variable as (death=1, survival=0) or as (survival=1, death=0).
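A small numerical sketch (with made-up risks) shows how the two codings diverge when a trial result is extrapolated to a population with a different baseline risk:

```python
# Hypothetical trial: baseline death risk 0.2, death risk 0.1 under treatment.
rr_death = 0.1 / 0.2        # ratio of death probabilities:    0.5
rr_survival = 0.9 / 0.8     # ratio of survival probabilities: 1.125

# Extrapolate to a target population with baseline death risk 0.5,
# assuming each ratio in turn is stable across populations:
pred_death_coding = 0.5 * rr_death                  # predicted death risk 0.25
pred_survival_coding = 1 - (1 - 0.5) * rr_survival  # predicted death risk 0.4375

print(pred_death_coding, pred_survival_coding)
```

The same trial, coded two ways, predicts death risks of 0.25 and 0.4375 in the target population; the seemingly arbitrary coding choice is doing real work.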

The information that doctors (and the public) extract from randomized trials is often in the form of a summary measure based on a multiplicative parameter. For example, a study will often report that a particular drug “doubled” the risk of a particular side effect, and this then becomes the measure of effect that clinicians use to inform their decision making. Moreover, the standard methodology for meta-analysis is essentially a weighted average of the multiplicative parameter from each study. Any conclusion that is drawn from these studies would have been different if investigators had chosen a different effect parameter, or a different coding scheme for the outcome variable. These analytic choices are rarely justified by any kind of argument, and instead rely on a convention to always use the risk ratio based on the probability of death. No convincing rationale for this convention exists.

My goal is to provide a general framework that allows an investigator to reason from biological facts about what set of covariates is sufficient to condition on, in order for the effect in one population to be equal to the effect in another, in terms of a specified measure of effect. While the necessary biological conditions can at best be considered approximations of the underlying data generating mechanism, clarifying the precise nature of these conditions will be useful for reasoning about how much uncertainty there is about whether the results will generalize to other populations.

Q: What are the existing solutions to this problem, and why do you think you can improve on them?

Recently, much attention has been given to a solution by Judea Pearl and Elias Bareinboim, based on an extension of causal directed acyclic graphs. Pearl and Bareinboim’s approach is mathematically valid and elegant. However, the conditions that must be met in order for these graphs to be a reasonable approximation of the data generating mechanism are much more restrictive than most trialists are comfortable with.

Here, I am going to skip a lot of details about these selection diagrams, and instead focus on the specific aspect that I find problematic. These selection diagrams abandon measures of effect completely, and instead consider the counterfactual distribution of the outcome under the intervention separately from the counterfactual distribution of the outcome under the control condition. This resolves a lot of the problems associated with effect measures, but it also fails to make use of information that is contained in how these two counterfactuals relate to each other.

Consider for example an experiment to determine the effect of homeopathy on heart disease. Suppose this experiment is conducted in men, and determines that there is no effect. If we use selection diagrams to reason about whether these conclusions also hold in women, we will have to construct a causal graph that contains every cause of heart disease whose distribution differs between men and women, measure these variables and control for them. Most likely, this will not be possible, and we will conclude that we are unable to make any prediction for what will happen if women take homeopathic treatments. The approach simply does not allow us to extrapolate the effect size (even when it is null), since it cannot make use of information about how what happens under treatment relates to what happens under the control condition. The selection diagram approach therefore leaves key information on the table: in my view, the information that is left out is exactly the information that could most reliably be used to make generalizations about causal effects.

A closely related point is that the Bareinboim-Pearl approach leads to the conclusion that meta-analysis can be conducted separately in the active arm and the control arm. Most meta-analysts would consider this idea crazy, since it arguably abandons randomization (which is an objective fact about how the data were generated) in favor of unverifiable and questionable assumptions encoded in the graph, essentially claiming that all causes of the outcome have been measured.

Q: What are counterfactual outcome state transition parameters?

Our goal is to construct a measure of effect that captures the relationship between what happens if treated and what happens if untreated. We want to do this in a way that avoids the mathematical problems with standard measures of effect, and such that the magnitude of the parameters has a biological interpretation. If we succeed in doing this, we will be able to determine what covariates to control for by asking what biological properties are associated with the magnitude of the parameters.

Counterfactual outcome state transition parameters are effect measures that quantify the probability of “switching” outcome state if we move between counterfactual worlds. We define one parameter which measures the probability that the drug kills the patient, conditional on being someone who would have survived without the drug, and another parameter which measures the probability that the drug saves the patient, conditional on being someone who would have died without the drug.
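As a rough sketch of how the two switching probabilities relate to the observable risks (parameter names are my own illustration, not the paper’s notation), the treated risk is determined by the untreated risk together with the two COST parameters:

```python
def risk_if_treated(p0, p_kill, p_save):
    """Death risk under treatment implied by the two switching probabilities.

    p0:     risk of death if untreated
    p_kill: P(drug kills | would have survived without the drug)
    p_save: P(drug saves | would have died without the drug)

    (Names are illustrative, not the paper's notation.)
    """
    # Deaths under treatment come from two groups: those who would have
    # died anyway and are not saved, plus those who would have survived
    # but are killed by the drug.
    return p0 * (1 - p_save) + (1 - p0) * p_kill

# A drug that saves 30% of those who would otherwise die, and harms no one:
print(risk_if_treated(0.20, p_kill=0.0, p_save=0.3))  # ~0.14
```

Reasoning about the biological determinants of `p_kill` and `p_save` separately is what lets the framework connect background knowledge to the choice of effect measure.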

Importantly, these parameters are not identified from the data, except under strong monotonicity conditions. For example, if we believe that the drug helps some people, harms other people, and has no effect on a third group, there is no monotonicity and the method cannot be used. However, it is sometimes the case that the drug only operates in one direction. For example, for most drugs, it is very unlikely that the drug prevents someone from getting an allergic reaction to it. Therefore, its effect on allergic reactions is monotonic.

If the effect of treatment is monotonic, one of the COST parameters is equal to 0 or 1, and the other parameter is identified as a risk ratio. If the treatment reduces incidence, the COST parameter associated with a protective effect is identified as the standard risk ratio based on the probability of death. If on the other hand the treatment increases incidence, the COST parameter associated with a harmful effect is identified as the recoded risk ratio based on the probability of survival. Therefore, if we determine which risk ratio to use on the basis of the COST model, the risk ratio is constrained between 0 and 1.
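The coding rule can be written down mechanically. The sketch below (my own illustration, not code from the paper) selects the risk ratio implied by the COST logic, assuming the effect is monotonic in the direction suggested by the data:

```python
def cost_risk_ratio(p0, p1):
    """Risk ratio selected by the COST logic (death coded as 1).

    p0: death risk in the control arm
    p1: death risk in the treated arm

    For a protective treatment, use the ratio of death probabilities;
    for a harmful one, the ratio of survival probabilities. Either way
    the result lies between 0 and 1.
    """
    if p1 <= p0:                  # treatment reduces incidence
        return p1 / p0            # standard (death-based) risk ratio
    return (1 - p1) / (1 - p0)    # recoded (survival-based) risk ratio

print(cost_risk_ratio(0.20, 0.10))  # protective: 0.5
print(cost_risk_ratio(0.20, 0.36))  # harmful: ~0.8
```

Bounding the chosen ratio between 0 and 1 is what distinguishes this rule from the convention of always using the death-based risk ratio, which is unbounded above for harmful effects.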

Q: Is this idea new?

The underlying intuition behind this idea is not new. For example, Mindel C. Sheps published a remarkable paper in the New England Journal of Medicine in 1958, in which she works from the same intuition and reaches essentially the same conclusions. Sheps’ classic paper has more than 100 citations in the statistical literature, but her recommendations have not been adopted to any detectable extent in the applied statistical literature. Jon Deeks provided empirical evidence for the idea of using the standard risk ratio for protective treatments, and the recoded risk ratio for harmful effects, in Statistics in Medicine in 2002.

What is new in this paper is that we formalize the intuition Sheps was working from in terms of a formal counterfactual causal model, which serves as a bridge between background biological knowledge and the choice of effect measure. Formalizing the problem in this way allows us to clarify the scope and limits of the approach, and points toward how these ideas can be used to inform future developments in meta-analysis.