# drnickbone comments on Can Counterfactuals Be True?

• I’m not sure the problem is with English…

The issue arises whenever we have a causal model with a large number of micro-states, and the antecedent of a counterfactual can only be realised in worlds which change lots of different micro-states. The most “natural” way of thinking about the counterfactual in that case is still to make a minimal change (to one single micro-state, e.g. a particle decaying somewhere, or an atom shifting an angstrom somewhere) and to make it sufficiently far back in time to make a difference. (In the Gore case, in the brain of whoever thought up the butterfly ballot, or perhaps in the brain of a justice of the Supreme Court.) The problem with Pearl’s calculus, though, is that it doesn’t do that.

Here’s a toy model to demonstrate (no English). Consider the following set of structural equations (among Boolean micro-state variables):

X = 0

Y_1 = X, Y_2 = X, …, Y_10^30 = X

The model is deterministic, so P[X = 0] = 1.

Next we define a “macro-state” variable Z := (Y_1 + Y_2 + … + Y_10^30) / 10^30. Plainly in the actual outcome Z = 0, and indeed P[Z = 0] = 1.
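For concreteness, here is a minimal Python sketch of this toy model (my own encoding, with N = 10 standing in for 10^30 so it actually runs):

```python
# Hypothetical encoding of the structural equations; N = 10 stands in for 10^30.
N = 10

def solve():
    """Evaluate the structural equations in causal order."""
    x = 0                       # exogenous equation: X = 0
    ys = [x for _ in range(N)]  # structural equations: Y_i = X
    z = sum(ys) / N             # macro-state variable: Z := average of the Y_i
    return x, ys, z

x, ys, z = solve()
print(x, z)  # 0 0.0 -- deterministic, so P[X = 0] = P[Z = 0] = 1
```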

But what if Z were equal to 1?

My understanding of Pearl’s semantics is that to evaluate this we have to intervene, i.e. do(Z = 1), and this is equivalent to the multi-point intervention do(Y_1 = 1 & Y_2 = 1 & … & Y_10^30 = 1). This is achieved by replacing every structural equation between X and Y_i by the static equation Y_i = 1.

Importantly, it is NOT achieved by the single-point intervention X = 1, even though that is probably the most “natural” way to realise the counterfactual. So in Pearl’s notation, we must have (X = 0)_(Z = 1), or in probabilistic terms P[X = 0 | do(Z = 1)] = 1. Which, to be frank, seems wrong.
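A hedged sketch of that surgery (my own encoding, N = 10 in place of 10^30): every equation Y_i = X is replaced by the constant Y_i = 1, while X’s own equation is left intact.

```python
# do(Z = 1) realised as the multi-point surgery do(Y_i = 1 for all i).
# Hypothetical encoding; N = 10 stands in for 10^30.
N = 10

def solve(y_equation):
    x = 0                                  # X's equation is never touched
    ys = [y_equation(x) for _ in range(N)]
    return x, ys, sum(ys) / N

# Original model: Y_i = X.
x0, _, z0 = solve(lambda x: x)
# Mutilated model: every Y_i pinned to 1, links from X severed.
x1, _, z1 = solve(lambda x: 1)
print(x0, z0, x1, z1)  # 0 0.0 0 1.0 -- X stays 0 even though Z = 1
```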

And we can’t “fix” this in Pearl’s semantics by choosing the alternative surgery (X = 1), because if P[X = 1 | do(Z = 1)] = 1 that would imply in Pearl’s semantics that X is caused by the Y_i, rather than the other way round, which is clearly wrong since it contradicts the original causal graph. Worse, even if we introduce some ambiguity, saying that X might change under the intervention do(Z = 1), then we will still have P[X = 1 | do(Z = 1)] > 0 = P[X = 1], and this is enough to imply a probabilistic causal link from the Y_i to X, which is still contrary to the causal graph.

So I think this is a case where Pearl’s analysis gets it wrong.

• Before I analyze this apparent paradox in any depth, I want to be sure I understand your criticism. There are three things about this comment on which I am unclear.

1.) The number of states cannot be relevant to the paradox from a theoretical standpoint, because nothing in Pearl’s calculus depends on the number of states. If this does pose a problem, it only poses a problem insofar as it creates an apparent paradox, that is, whatever algorithm humans use to parse the counterfactual “What if Z were 1?” is different from Pearl’s calculus. A priori, this is not a dealbreaker, unless it can also be shown that the human algorithm does better.

2.) If Yi = X, then there is a causal link between Yi and X. Indeed, there is a causal link between X and every Yi. Conditioning on any of the Yi immediately fixes the value of every other variable.

3.) You say the problem isn’t with English, but then talk about “the most natural way to realize a counterfactual.” I don’t know what that means, other than as an artifact of the human causal learning algorithm.

Or am I misunderstanding you completely?

• Thanks for taking the time to think/comment. It may help us to fix a reference which describes Pearl’s thinking and his calculus. There are several of his papers available online, but this one is pretty comprehensive: ftp://ftp.cs.ucla.edu/pub/stat_ser/r284-reprint.pdf “Bayesianism and Causality, Or, Why I am only a Half-Bayesian”.

Now onto your points:

1) You are correct that nothing in Pearl’s calculus varies depending on the number of variables Yi which causally depend on X. For any number of Yi, the intervention do(Z = 1) breaks all the links between X and the Yi and doesn’t change the value of X at all. Also, there is no “paradox” within Pearl’s calculus here: it is internally consistent.

The real difficulty is that the calculus just doesn’t work as a full conceptual analysis of counterfactuals, and this becomes increasingly clear the more Yi variables we add. It is a bit unfortunate, because while the calculus is elegant in its own terms, it does appear that conceptual analysis is what Pearl was attempting. He really did intend his “do” calculus to reflect how we usually understand counterfactuals, only stated more precisely. Pearl was not consciously proposing a “revisionist” account to the effect: “This is how I’m going to define counterfactuals for the sake of getting some math to work. If your existing definition or intuition about counterfactuals doesn’t match that definition, then sorry, but it still won’t affect my definition.” Accordingly, it doesn’t help to say “Regular intuitions say one thing, Pearl’s calculus says another, but the calculus is better, therefore the calculus is right and intuitions are wrong”. You can get away with that in revisionist accounts/definitions, but not in regular conceptual analysis.

2) The structural equations do indeed imply there is a causal link from X to the Yi. But there is NO causal link in the opposite direction from the Yi to X, or from any Yi to any Yj. The causal graph is directed, and the structural equations are asymmetric. Note that in Pearl’s models, the structural equation Yi = X is different from the reverse structural equation X = Yi, even though in regular logic and probability theory these are equivalent. This point is really quite essential to Pearl’s treatment, and is made clear by the referenced document.
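One way to see the asymmetry (a toy sketch of my own, not Pearl’s notation) is to encode each structural equation as a directed assignment and compare what an intervention on Y does in the two models:

```python
# In the forward model the equation is Y := X; in the reverse model it is X := Y.
# As logical constraints these are the same, but as assignments they differ
# under intervention. (Hypothetical encoding.)

def forward_do_y(y_forced):
    x = 0             # X's own equation, untouched by do(Y = ...)
    y = y_forced      # surgery replaces Y := X with the forced constant
    return x, y

def reverse_do_y(y_forced):
    y = y_forced      # Y is upstream in the reversed model
    x = y             # the equation X := Y still fires
    return x, y

print(forward_do_y(1))  # (0, 1): intervening downstream leaves X alone
print(reverse_do_y(1))  # (1, 1): with the reversed equation, X follows Y
```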

3) See point 1. Pearl’s calculus is trying to analyse counterfactuals (and causal relations) as we usually understand them, not to propose a revisionist account. So evidence about how we (naturally) interpret counterfactuals (in both the Gore case and the X, Y case) is entirely relevant here.

Incidentally, if you want my one-sentence view, I’d say that Pearl is correctly analysing a certain sort of counterfactual, but not the general sort he thinks he is analysing. Consider these two counterfactuals:

If A were to happen, then B would happen.

If A were to be made to happen (by outside intervention), then B would happen.

I believe that these are different counterfactuals, with different antecedents, and so they can have different truth values. It looks to me like Pearl’s “do” calculus correctly analyses the second sort of counterfactual, but not the first.

(Edited this comment to fix typos and a broken reference.)

• Okay. So according to Causality (first edition, cause I’m poor), Theorem 7.1.7, the algorithm for calculating the counterfactual P((Y = y)_(X = x) | e) -- which represents the statement “If X were x, then Y would be y, given evidence e” -- has three stages:

1. Abduction: use the probability distribution P(x, y | E = e).

2. Action: perform do(X = x).

3. Prediction: calculate P(Y = y) relative to the new graph model and its updated joint probability distribution.

In our specific case, we want to calculate P((X = 0)_(Z = 1)). There’s no evidence to condition on, so abduction does nothing.

To perform do(Z = 1), we delete every arrow pointing from the Yi’s to Z. The new probability distribution, p(x, yi | do(Z = 1)), is given by p(x, yi, 1) when z = 1 and zero otherwise. Since the original probability distribution assigned probability one only to the state (x = 0, yi = 0, z = 0), the new probability distribution is uniformly zero.

I now no longer follow your calculation of P((X = 0)_(Z = 1)). In particular:

My understanding of Pearl’s semantics is that to evaluate this we have to intervene, i.e. do(Z = 1), and this is equivalent to the multi-point intervention do(Y1 = 1 & Y2 = 1 & … & Y10^30 = 1). This is achieved by replacing every structural equation between X and Yi by the static equation Y_i = 1.

The intervention do(Z = 1) does not manipulate the Yi. The formula I used to calculate p(X = 0 | do(Z = 1)) is the truncated factorization formula given in section 3.2.3.

I suddenly wish I had sat down and calculated this out first, rather than argue from principles. I hear my mother’s voice in the background telling me to “do the math,” as is her habit.

• You missed the point here that Z is a “macro-state” variable, which is defined to be the average of the Yi variables.

It is not actually a separate variable on the causal graph, and it is not caused by the Yi variables. This means that the intervention do(Z = 1) can only be realised on the causal graph by do(Y1 = 1, Y2 = 1, …, Y_10^30 = 1), which was what I stated a few posts ago. You are correct that the abduction step is not needed, as this is a deterministic example.

• Then why is P(X = 1 | do(Yi = 1)) = 1? If I delete from the graph every arrow entering each Yi, I’m left with a graph empty of edges; the new joint pdf is still uniformly zero.

• In Pearl’s calculus, it isn’t!

If you look back at my above posts, I deduce that in Pearl’s calculus we will get P[X = 0 | do(Z = 1)] = P[X = 0 | do(Yi = 1 for all i)] = 1. We agree here with what Pearl’s calculus says.

The problem is that the counterfactual interpretation of this is “If the average value of the Yi were 1, then X would have been 0”. And that seems plain implausible as a counterfactual. The much more plausible counterfactual backtracks to change X, allowing all the Yi to change to 1 through a single change in the causal graph, namely “If the average value of the Yi were 1, then X would have been 1”.

Notice the analogy to the Gore counterfactual. If Gore were president on 9/11, he wouldn’t suddenly have become president (the equivalent of a mass deletion of all the causal links to the Yi). No, he would have been president since January, because of a micro-change the previous Fall (equivalent to a backtracked change to X). I believe you agreed that the Gore counterfactual needs to backtrack to make sense, so you agree with backtracking in principle? In that case, you should disagree with the Pearl treatment of counterfactuals, since they never backtrack (they can’t).
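The backtracking reading can be sketched as a search for a minimal upstream intervention that makes the antecedent come out true (my own toy code, not part of Pearl’s calculus; N = 10 stands in for 10^30):

```python
N = 10

def solve(x_val):
    """Propagate a value of X forward through Y_i = X to the macro-state Z."""
    ys = [x_val] * N
    return x_val, sum(ys) / N

def backtrack(antecedent_holds):
    """Try single-point interventions on X until the antecedent comes out true."""
    for x_val in (0, 1):
        x, z = solve(x_val)
        if antecedent_holds(z):
            return x, z
    return None

x, z = backtrack(lambda z: z == 1)
print(x, z)  # 1 1.0 -- the single change X = 1 realises Z = 1, so X would be 1
```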

• If you look back at my above posts, I deduce that in Pearl’s calculus we will get P[X = 0 | do(Z = 1)] = P[X = 0 | do(Yi = 1 for all i)] = 1. We agree here with what Pearl’s calculus says.

No, we disagree. My calculations suggest that P[X = 0 | do(Yi = 1 for all i)] = P[X = 1 | do(Yi = 1 for all i)] = 0. The intervention falls outside the region where the original joint pdf has positive mass. The intervention do(X = 1) also annihilates the original joint pdf, because there is no region of positive mass in which X = 1.

I still don’t understand why you don’t think the problem is a language problem. Pearl’s counterfactuals have a specific meaning, so of course they don’t mean something else from what they mean, even if the other meaning is a more plausible interpretation of the counterfactual (again, whatever that means -- I’m still not sure what “more plausible” is supposed to mean theoretically).

The problem is that the counterfactual interpretation of this is “If the average value of the Yi were 1, then X would have been 0”. And that seems plain implausible as a counterfactual. The much more plausible counterfactual backtracks to change X, allowing all the Yi to change to 1 through a single change in the causal graph, namely “If the average value of the Yi were 1, then X would have been 1”.

I think the problem is that when you intervene to make something impossible happen, the resulting system no longer makes sense.

• I believe you agreed that the Gore counterfactual needs to backtrack to make sense, so you agree with backtracking in principle?

Yes. (I assume you mean “If Gore was president during 9/11, he wouldn’t have invaded Iraq.”)

In that case, you should disagree with the Pearl treatment of counterfactuals, since they never backtrack (they can’t).

Why should I disagree with Pearl’s treatment of counterfactuals that don’t backtrack?

Isn’t the decision of whether or not a given counterfactual backtracks in its most “natural” interpretation largely a linguistic problem?

• No, we disagree. My calculations suggest that P[X = 0 | do(Yi = 1 for all i)] = P[X = 1 | do(Yi = 1 for all i)] = 0. The intervention falls outside the region where the original joint pdf has positive mass. The intervention do(X = 1) also annihilates the original joint pdf, because there is no region of positive mass in which X = 1.

I don’t think that’s correct. My understanding of the intervention do(Yi = 1 for all i) is that it creates a disconnected graph, in which the Yi all have the value 1 (as stipulated by the intervention) but X retains its original mass function P[X = 0] = 1. The causal links from X to the Yi are severed by the intervention, so it doesn’t matter that the intervention has zero probability in the original graph, since the intervention creates a new graph. (Interventions into deterministic systems often will have zero probability in the original system, though not in the intervened one.) On the other hand, you claim to be following Pearl_2012 whereas I’ve been reading Pearl_2001, and there might have been some differences in his treatment of impossible interventions… I’ll check this out.

For now, just suppose the original distribution over X was P[X = 0] = 1 - epsilon and P[X = 1] = epsilon for a very small epsilon. Would you agree that the intervention do(Yi = 1 for all i) now is in the area of positive mass function, but still doesn’t change the distribution over X, so we still have P[X = 0 | do(Yi = 1 for all i)] = 1 - epsilon and P[X = 1 | do(Yi = 1 for all i)] = epsilon?
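A quick numeric check of the epsilon variant (my own sketch of the claim, not a derivation from Pearl’s formulas): the surgery cuts only the arrows out of X, so on this reading X’s marginal is simply carried over.

```python
# Under do(Y_i = 1 for all i) the arrows out of X are cut, so on this reading
# X simply keeps its prior marginal. (Hypothetical encoding of the claim.)
eps = 1e-6
prior_x = {0: 1 - eps, 1: eps}

def marginal_x_after_do_all_y():
    # The surgery deletes the X -> Y_i edges; nothing upstream of X changes.
    return dict(prior_x)

post_x = marginal_x_after_do_all_y()
print(post_x[0], post_x[1])  # 0.999999 1e-06
```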

Isn’t the decision of whether or not a given counterfactual backtracks in its most “natural” interpretation largely a linguistic problem?

I still think it’s a conceptual analysis problem rather than a linguistic problem. However, perhaps we should play the taboo game on “linguistic” and “conceptual”, since it seems we mean different things by them (and possibly what you mean by “linguistic” is close to what I mean by “conceptual”, at least where we are talking about concepts expressed in English).

Thanks anyway.

• You seem to be done, so I won’t belabor things further; I just want to point out that I didn’t claim to have a more updated copy of Pearl (in fact, I said the opposite). I doubt there’s been any change to his algorithm.

All this ASCII math is confusing the heck out of me, anyway.

EDIT: Oh, dear. I see how horribly wrong I was now. The version of the formula I was looking at said “(formula) for (un-intervened variables) consistent with (intervention), and zero otherwise”, and because it was a deterministic system my mind conflated the two kinds of consistency. I’m really sorry to have blown a lot of your free time on my own incompetence.

• Thanks for that… You just saved me a few hours’ additional research on Pearl to find out whether I’d got it wrong (and misapplied the calculus for interventions that are impossible in the original system)!

Incidentally, I’m quite a fan of Pearl’s work, and think there should be ways to adjust the calculus to allow reasonable backtracking counterfactuals as well as forward-tracking ones (i.e. ways to find a minimal intervention further back in the graph, one which then makes the antecedent come out true). But that’s probably worth a separate post, and I’m not ready for it yet.

• “Bayesianism and Causality, Or, Why I am only a Half-Bayesian”.

As a (mostly irrelevant) side note, this is Pearl_2001, who is a very different person from Pearl_2012.

Also, there is no “paradox” within Pearl’s calculus here: it is internally consistent.

I’m using the word paradox in the sense of “puzzling conclusion”, not “logical inconsistency”. Hence “apparent paradox”, which can’t make sense in the context of the latter definition.

It is a bit unfortunate, because while the calculus is elegant in its own terms, it does appear that conceptual analysis is what Pearl was attempting. He really did intend his “do” calculus to reflect how we usually understand counterfactuals, only stated more precisely. Pearl was not consciously proposing a “revisionist” account to the effect: “This is how I’m going to define counterfactuals for the sake of getting some math to work. If your existing definition or intuition about counterfactuals doesn’t match that definition, then sorry, but it still won’t affect my definition.”

The human causal algorithm is frequently, horrifically, wrong. A theory that attempts to model it is, I heavily suspect, less accurate than Pearl’s theory as it stands, at least because it will frequently prefer to use the post hoc inference when it is more appropriate to infer a mutual cause.

Accordingly, it doesn’t help to say “Regular intuitions say one thing, Pearl’s calculus says another, but the calculus is better, therefore the calculus is right and intuitions are wrong”. You can get away with that in revisionist accounts/definitions but not in regular conceptual analysis.

No, I didn’t say that. In my earlier comments I wondered under what conditions the “natural” interpretation of counterfactuals was preferable. If regular intuition disagrees with Pearl, there are at least two possibilities: intuition is wrong (i.e., a bias exists) or Pearl’s calculus does worse than intuition, which means the calculus needs to be updated. In a sense, the calculus is already a “revisionist” account of the human causal learning algorithm, though I disapprove of the connotations of “revisionist” and believe they don’t apply here.

But there is NO causal link in the opposite direction from the Yi to the X, or from any Yi to any Yj. The causal graph is directed, and the structural equations are asymmetric.

Yes, but my question here was whether or not the graph model was accurate. Purely deterministic graph models are weird in that they are observationally equivalent not just with other graphs with the same v-structure, but with any graph with the same skeleton, and even worse, one can always add an arrow connecting the ends of any path. I understand better now that the only purpose behind a deterministic graph model is to fix one out of this vast set of observationally equivalent models. I was confused by the plethora of observationally equivalent deterministic graph models.

Incidentally, if you want my one-sentence view, I’d say that Pearl is correctly analysing a certain sort of counterfactual but not the general sort he thinks he is analysing. Consider these two counterfactuals:

If A were to happen, then B would happen.

If A were to be made to happen (by outside intervention), then B would happen.

As far as I can tell, the first is given by P(B | A), and the second is P(B_A). Am I missing something really fundamental here?

I’ve done the calculations for your model, but I’m going to put them in a different comment to separate out mathematical issues from philosophical ones. This comment is already too long.

• Couple of points. You say that “the human causal algorithm is frequently, horrifically, wrong”.

But remember here that we are discussing the human counterfactual algorithm, and my understanding of the experimental evidence re counterfactual reasoning (e.g. on cases like Kennedy or Gore) is that it is pretty consistent across human subjects, and between “naive” subjects (taken straight off the street) and “expert” subjects (who have been thinking seriously about the matter). There is also quite a lot of consistency on what constitutes a “plausible” versus a “far out” counterfactual, and a much stronger sense about what happens in the cases with plausible antecedents than in cases with weird antecedents (such as what Caesar would have done if fighting in Korea). It’s also interesting that there are rather a lot of formal analyses which almost match the human algorithm, but not quite, and that there is quite a lot of consensus on the counterexamples (that they genuinely are counterexamples, and that the formal analysis gets it wrong).

What pretty much everyone agrees is that counterfactuals involving macro-variable antecedents assume some backtracking before the time of the antecedent, and that a small micro-state change to set up the antecedent is more plausible than a sudden macro-change which involves breaks across multiple micro-states.

And on your other point, simple conditioning P(B | A) gives results more like the indicative conditional (“If Oswald did not shoot Kennedy, then someone else did”) rather than the counterfactual conditional (“If Oswald had not shot Kennedy, then no one else would have”).
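That difference can be illustrated numerically (the model and numbers are my own invention): suppose Oswald shoots with probability 0.99, an independent backup shooter shoots with probability 0.01, and Kennedy dies iff either shoots.

```python
from itertools import product

# P(Oswald shoots) and P(independent backup shooter shoots); toy numbers.
p_o, p_s = 0.99, 0.01
worlds = {(o, s): (p_o if o else 1 - p_o) * (p_s if s else 1 - p_s)
          for o, s in product((0, 1), repeat=2)}

# Indicative conditional via plain conditioning: given that Kennedy died
# (o or s) and Oswald did not shoot, someone else must have.
num = worlds[(0, 1)]
den = sum(p for (o, s), p in worlds.items() if o == 0 and (o or s))
print(num / den)  # 1.0 -- "if Oswald did not shoot, someone else did"

# Interventional reading via surgery: do(O = 0) cuts nothing upstream of the
# backup shooter, which keeps its prior 0.01, so Kennedy very likely survives.
print(p_s)        # 0.01 -- "if Oswald had not shot, no one else would have"
```

(A full Pearl-style counterfactual would also abduct on the evidence that Kennedy died, nudging the backup-shooter probability slightly above its prior, but the contrast between the two readings survives.)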