# SamEisenstat

Karma: 36 (LW), 2 (AF)
NewTop
Page 1

# Reflec­tive or­a­cles as a solu­tion to the con­verse Law­vere problem

29 Nov 2018 3:23 UTC
18 points
• As you say, this isn’t a proof, but it wouldn’t be too sur­pris­ing if this were con­sis­tent. There is some such that has a proof of length by a re­sult of Pavel Pudlák (On the length of proofs of fini­tis­tic con­sis­tency state­ments in first or­der the­o­ries). Here I’m mak­ing the de­pen­dence on ex­plicit, but not the de­pen­dence on . I haven’t looked at it closely, but the proof strat­egy in The­o­rems 5.4 and 5.5 sug­gests that will not de­pend on , as long as we only ask for the weaker prop­erty that will only be prov­able in length for sen­tences of length at most .

• I mi­s­un­der­stood your pro­posal, but you don’t need to do this work to get what you want. You can just take each sen­tence as an ax­iom, but de­clare that this ax­iom takes sym­bols to in­voke. This could be done by chang­ing the no­tion of length of a proof, or by tak­ing ax­ioms and with very long.

• Yeah, I had some­thing along the lines of what Paul said in mind. I wanted not to re­quire that the cir­cuit im­ple­ment ex­actly a given func­tion, so that we could see if dae­mons show up in the out­put. It seems eas­ier to define dae­mons if we can just look at in­put-out­put be­havi­our.

• I’m hav­ing trou­ble think­ing about what it would mean for a cir­cuit to con­tain dae­mons such that we could hope for a proof. It would be nice if we could find a sim­ple such defi­ni­tion, but it seems hard to make this in­tu­ition pre­cise.

For ex­am­ple, we might say that a cir­cuit con­tains dae­mons if it dis­plays more op­ti­miza­tion that nec­es­sary to solve a prob­lem. Min­i­mal cir­cuits could have dae­mons un­der this defi­ni­tion though. Sup­pose that some func­tion de­scribes the be­havi­our of some pow­er­ful agent, a func­tion is like with noise added, and our prob­lem is to pre­dict suffi­ciently well the func­tion . Then, the sim­plest cir­cuit that does well won’t bother to mem­o­rize a bunch of noise, so it will pur­sue the goals of the agent de­scribed by more effi­ciently than , and thus more effi­ciently than nec­es­sary.

• Two minor com­ments. First, the bit­strings that you use do not all cor­re­spond to wor­lds, since, for ex­am­ple, im­plies , as is a sub­the­ory of . This can be fixed by in­stead us­ing a tree of sen­tences that all di­ag­o­nal­ize against them­selves. Tsvi and I used a con­struc­tion in this spirit in A limit-com­putable, self-re­flec­tive dis­tri­bu­tion, for ex­am­ple.

Se­cond, I be­lieve that weak­en­ing #2 in this post also can­not be satis­fied by any con­stant dis­tri­bu­tion. To sketch my rea­son­ing, a trader can try to buy a se­quence of sen­tences , spend­ing 2^{-n}\$ on the $$n$$th sen­tence $$\phi_1 \wedge \dots \wedge \phi_n$$. It should choose the se­quence of sen­tences so that $$\phi_1 \wedge \dots \wedge \phi_n$$ has prob­a­bil­ity at most $$2^{-n}$$, and then it will make an in­finite amount of money if the sen­tences are si­mul­ta­neously true.

The way to do this is to choose each from a list of all sen­tences. If at any point you no­tice that has too high a prob­a­bil­ity, then pick a new sen­tence for . We can sell all the con­junc­tions for and get back the origi­nal amount payed by hy­poth­e­sis. Then, if we can keep us­ing sharper con­tin­u­ous tests of the prob­a­bil­ities of the sen­tences over time, we will set­tle down to a se­quence with the de­sired prop­erty.

In or­der to turn this sketch into a proof, we need to be more care­ful about how these things are to be made con­tin­u­ous, but it seems rou­tine.

• I at first didn’t un­der­stand your ar­gu­ment for claim (2), so I wrote an al­ter­nate proof that’s a bit more ob­vi­ous/​care­ful. I now see why it works, but I’ll give my ver­sion be­low for any­one in­ter­ested. In any case, what you re­ally mean is the prob­a­bil­ity of de­cid­ing a sen­tence out­side of by hav­ing it an­nounced by na­ture; there may be a high prob­a­bil­ity of sen­tences be­ing de­cided in­di­rectly via sen­tences in .

In­stead of choos­ing as you de­scribe, pick so that the prob­a­bil­ity of sam­pling some­thing in is greater than . Then, the prob­a­bil­ity of sam­pling some­thing in is at least . Hence, no mat­ter what sen­tences have been de­cided already, the prob­a­bil­ity that re­peat­edly sam­pling from se­lects be­fore it se­lects any sen­tence out­side of is at least

as de­sired.

Fur­ther­more, this ar­gu­ment makes it clear that the prob­a­bil­ity dis­tri­bu­tion we con­verge to de­pends only on the set of sen­tences which the en­vi­ron­ment will even­tu­ally as­sert, not on their or­der­ing!

Oh, I didn’t no­tice that as­pect of things. That’s pretty cool.

• A few thoughts:

I agree that the LI crite­rion is “poin­t­wise” in the way that you de­scribe, but I think that this is both pretty good and as much as could ac­tu­ally be asked. A sin­gle effi­ciently com­putable trader can do a lot. It can en­force co­her­ence on a polyno­mi­ally grow­ing set of sen­tences, search for proofs us­ing many differ­ent proof strate­gies, en­force a polyno­mi­ally grow­ing set of statis­ti­cal pat­terns, en­force re­flec­tion prop­er­ties on a polyno­mi­ally large set of sen­tences, etc. So, even­tu­ally the mar­ket will not be ex­ploitable on all these things si­mul­ta­neously, which seems like a pretty good level of ac­cu­rate be­liefs to have.

On the other side of things, it would be far to strong to ask for a uniform bound of the form “for ev­ery , there is some day such that af­ter step , no trader can mul­ti­ply its wealth by a fac­tor more than “. This is be­cause a trader can be hard­coded with ar­bi­trar­ily many hard-to-com­pute facts. For ev­ery , there must even­tu­ally be a day on which the be­lief of your log­i­cal in­duc­tor as­sign prob­a­bil­ity less than to some true state­ment, at which point a trader who has that state­ment hard­coded can mul­ti­ply its wealth by . (I can give a con­struc­tion of such a sen­tence us­ing self-refer­ence if you want, but it’s also in­tu­itively nat­u­ral—just pick many mu­tu­ally ex­clu­sive state­ments with noth­ing to break the sym­me­try.)

Thus, I wouldn’t think of traders as “mis­takes”, as you do in the post. A trader can gain money on the mar­ket if the mar­ket doesn’t already know all facts that will be listed by the de­duc­tive pro­cess, but that is a very high bar. Do­ing well against finitely many traders is already “pretty good”.

What you can ask for re­gard­ing unifor­mity is for some sim­ple func­tion such that any trader can mul­ti­ply its wealth by at most a fac­tor . This is ba­si­cally the idea of the mis­take bound model in learn­ing the­ory; you bound how many mis­takes hap­pen rather than when they hap­pen. This would let you say a more than the one-trader prop­er­ties I men­tioned in my first para­graph. In fact, has this prop­ery; is just the ini­tial wealth of the trader. You may there­fore want to do some­thing like set­ting traders’ ini­tial wealths ac­cord­ing to some mea­sure of com­plex­ity. Ad­mit­tedly this isn’t made ex­plicit in the pa­per, but there’s not much ad­di­tional that needs to be done to think in this way; it’s just the com­bi­na­tion of the in­di­vi­d­ual proofs in the pa­per with the ex­plicit bounds you get from the ini­tial wealths of the traders in­volved.

I ba­si­cally agree com­pletely on your last few points. The traders are a model class, not an en­sem­ble method in any sub­stan­tive way, and it is just con­fus­ing to con­nect them to the pa­pers on en­sem­ble meth­ods that the LI pa­per refer­ences. Also, while I use the idea of log­i­cal in­duc­tion to do re­search that I hope will be rele­vant to prac­ti­cal al­gorithms, it seems un­likely than any prac­ti­cal al­gorithm will look much like a LI. For one thing, find­ing fixed points is re­ally hard with­out some prop­erty stronger than con­ti­nu­ity, and you’d need a pretty good rea­son to put it in the in­ner loop of any­thing.

• Univer­sal Pre­dic­tion of Selected Bits solves the re­lated ques­tion of what hap­pens if the odd bits are ad­ver­sar­ial but the even bits copy the pre­ced­ing odd bits. Ba­si­cally, the uni­ver­sal semimea­sure learns to do the right thing, but the ex­act sense in which the re­sult is pos­i­tive is sub­tle and has to do with the differ­ence be­tween mea­sures and semimea­sures. The meth­ods may also be rele­vant to the ques­tions here, though I don’t see a proof for ei­ther ques­tion yet.

• Yeah, I like tail de­pen­dence.

There’s this ques­tion of whether for log­i­cal un­cer­tainty we should think of it more as try­ing to “un-up­date” from a more log­i­cally in­formed per­spec­tive rather than try­ing to use some log­i­cal prior that ex­ists at the be­gin­ning of time. Maybe you’ve heard such ideas from Scott? I’m not sure if that’s the right per­spec­tive, but it’s what I’m al­lud­ing to when I say you’re in­tro­duc­ing ar­tifi­cial log­i­cal un­cer­tainty.

• Yeah, the 5 and 10 prob­lem in the post ac­tu­ally can be ad­dressed us­ing prov­abil­ity ideas, in a way that fits in pretty nat­u­ally with log­i­cal in­duc­tion. The mo­ti­va­tion here is to work with de­ci­sion prob­lems where you can’t prove state­ments for agent , util­ity func­tion , ac­tion , and util­ity value , at least not with the amount of com­put­ing power pro­vided, but you want to use in­duc­tive gen­er­al­iza­tions in­stead. That isn’t nec­es­sary in this ex­am­ple, so it’s more of an illus­tra­tion.

To say a bit more, if you make log­i­cal in­duc­tors propo­si­tion­ally con­sis­tent, similarly to what is done in this post, and make them as­sign things that have been proven already prob­a­bil­ity 1, then they will work on the 5 and 10 prob­lem in the post.

It would be in­ter­est­ing if there was more of an anal­ogy to ex­plore be­tween the prov­abil­ity or­a­cle set­ting and the in­duc­tive set­ting, and more ideas could be car­ried over from modal UDT, but it seems to me that this is a differ­ent kind of prob­lem that will re­quire new ideas.

• It’s hard to an­a­lyze the dy­nam­ics of log­i­cal in­duc­tors too pre­cisely, so we of­ten have to do things that feel like worst-case anal­y­sis, like con­sid­er­ing an ad­ver­sar­ial trader with suffi­cient wealth. I think that prob­lems that show up from this sort of anal­y­sis can be ex­pected to cor­re­spond to real prob­lems in su­per­in­tel­li­gent agents, but that is a difficult ques­tion. The ma­lig­nancy of the uni­ver­sal prior is part of the rea­son.

As to your pro­posed solu­tion, I don’t see how it would work. Scott is propos­ing that the trader makes con­di­tional con­tracts, which are in effect voided if the event that they are con­di­tional on doesn’t hap­pen, so the trader doesn’t ac­tu­ally lose any­thing is this case. (It isn’t dis­cussed in this post, but con­di­tional con­tracts can be built out of the usual sort of bets, with pay­offs given by the defi­ni­tion of con­di­tional prob­a­bil­ity.) So, in or­der to make the trader lose money, the events need to hap­pen some­times, not just be ex­pect to hap­pen with some non­neg­li­gable prob­a­bil­ity by the mar­ket.

• In coun­ter­fac­tual mug­ging with a log­i­cal coin, AsDT always uses a log­i­cal in­duc­tor’s best-es­ti­mate of the util­ity it would get right now, so it sees the coin as already de­ter­mined, and sees no benefit from giv­ing Omega money in the cases where Omega asks for money.

The way I would think about what’s go­ing on is that if the coin is already known at the time that the ex­pec­ta­tions are eval­u­ated, then the prob­lem isn’t con­ver­gent in the sense of AsDT. The agent that pays up when­ever asked has a con­stant ac­tion, but it doesn’t re­ceive a con­stant ex­pected util­ity. You can think of the av­er­ag­ing as in­tro­duc­ing ar­tifi­cial log­i­cal un­cer­tainty to make more things con­ver­gent, which is why it’s more up­date­less. (My un­der­stand­ing is that this is pretty close to how you’re think­ing of it already.)

• This isn’t too re­lated to your main point, but ev­ery or­dered field can be em­bed­ded into a field of Hahn se­ries, which might be sim­pler to work with than sur­re­als.

That page dis­cusses the ba­sics of Hahn se­ries, but not the em­bed­ding the­o­rem. (Ehrlich, 1995) treats things in de­tail, but is long and in­tro­duces a lot of defi­ni­tions. The em­bed­ding the­o­rem is stated on page 23 (24 in the pdf).

# Log­i­cal in­duc­tor limits are dense un­der poin­t­wise convergence

6 Oct 2016 8:07 UTC
4 points
• Nicely done. I should have spent more time think­ing about liar sen­tences; you re­ally can do a lot with them.

Fol­low-up ques­tion—is the set of limits of log­i­cal in­duc­tors con­vex? (Your proof also makes me cu­ri­ous as to whether the set of “ex­panded limits” is con­vex, though that ques­tion is a bit less nice than the other).

Sam Eisen­stat: In the origi­nal coun­terex­am­ple to the Trol­l­jec­ture, I guess you’re propos­ing that be log­i­cally prior to , so we can go back and re­place with , but still have to use in or­der to de­rive ? Here I’m just us­ing “” to mean “coun­ter­fac­tu­ally re­sults in” with­out mean­ing to im­ply that this should be defined modally.

I agree that this is in­tu­itively sen­si­ble, but it is difficult to give a gen­eral defi­ni­tion that agrees with in­tu­ition—it’s hard to even have an in­tu­ition in the gen­eral case. There has been some work along these lines though; see for ex­am­ple https://​agent­foun­da­tions.org/​item?id=241 though you may already know about this since you’ve been talk­ing with Scott. I’m pes­simistic about this di­rec­tion in gen­eral, though I’d be happy to talk fur­ther about it.

Jack Gal­lagher: What’s the gen­er­a­tor be­hind your be­ing pes­simistic?

Sam Eisen­stat: Well, like Daniel sug­gests, we can ask whether the or­der­ing is sub­jec­tive/​based on an agents epistemic state. Un­like Daniel, I think there are prob­lems with this. The best ex­am­ples are things like trans­par­ent New­comb’s prob­lem, where the agent needs to coun­ter­fact on things that it already knows the fact of the mat­ter about. You can try to find loop­holes in this, e.g. maybe the agent is do­ing some­thing wrong if it isn’t ig­no­rant of cer­tain facts, but I haven’t been able to find some­thing that’s satis­fy­ing (e.g. doesn’t fall to other thought ex­per­i­ments).

It thus seems that an agent should just coun­ter­fact on things even though it already knows them to be true, but this is a weird sort of thing to call “epistemic”. It doesn’t just de­pend on the agent’s prob­a­bil­ity as­sign­ments to var­i­ous state­ments. Thus, to go farther here would seem to re­quire a richer idea of “agent” than we presently have.

The al­ter­na­tive is an ob­jec­tive tem­po­ral or­der­ing on log­i­cal state­ments. (This is a ter­rible use of the ex­cluded mid­dle, since “sub­jec­tive” and “ob­jec­tive” are smug­gling in a lot, so the place to look is prob­a­bly some­where that feels like a third op­tion.) Any­way, it seems ab­surd for there to be a fact of the mat­ter as to whether 417*596 = 248532 comes tem­po­rally be­fore or af­ter 251*622 = 156122.

Part of this is just a feel­ing of ar­bi­trari­ness be­cause even if one of those did come be­fore the other, we don’t have to tools to know which one. Still, it seems the tools we do have are the wrong shape for build­ing some­thing that would work. For ex­am­ple, there seems to be an af­finity be­tween the ideas of be­ing “easy”, of de­pend­ing on few things, but be­ing de­pended on by many, and of be­ing tem­po­rally “early”. Pa­trick’s re­vived trol­l­jec­ture is one at­tempt at for­mal­iz­ing it, but there was noth­ing deep about my coun­terex­am­ple, it’s just easy to gen­er­ate state­ments that are difficult to eval­u­ate. (There are a differ­ent ver­sions of this, like in­de­pen­dent state­ments in un­de­cid­able the­o­ries, or hard SAT in­stances in propo­si­tional logic.)

I used the word “we” in the last para­graph, but maybe you have type-the­ory tools that are more suited to this?

Daniel Satanove: It’s eas­ier to come up with UDT when you have CDT. Is there a ver­sion of log­i­cal coun­ter­fac­tu­als that works in more in­tu­itive cases, but fail on stranger edge cases like trans­par­ent New­comb’s?

Sam Eisen­stat: The EDT of log­i­cal coun­ter­fac­tu­als is just EDT with a prior over logic (e.g. bounded Dem­ski, some­thing more so­phis­ti­cated). Proof-based UDT is a spe­cial case of this; it pre­scribes ac­tions if any rea­son­able prior would agree, and is silent oth­er­wise.

Un­for­tu­nately, as shown by the trol­l­jec­ture coun­terex­am­ple, there are coun­ter­fac­tu­als that proof-based UDT gets wrong (the trol­l­jec­ture ba­si­cally says that what proof-based UDT does should be read as coun­ter­fac­tu­als), and there­fore that EDT gets wrong given any rea­son­able prior, so it’s hard to define a “do­main of val­idity” for this (not that that’s nec­es­sary for it be in­for­ma­tive).

One can­di­date for the CDT of log­i­cal coun­ter­fac­tu­als would just be putting a causal DAG struc­ture on log­i­cal state­ments, like dis­cussed in Scott’s fo­rum post that I linked above. The TDT pa­per is in­for­mal, and I haven’t looked at it in a while, but I think that you can read it as propos­ing ex­actly this.

I be­lieve that in all thought ex­per­i­ments where peo­ple have an in­tu­ition about what should be done, they are us­ing this sort of rea­son­ing plus strat­egy se­lec­tion (i.e. up­date­less­ness). I would be in­ter­ested in see­ing any thought ex­per­i­ments that peo­ple do not an­a­lyze in this way.

There isn’t much in the di­rec­tion of ac­tu­ally defin­ing such causal net­works. Back when the trol­l­jec­ture looked plau­si­ble we tried to in­ter­pret it in terms of causal net­works, which didn’t quite work (see http://​ep­silonof­doom.blogspot.com/​2015/​08/​prov­abil­ity-coun­ter­fac­tu­als-vs-three.html). I ex­pect some­thing like Pa­trick’s re­vived trol­l­jec­ture to be even harder to view in this way, since it’s in­her­ently ap­prox­i­mate; a no­tion of ap­prox­i­mate causal net­work would be more ap­pro­pri­ate here, if one could be found. I don’t know of any pro­posal that (1) ac­tu­ally defines a causal net­work and (2) works on some easy ex­am­ples.

Well, that’s not en­tirely true. You can just pick any or­der­ing that puts the agents de­ci­sion at the be­gin­ning, and use the re­sult­ing fully-con­nected DAG. This works wher­ever proof-based UDT works, but fails on the trol­l­jec­ture, ex­actly be­cause it as­sumes any­thing that prov­ably fol­lows from your agent’s ac­tion is causally down­stream of it.

Tsvi BT: “One can­di­date for the CDT of log­i­cal coun­ter­fac­tu­als would just be putting a causal DAG struc­ture on log­i­cal state­ments”

How does this in­ter­act with log­i­cal up­date­less­ness? What hap­pens to your DAG as you learn new facts?

Sam Eisen­stat: I’m not sure if it’s ob­vi­ous, but the idea is that each node has a full table of coun­ter­fac­tu­als. An in­stance of this has more data than just an or­der­ing.

It’s not in­tended to be up­date­less; it’s a CDT/​TDT-style the­ory rather than UDT-style. You could layer strat­egy se­lec­tion on top of it (though it could be com­pu­ta­tion­ally difficult if your DAG is hard to work with).

When you learn new facts you con­di­tion the prob­a­bil­ity dis­tri­bu­tion as­so­ci­ated with the DAG on them, like for any Bayesian net­work.

Tsvi BT: So then this is dy­nam­i­cally in­con­sis­tent in the coun­ter­fac­tual mug­ging with a log­i­cal coin.

Sam Eisen­stat: Oh yeah, “layer strat­egy se­lec­tion on top” doesn’t work for log­i­cal un­cer­tainty. Any­way, this was in re­sponse to Daniel’s re­quest for a the­ory that “works in more in­tu­itive cases”; I hope I didn’t give the im­pres­sion that this would be hard to break.

On the other hand, you could try to “just do strat­egy se­lec­tion” with re­spect to a par­tic­u­lar prior if you’re will­ing to treat ev­ery­thing you learn as an ob­ser­va­tion. This is easy if you pick a co­her­ent prior, but it’s un­com­putable and it doesn’t let you pay coun­ter­fac­tual mug­gers with log­i­cal coins. Abram’s GLS stuff tries to do this with pri­ors that don’t already know all log­i­cal facts. What do you think of that sort of ap­proach?

Tsvi BT: I don’t have much more con­crete to say. I cur­rently view de­ci­sion the­ory as be­ing the ques­tions of how to do log­i­cal up­date­less­ness, and how to do strat­egy se­lec­tion with a bounded rea­soner. The GLS prior might be close, but there are ba­sic open ques­tions about how it works. “Strate­gic un­cer­tainty” (or some­thing) may or may not just boil down to log­i­cal un­cer­tainty; the rea­son it might not is that you might ac­tu­ally have to define an op­ti­miza­tion tar­get other than “what Ideal Me thinks is the best strat­egy”, to take into ac­count that ac­tual ver­sions of you are bounded in differ­ent ways. Th­ese two “com­po­nents” (I wildly spec­u­late) could com­bine to give a re­flec­tively sta­ble de­ci­sion the­ory.

Daniel Satanove: Proof lengths prob­a­bly aren’t the only way of do­ing the time step thing. Take the proof length defi­ni­tion of ob­ser­va­tion at time k, ex­cept de­lay all proofs of state­ments of the form a + b = c by one time step. This, or some­thing sort of like this, will prob­a­bly also work. Also any strictly in­creas­ing func­tion on ob­ser­va­tion time should work.

The ob­ser­va­tion at time k should be an epistemic thing that de­scribes how an agent learns of new the­o­rems, rather than some fun­da­men­tal prop­erty of math.

Jack Gal­lagher: What do you mean by “an epistemic thing”?

Daniel Satanove: “In or­der to break the cy­cles, we want some no­tion of log­i­cal time from which in­fluence can only prop­a­gate for­ward. Proof length seems, at least to me, to be the most (only?) nat­u­ral way to pull some­thing like that off. A proof of length k can be thought of as a log­i­cal ob­ser­va­tion at time k.”

You are re­strict­ing your­self to agents who do a breadth first search through proofs rather than any­thing that can do a more di­rected ap­proach. Say­ing that you want some no­tion of “log­i­cal time” kind of sounds like you want some­thing that is a prop­erty of the math rather than a prop­erty of the agent.

• [EDIT: This all deals with mea­sures, not semimea­sures; see be­low.]

For to dom­i­nate in the old sense means that its Bayes score can only be some con­stant worse than , no worse, re­gard­less of en­vi­ron­ment. My defi­ni­tion here im­plies that, but I think it’s a stric­ter re­quire­ment.

Your defi­ni­tion is equiv­a­lent to the stan­dard defi­ni­tion. Li and Vitányi say that dom­i­nates iff there is some such that for any bi­nary string , we have . Li and Vitányi’s “prob­a­bil­ity dis­tri­bu­tions” take a bi­nary string as in­put while the prob­a­bil­ity dis­tri­bu­tions we are us­ing ask for an el­e­ment of a -alge­bra, but we can abuse no­ta­tion by al­low­ing a bi­nary string to rep­re­sent the event of that the ran­dom se­quence starts with this string as a pre­fix.

The­o­rem. For any prob­a­bil­ity dis­tri­bu­tions on Can­tor space and any , the con­di­tion holds for all bi­nary strings iff it holds for all .

Proof. The di­rec­tion is im­me­di­ate. For , sup­pose that for all bi­nary string events . We first show that this prop­erty holds for ev­ery event in the alge­bra gen­er­ated by the bi­nary string events. We can write any as a dis­joint union of bi­nary string events . Then, as de­sired.

We can ex­tend this to the -alge­bra gen­er­ated by , which is just the Borel -alge­bra on Can­tor space, by the mono­tone class the­o­rem. I’m not sure how much mea­sure the­ory you know, but you can think of this as a form of in­duc­tion on mea­surable sets; if a prop­erty holds on an alge­bra of sets and it is pre­served by countable mono­tone unions and in­ter­sec­tions, then it holds on ev­ery set in the -alge­bra gen­er­ated by . For countable mono­tone unions, if we have , then We can do the same thing for countable mono­tone in­ter­sec­tions ;

EDIT: I’m talk­ing about prob­a­bil­ity dis­tri­bu­tions rather than semimea­sures. I don’t know how your defi­ni­tion of dom­i­nance is helpful for semimea­sures though. The uni­ver­sal semimea­sure is defined on bi­nary se­quences, and I think that ques­tion of whether dom­i­nates de­pends on how you ex­tend it to the whole -alge­bra. I ex­pect that many rea­son­able ex­ten­sions of would satisfy your The­o­rem 1, but this post (in par­tic­u­lar the proof of The­o­rem 1) doesn’t seem to choose a spe­cific such ex­ten­sion.

• Haskell doesn’t ac­tu­ally let you do this as far as I can tell, but the nat­u­ral com­pu­ta­tional way to im­ple­ment a func­tion is with a case ex­pres­sion with no cases. This is sen­si­ble be­cause has no con­struc­tors, so it should be pos­si­ble to pat­tern match it with such an ex­pres­sion. Another way of think­ing of this is that we can use pat­tern match­ing on sum types and is just the -ary sum type.

• Con­di­tion 4 in your the­o­rem co­in­cides with Lewis’ ac­count of coun­ter­fac­tu­als. Pearl cites Lewis, but he also crit­i­cizes him on the ground that the or­der­ing on wor­lds is too ar­bi­trary. In the lan­guage of this post, he is say­ing that con­di­tion 2 arises nat­u­rally from the struc­ture of the prob­lem and that con­di­tion 4 is de­rives from the deeper struc­ture cor­re­spond­ing to con­di­tion 2.

I also no­ticed that the func­tion and the par­tial or­der can be read as “time of first di­ver­gence from the real world” and “first di­verges be­fore”, re­spec­tively. This makes the the­o­rem a lot more in­tu­itive.