# Fun With DAGs

Crossposted here.

Directed acyclic graphs (DAGs) turn out to be really fundamental to our world. This post is just a bunch of neat stuff you can do with them.

#### Utility Functions

Suppose I prefer chicken over steak, steak over pork, and pork over chicken. We can represent my preferences with a directed graph like this:

This poses a problem. If I’m holding a steak, then I’ll pay a penny to upgrade it to chicken — since I prefer chicken over steak. But then I’ll pay another penny to upgrade my chicken to pork, and another to upgrade it to steak, and then we’re back to the beginning and I’ll pay to upgrade the steak to chicken again! Someone can stand there switching out chicken/pork/steak with me, and collect my money all day.

Any time our preference graph contains cycles, we risk this sort of “money pump”. So, to avoid being money-pumped, let’s make our preferences not contain any cycles:

Now our preferences are directed (the arrows), acyclic (no cycles), and a graph (the arrow-and-box drawing above). It’s a DAG!

Notice that we can now sort the DAG nodes, spatially, so that each node only points to higher-up nodes:

Reading off the nodes from bottom to top, the order is: chicken, pork, steak.

This is called a topological sort, or topo sort for short. We can always topo sort a DAG — there is always some order of the nodes such that each node only points to nodes after it in the sort order.

In this case, the topo sort orders our preferences. We prefer pork over everything which comes before pork in the topo sort, and so forth. We could even number the items according to their topo-sorted order: 1 for chicken, 2 for pork, 3 for steak. We prefer whichever thing has the higher number, i.e. the higher position in the topo-sort order.

This is called a utility function. This utility function U is defined by U(chicken) = 1, U(pork) = 2, U(steak) = 3. We prefer things with higher utility — in other words, we want to maximize the utility function!

To summarize: if our preferences do contain cycles, we can be money-pumped. If they don’t, then our preferences form a DAG, so we can topo sort to find a utility function.
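
As a sketch of that pipeline, Python’s standard-library `graphlib` can topo sort an acyclic preference DAG and number the items. The dict encoding below is my own illustration of the post’s example, not something from the post itself:

```python
from graphlib import TopologicalSorter

# Acyclic preference DAG: each key maps to the set of items it is
# preferred over (an assumed encoding of the example).
prefers_over = {
    "chicken": set(),        # nothing is below chicken
    "pork": {"chicken"},     # pork is preferred over chicken
    "steak": {"pork"},       # steak is preferred over pork
}

# Topo sort: each item appears after everything it is preferred over.
order = list(TopologicalSorter(prefers_over).static_order())

# Number items by topo-sort position to get a utility function.
utility = {item: i + 1 for i, item in enumerate(order)}
print(utility)  # {'chicken': 1, 'pork': 2, 'steak': 3}
```

A cyclic preference graph would make `static_order()` raise a `CycleError` — the money-pump case, where no utility function exists.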

#### Circuits

Here’s a really simple circuit to compute f = (x+y)*(x-y):

Notice that the circuit is a DAG. In this case, the topo sort tells us the order in which things are computed: “+” and “-” both come before “x” (the multiplication, not the input). If the graph contained any cycles, then we wouldn’t know how to evaluate it — if the value of some node changed as we went around a cycle, we might get stuck updating in circles indefinitely!

It’s the DAG structure that makes a circuit nice to work with: we can evaluate things in order, and we only need to evaluate each node once.
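
Here is a rough sketch of evaluating such a circuit in topo-sort order; the dict encoding and the gate names (`add`, `sub`, `mul`) are assumptions made for illustration:

```python
from graphlib import TopologicalSorter

# Circuit as a DAG: each node maps to the list of its input nodes.
inputs = {"x": [], "y": [],
          "add": ["x", "y"], "sub": ["x", "y"],
          "mul": ["add", "sub"]}
ops = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}

def evaluate(values):
    vals = dict(values)  # start from the given input values
    # Visit nodes in topo-sort order, so every gate's inputs are
    # already computed, and each gate is evaluated exactly once.
    for node in TopologicalSorter(inputs).static_order():
        if node in ops:
            vals[node] = ops[node](*(vals[i] for i in inputs[node]))
    return vals["mul"]

print(evaluate({"x": 5, "y": 3}))  # (5+3)*(5-3) = 16
```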

#### Dynamic Programming

Here’s a classic algorithms problem:

Suppose we want to compute the number of paths from corner A to corner B, travelling only down and right along the lines. (I’m about to give a solution, so pause to think if you don’t want to be spoiled.)

Our main trick is that the number of paths to B is the number of paths to C plus the number of paths to D. Likewise, for any other node, the number of paths to the node is the sum of the numbers above and to the left. So we can turn the picture above into a circuit, by sticking “+” operations at each node and filling in the arrowheads:

I’ve omitted all the tiny “+” signs, but each node in this circuit sums the values from the incoming arrows. Start with 1 at corner A (there is only one path from A to itself), and the circuit will output the number of paths from A to B at corner B.

Why is this interesting? Well, consider coding this up. We can write a simple recursive function:

```python
def f(row, col):
    if row == 0 or col == 0:
        return 1
    return f(row - 1, col) + f(row, col - 1)
```

… but this function is extremely inefficient. In effect, it starts at B, then works backward toward A, exploring every possible path through the grid and adding them all up. The total runtime will be exponential in the size of the grid.

However, note that there are not that many combinations of (row, col) at which f will be evaluated — indeed, there’s only one (row, col) combination for each node in the grid. The inefficiency comes from our simple recursive function calling f(row, col) multiple times for each node, and running the full computation each time. Instead, we could just compute f for each node once.

How can we do that? We already did! We just need to treat the problem as a circuit. Remember, when evaluating a circuit, we work in topo-sorted order. In this case, that means starting from A, which means starting from f(0,0). Then we store that value somewhere, and move on — working along the top f(0, col) and the side f(row, 0). In general, we work from upper left to lower right, following topo sort order, and storing results as we go. Rather than using function calls, as in the recursive formulation, we just look up stored values. As long as we follow topo sort order, each value we need will be stored before we need it.
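
A minimal sketch of that stored-values version (the function name `count_paths` and the grid indexing are my own assumptions):

```python
def count_paths(rows, cols):
    # Evaluate the circuit in topo-sort order — upper left to lower
    # right — storing each node's value before anything needs it.
    paths = {}
    for r in range(rows + 1):
        for c in range(cols + 1):
            if r == 0 or c == 0:
                paths[(r, c)] = 1  # only one path along the top/side edge
            else:
                # Sum of the stored values above and to the left.
                paths[(r, c)] = paths[(r - 1, c)] + paths[(r, c - 1)]
    return paths[(rows, cols)]

print(count_paths(2, 2))  # 6 paths on a 2x2 grid
```

Each node is now computed exactly once, so the runtime is linear in the number of grid nodes rather than exponential.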

This is “dynamic programming”, or DP: taking a recursive function and turning it into a circuit. Traditionally, DP is taught with tables, but I find this deeply confusing — the key to DP is that it isn’t really about tables, it’s about DAGs. We take the computational DAG of some recursive function, and turn it into a circuit. By evaluating in topo-sort order, we avoid re-evaluating the function at the same node twice.

#### Turing-Computable Functions: Circuits + Symmetry

To turn circuits into fully general Turing-computable functions (i.e. anything which can be written in a programming language), we just need to allow recursion.

Here’s a recursive factorial function:

```python
def f(n):
    if n == 0:
        return 1
    return n * f(n - 1)
```

We can represent this as an infinite circuit:

Each of the dashed boxes corresponds to a copy of the function f. The infinite ladder handles the recursion — when f calls itself, we move down into another box. We can view this as a symmetry relation: the full circuit is equivalent to the top box, plus a copy of the full circuit pasted right below the top box.

Of course, if we actually run the code for f, it won’t run forever — we won’t evaluate the whole infinite circuit! Because of the conditional “n == 0?”, the behavior of the circuit below some box is irrelevant to the final value, so we don’t need to evaluate the whole thing. But we will evaluate some subset of the circuit. For n = 2, we would evaluate this part of the circuit:

This is the “computational DAG” for f(2). While the code for f defines the function, the computational DAG shows the computation — which steps are actually performed, in what order, with what dependencies.

#### Parallelization

The computational DAG forms the basis for parallelization: speeding up an algorithm by running it on multiple cores at once.

Consider our simple circuit from earlier:

Where can we perform steps in parallel? Well, we can evaluate the “-” and “+” at the same time. But we can’t perform the “x” until after both the “-” and “+” are done: the “x” requires the results from those two steps.

More generally, look at the topo sort order of the circuit. The topo sort is not unique — “+” could come before or after “-”, since there’s no directed path between them. That means they can be performed in parallel: neither requires the result from the other. On the other hand, “x” comes strictly later in the topo sort order, because it does require the other values as input.

For a more complicated function, we’d have to think about the computational DAG. Whenever one node comes strictly after another in the computational DAG, those two nodes cannot be parallelized. But as long as neither node comes strictly after the other, we can parallelize them. For instance, in our DP example above, the points C and D in the grid could be evaluated in parallel.
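
As a toy sketch of the idea (in real code, each node would need far more work than one arithmetic op for threads to pay off; the names here are my own):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x, y):
    # "+" and "-" have no directed path between them in the circuit
    # DAG, so they may run in parallel; the multiply must wait for both.
    with ThreadPoolExecutor(max_workers=2) as pool:
        plus = pool.submit(lambda: x + y)
        minus = pool.submit(lambda: x - y)
        return plus.result() * minus.result()

print(f(5, 3))  # (5+3)*(5-3) = 16
```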

This sort of thinking isn’t restricted to computer science. Suppose we have a company producing custom mail-order widgets, and every time an order comes in, there’s a bunch of steps to crank out the widget:

Some steps depend on others, but some don’t. We can confirm the address and print the label in parallel with producing the widget itself, in order to mail it out sooner. A lot of optimization in real-world company processes looks like this.

#### Causality

Here’s a classic example from Pearl:

The season could cause rain, or it could cause you to run the sprinkler, but the season will not directly cause the sidewalk to be wet. The sidewalk will be wet in some seasons more than others, but that goes away once you control for rain and sprinkler usage. Similarly, rain can cause the sidewalk to be wet, which causes it to be slippery. But if something keeps the sidewalk dry — a covering, for instance — then it won’t be slippery no matter how much it rains; therefore rain does not directly cause slipperiness.

These are the sort of common-sense conclusions we can encode in a causal DAG. The DAG’s arrows show direct cause-effect relations, and paths of arrows show indirect cause and effect.

A few of the neat things we can do with a causal DAG:

• Mathematically reason about how external changes (e.g. covering a sidewalk) will impact cause-effect systems

• Account for causality when updating beliefs. For instance, a wet sidewalk makes rain more likely, but not if we know the sprinkler ran.

• Find a compact representation for the probability distribution of the variables in the DAG

• Statistically test whether some data is compatible with a particular causal DAG

Most of these are outside the scope of this post, but Highly Advanced Epistemology 101 has more.
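
To gesture at the “compact representation” point: because the graph is a DAG, we can sample each variable in topo-sort order given only its parents. The probabilities below are made-up illustrative numbers, not from Pearl:

```python
import random

def sample():
    # Sample each node in topo-sort order, conditioning only on its
    # parents in the causal DAG (all probabilities are invented).
    season = random.choice(["dry", "rainy"])
    rain = random.random() < (0.7 if season == "rainy" else 0.1)
    sprinkler = random.random() < (0.1 if season == "rainy" else 0.5)
    wet = rain or sprinkler          # wet sidewalk: parents are rain, sprinkler
    slippery = wet and random.random() < 0.8  # slippery: parent is wet
    return {"season": season, "rain": rain, "sprinkler": sprinkler,
            "wet": wet, "slippery": slippery}

print(sample())
```

Instead of storing a full joint table over all five variables, we only store each variable’s distribution given its parents — exactly the compact factorization the bullet above refers to.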

The DAG structure of causality is the main reason for the ubiquity of DAGs: causality is everywhere, so DAGs are everywhere. Circuits are really just cause-effect specifications for calculations. Not utility, though — that one’s kind of an outlier.

• This is called a utility function. This utility function U is defined by U(chicken) = 1, U(pork) = 2, U(steak) = 3. We prefer things with higher utility — in other words, we want to maximize the utility function!

Just because you have a DAG of your preferences does not mean that you have a utility function. There are many utility functions that are consistent with any particular DAG, so if all you have is a DAG then your utility function is under-specified.

First, maybe you don’t care about the difference between pork and steak, and remove that arrow from your DAG. In that case, the toposort is going to produce either U(chicken) = 1, U(pork) = 2, U(steak) = 3 or U(chicken) = 1, U(steak) = 2, U(pork) = 3, arbitrarily. Toposort is guaranteed to produce an ordering that matches the DAG, but there might be several of them. (A slight variant on toposort could produce U(chicken) = 1, U(pork) = 2, U(steak) = 2 instead, but that’s still a bit questionable because it’s conflating incomparability with equality.)

Second, the magnitude of these numbers is pretty arbitrary in a toposort (they’re just sequential integers), but very important in a utility function. For example, maybe you really like steak, so that the more appropriate utility function is U(chicken) = 1, U(pork) = 2, U(steak) = 10. You’re not going to get this from a toposort.
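
A quick sketch of the “several valid toposorts” point, brute-forcing all orderings consistent with the DAG once the pork–steak arrow is removed (the encoding is my own):

```python
from itertools import permutations

# Edges point from the less-preferred item to the more-preferred one;
# there is no edge between pork and steak (they are incomparable).
edges = [("chicken", "pork"), ("chicken", "steak")]

def valid_orders(items, edges):
    # A valid toposort puts each edge's source before its target.
    return [list(p) for p in permutations(items)
            if all(p.index(a) < p.index(b) for a, b in edges)]

orders = valid_orders(["chicken", "pork", "steak"], edges)
print(orders)  # two valid orders, differing in whether pork or steak is last
```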

I just wanted to clarify this, because the post could be read as saying DAG == utility-function.

• Roughly correct. The missing piece is completeness: for the DAG to uniquely define a utility function, we have to have an edge between each pair of nodes. Then the argument works.

The relative magnitude of the numbers matters only in nondeterministic scenarios, where we take an expectation over possible outcomes. If we restrict ourselves to deterministic situations, then any monotonic transformation of the utility function produces the exact same preferences. In that case, the toposort numbers are fine.

• And if we assume that the nodes are whole worlds, rather than pieces of worlds.

For example, if I’m also ordering a soda, and prefer Pepsi to Coke, then the relative magnitudes become important. (There’s an implicit assumption here that the utility of the whole is the sum of the utilities of the parts.) However, if the node includes the entire meal, so that there are six nodes (chicken, pepsi), (chicken, coke), (pork, pepsi), (pork, coke), (steak, pepsi), (steak, coke), then the magnitude doesn’t matter. Are utility functions generally assumed to be whole-world like this?

• Yes, although I would word it as “the nodes include everything relevant to our implied preferences”, rather than “whole worlds”, just to be clear what we’re talking about. Certainly the entire notion of adding together two utilities is something which requires additional structure.

• However, if the node includes the entire meal, so that there are six nodes (chicken, pepsi), (chicken, coke), (pork, pepsi), (pork, coke), (steak, pepsi), (steak, coke), then the magnitude doesn’t matter.

I don’t think this is right; you still want to be able to decide between actions which might have probabilistic “outcomes” (given that your action is necessarily being taken under incomplete information about its exact results).

You could define a continuous DAG over probability distributions, but that structure is actually too general; you do want to be able to rely on linear additivity if you’re using utilitarianism (rather than some other consequentialism that cares about the whole distribution in some nonlinear way).

Of course, once you have your function from worlds to utilities, you can construct the ordering between nodes of {100% to be X | outcomes X}, but that transformation is lossy (and you don’t need the full generality of a DAG, since you’re just going to end up with a linear ordering).

(For modeling incomplete preferences, DAGs are great! Not so great for utility functions.)

• I want to throw some cold water on this notion because it’s dangerously appealing. When I was doing my PhD in graph theory, I had a similar feeling that graphs were everything, but this is a more general property of mathematics. Graphs are appealing to a certain kind of thinker, but there is nothing so special about them beyond their efficacy at modeling certain things, and many isomorphic models are possible. In particular, they admit helpful visualizations, but they are ultimately no more powerful (or any less powerful!) than many equivalent mathematical models. I just worry from the tone of your post that you might be overvaluing graphs, so I’d like to pass down my wisdom that they are valuable, but not especially valuable in general.

• Oh lord no, this is just a bunch of random applications. Though I’ve also seen the failure mode you describe, so good warning.

Convex optimization, though. That’s everything.

• Causal DAGs are used in statistics for causal analysis. Also, widely misused. When real causality isn’t acyclic and real dependence is highly nonlinear, inference based on the assumption that there exists some (quasi-linear) causal DAG can go very, very wrong. It may be closer to being true than just assuming that the most complicated structure is bivariate interaction (Z depends on the combination of X and Y), but it is also a lot more dangerous.

• This is called a utility function. This utility function U is defined by U(chicken) = 1, U(pork) = 2, U(steak) = 3. We prefer things with higher utility — in other words, we want to maximize the utility function!

It seems like you’ve got this backwards, given the stated preference ordering, yes? U(chicken) = 3, U(pork) = 2, U(steak) = 1 would make more sense.

• Suppose I prefer chicken over steak, steak over pork, and pork over chicken. … This poses a problem. If I’m holding a steak, then I’ll pay a penny to upgrade it to chicken — since I prefer chicken over steak. But then I’ll pay another penny to upgrade my chicken to pork, and another to upgrade it to steak, and then we’re back to the beginning and I’ll pay to upgrade the steak to chicken again!

This is an unwarranted conclusion, as it relies on additional unstated assumptions.

If I prefer chicken over steak, then that plausibly means that, if I am offered a choice between a world where I have chicken, and a world which is identical to the first except that I instead have steak, and no other alternatives are available to me, and I have no option to avoid choosing — then I will choose chicken. Any less trivial conclusions than this require additional assumptions.

You can easily see this to be true by considering the possibility that someone will choose chicken over steak if offered a choice of one or the other, but will not pay a penny to “upgrade” from steak to chicken. (It is not necessary that any actual person exhibits this pattern of behavior, only that this is a coherent scenario — which it clearly is.)

Note that we haven’t even gotten to the matter of non-transitivity of preferences, etc.; just the basic logic of “prefers X to Y” → “would pay a penny to upgrade from Y to X” is already flawed.

• I encourage you to make a full post on this topic. I don’t think I’ve seen one about this before. You could explain what assumptions we’re making, why they’re unwarranted, what assumptions you make, what exactly coherence is, etc., in full and proper arguments. Because leaving comments on random posts that mention utility is not productive.

• Perhaps. Frankly, I find it hard to see what more there is to say. What I said in the grandparent seems perfectly straightforward to me; I was aiming simply to point it out, to bring it to the attention of readers. There’s just not much to disagree with in what I said; do you think otherwise? What did I say that wasn’t simply true?

(Note that I said nothing about the assumptions in question being unwarranted; it’s just that they’re unstated — which, if one is claiming to be reasoning in a straightforward way from basic principles, rather undermines the whole endeavor. As for what assumptions I would make — well, I wouldn’t! Why should I? I am not the one trying to demonstrate that beliefs X inevitably lead to outcome Y…)

(Re: “what exactly coherence is”, I was using the term in the usual way, not in any specific technical sense. Feel free to substitute “only that this scenario could take place”, or some similar phrasing, if the word “coherent” bothers you.)

• I meant a post not just on this, but on all of your problems with preferences and utilities and VNM axioms. It seems to me that you have many beliefs about those, and you could at least put them all in one place.

Now, your current disagreement seems less about utility and more about the usefulness of the preference model itself. But I’m not sure what you’re saying exactly. The case where Alice would choose X over Y, but wouldn’t pay a penny to trade her Y for Bob’s X, is indeed possible, and there are a few ways to model that in preferences. But maybe you’re saying that there are agents where the entire preference model breaks down? And that these are “intelligent” and “sane” agents that we could actually care about?

Note that I said nothing about the assumptions in question being unwarranted

But surely you believe they are unwarranted? Because if the only problem with those assumptions is that they are unstated, then you’re just being pedantic.

• I meant a post not just on this, but on all of your problems with preferences and utilities and VNM axioms. It seems to me that you have many beliefs about those, and you could at least put them all in one place.

Hmm… an analogy:

Suppose you frequented some forum where, on occasion, other people said various things like:

“2 + 2 equals 3.7.”

“Adding negative numbers is impossible.”

“64 is a prime number.”

“Any integer is divisible by 3.”

And so on. Whenever you encountered any such strange, mistaken statement about numbers/arithmetic/etc., you replied with a correction. But one day, another commenter said to you: “Why don’t you make a post about all of your problems with numbers and arithmetic etc.? It seems to me that you have many beliefs about those, and you could at least put them all in one place.”

What might you say to such a commenter? Perhaps something like:

“Textbooks of arithmetic, number theory, and so on are easy to find. It would be silly and absurd for me to recapitulate their contents from scratch in a post. I simply correct mistakes where I see them, which is all that may reasonably be asked.”

Now, your current disagreement seems less about utility and more about the usefulness of the preference model itself. But I’m not sure what you’re saying exactly. The case where Alice would choose X over Y, but wouldn’t pay a penny to trade her Y for Bob’s X, is indeed possible, and there are a few ways to model that in preferences. But maybe you’re saying that there are agents where the entire preference model breaks down? And that these are “intelligent” and “sane” agents that we could actually care about?

What I’m saying is nothing more than what I said. I don’t see what’s confusing about it. If someone prefers X to Y, that doesn’t mean that they’ll pay to upgrade from Y to X. Without this assumption, it is, at least, a good deal harder to construct Dutch Book arguments. (This is, among other reasons, why it’s actually very difficult — indeed, usually impossible — to money-pump actual people in the real world, despite the extreme ubiquity of “irrational” (i.e., VNM-noncompliant) preferences.)

But surely you believe they are unwarranted? Because if the only problem with those assumptions is that they are unstated, then you’re just being pedantic.

I disagree wholeheartedly. Unstated assumptions are poison to “from first principles” arguments. Whether they’re warranted is of entirely secondary importance to the question of whether they are out in the open, so that they may be examined in order to determine whether they’re warranted.

• Why don’t you make a post about all of your problems with numbers and arithmetic etc.? It seems to me that you have many beliefs about those, and you could at least put them all in one place.

Yes, even in your analogy this makes sense. There are several benefits:

• You would then be able to link to this post instead of repeating those same corrections over and over again.

• You would be able to measure to what extent the other users on this site actually disagree with you. You may find out that you have been strawmanning them all along.

• Other users would be able to try to build constructive arguments for why you are wrong (hopefully, the possibility of being wrong has occurred to you).

If someone prefers X to Y, that doesn’t mean that they’ll pay to upgrade from Y to X.

Yes, the statement that there exists an agent that would choose X over Y but would not pay to upgrade Y to X is not controversial. I’ve already agreed to that. And I don’t see that the OP disagrees with it either. It is, however, true that most people would upgrade, for many instances of X and Y. It is normal to make simplifying assumptions in such cases, and you’re supposed to be able to parse them.

• Yes, even in your analogy this makes sense. There are several benefits. …

It doesn’t make any sense whatsoever in the analogy (and, analogically, in the actual case). If my analogy (and subsequent commentary) has failed to convince you of this, I’m not sure what more there is to say.

It is, however, true that most people would upgrade, for many instances of X and Y.

Citation needed. (To forestall the obvious follow-up question: yes, I actually don’t think the quoted claim is true, on any non-trivial reading — I’m not merely asking for a citation out of sheer pedantry.)

It is normal to make simplifying assumptions in such cases

The “simplifying” assumptions, in this case, are far too strong, and far too simplifying, to bear being assumed without comment.

• If my analogy (and subsequent commentary) has failed to convince you of this, I’m not sure what more there is to say.

Well, you could, for example, address my bullet points. Honestly, I haven’t yet seen any reasons from you against making a post. I’d only count the analogy as a reason if it’s meant to imply that everyone on LW is insane, which you hopefully do not believe. Also, I think you’re overestimating the time required for a proper post with proper arguments, compared to the time you put into these comments.

Citation needed.

Really? Take X = “a cake” and Y = “a turd”. Would you really not pay to upgrade? Or did you make some unwarranted assumptions about X and Y? Yes, when X and Y are very similar, people will sometimes not trade, because trading is a pain in the ass.

• I’d only count the analogy as a reason if it’s meant to imply that everyone on LW is insane, which you hopefully do not believe.

Why insane? Even in the analogy, no one needs to be insane, only wildly mistaken (which I do indeed believe that many, maybe most, people on LW [of those who have an opinion on the subject at all] are, where utility functions and related topics are concerned).

That said, I will take your suggestion to write a post about this under advisement.

• Why insane?

Because sane people can be reasoned with. If a sane person is wildly mistaken, and you correct them, in a way that’s not insulting and in a way that’s useful to them (as opposed to pedantry), they can be quite grateful for that, at least sometimes.

• Really? Take X = “a cake” and Y = “a turd”. Would you really not pay to upgrade? Or did you make some unwarranted assumptions about X and Y? Yes, when X and Y are very similar, people will sometimes not trade, because trading is a pain in the ass.

Fair point. I was, indeed, making some unwarranted assumptions; your example is, of course, correct and illustrative.

However, this leaves us with the problem that, when those assumptions (which involve, e.g., X and Y both being preferred to some third alternative which might be described as “neutral”, or X and Y both being describable as “positive value” on some non-preference-ordering-based view of value, or something along such lines) are relaxed, we find that this…

most people would upgrade, for many instances of X and Y

… while now clearly true, is no longer a useful claim to make. Yes, perhaps most people would upgrade, for many instances of X and Y, but the claim in the OP can only be read as a universal claim — or else it’s vacuous. (Note also that transitive preferences are quite implausible in the absence of the aforesaid assumptions.)

• those assumptions (which involve, e.g., X and Y both being preferred to some third alternative which might be described as “neutral”, or X and Y both being describable as “positive value” on some non-preference-ordering-based view of value, or something along such lines)

X being “good” and Y being “bad” has nothing to do with it (although those are the most obvious examples). E.g. if X = $200 and Y = $100, then anyone would also pay to upgrade, when clearly both X and Y are “good” things. Or if X = “flu” and Y = “cancer”, anyone would upgrade, when both are “bad”.

The only case where people really wouldn’t upgrade is when X and Y are in some sense very close, e.g. if we have Y < X < Y + “1 penny” + “5 minutes of my time”.

But I agree, it is indeed reasonable that if someone has intransitive preferences, those preferences are actually very close in this sense, and money pumping doesn’t work.