Yes definitely. I’ve omitted examples from software and math because there’s no “fuzziness” to it; that kind of abstraction is already better-understood than the more probabilistically-flavored use-cases I’m aiming for. But the theory should still apply to those cases, as the limiting case where probabilities are 0 or 1, so they’re useful as a sanity check.
“it depends on what the distributions are, but there is another simple stat you can compute from the Mi, which, combined with their average, gives you all the info you need”
Yes, assuming it’s a maximum entropy distribution (e.g. normal, Dirichlet, beta, exponential, geometric, hypergeometric, … basically all the distributions we typically use as fundamental building blocks). If it’s not a maximum entropy distribution, then the relevant information can’t be summarized by a simple statistic; we need to keep around the whole distribution P[X=x | M] for every possible value of x. In the maxent case, the summary statistics are sufficient to compute that distribution, which is why we don’t need to keep around anything else.
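To make the sufficiency claim concrete, here’s a toy sketch (the function name and the datasets are my own, purely illustrative): for the normal family, the sample averages of x and x² are sufficient statistics, so two different datasets that agree on those two numbers yield exactly the same fitted distribution.

```python
import math

def normal_suff_stats(xs):
    # For the normal family (a maximum-entropy family), the averages of
    # x and x^2 are sufficient: they pin down the fitted distribution.
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, m2 - m1 * m1  # fitted mean and variance

# Two different datasets constructed to share the same sufficient statistics...
a = [1.0, 2.0, 3.0]
s = math.sqrt(2.0 / 3.0)
b = [2.0 - s, 2.0 - s, 2.0 + s, 2.0 + s]

mu_a, var_a = normal_suff_stats(a)
mu_b, var_b = normal_suff_stats(b)
# ...give the same fitted normal, so nothing else about the data matters.
assert abs(mu_a - mu_b) < 1e-9 and abs(var_a - var_b) < 1e-9
```

For a non-maxent family, no such fixed pair of numbers would do; we’d have to carry the whole distribution around.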
Possibly tangential, but I have found that the “try it yourself before studying” method is a very effective way to learn about a problem/field. It also lends a gut-level insight which can be useful for original research later on, even if the original attempt doesn’t yield anything useful.
One example: my freshman year of college, I basically spent the whole month of winter break banging my head against 3-sat, trying to find an efficient algorithm to solve it and also just generally playing with the problem. I knew it was NP-complete, but hadn’t studied related topics in any significant depth. Obviously I did not find any efficient algorithm, but that month was probably the most valuable-per-unit-time I’ve spent in terms of understanding complexity theory. Afterwards, when I properly studied the original NP-completeness proof for 3-sat, reduction proofs, the polynomial hierarchy, etc, it was filled with moments of “oh yeah, I played with something like this, that’s a clever way to apply it”.
Better example: I’ve spent a huge amount of time building models of financial markets, over the years. At one point I noticed some structures had shown up in one model which looked an awful lot like utility functions, so I finally got around to properly studying Arrow & Debreu-style equilibrium models. Sure enough, I had derived most of it already. I even had some pieces which weren’t in the textbooks (pieces especially useful for financial markets). That also naturally led to reading up on more advanced economic theory (e.g. recursive macro), which I doubt I would have understood nearly as well if I hadn’t been running into the same ideas in the wild already.
No problem, and they are great questions. :)
In this example, both scenarios yield exactly the same actual behavior (assuming we’ve set the parameters appropriately), but the counterfactual behavior differs—and that’s exactly what defines a causal model. In this case, the counterfactuals are “what if we inserted a different resistor?” and “what if we adjusted the knob on the supply?”. If it’s a voltage supply, then the voltage → current model correctly answers the counterfactuals. If it’s a current supply, then the current → voltage model correctly answers the counterfactuals.
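The two models can be sketched as code (numbers are invented for illustration): with the original resistor both models report identical behavior, but under the “swap the resistor” counterfactual they disagree, which is exactly the sense in which only one of them is the correct causal model.

```python
def voltage_supply_model(R, V_set=5.0):
    # Causal direction voltage -> current: the supply pins the voltage,
    # and the resistor then determines the current.
    V = V_set
    I = V / R
    return V, I

def current_supply_model(R, I_set=0.5):
    # Causal direction current -> voltage: the supply pins the current,
    # and the resistor then determines the voltage.
    I = I_set
    V = I * R
    return V, I

# With the original 10-ohm resistor, both models give the same actual behavior:
assert voltage_supply_model(10.0) == current_supply_model(10.0) == (5.0, 0.5)

# Under the counterfactual "swap in a 20-ohm resistor", they disagree:
assert voltage_supply_model(20.0) == (5.0, 0.25)   # voltage held, current halves
assert current_supply_model(20.0) == (10.0, 0.5)   # current held, voltage doubles
```

Which set of counterfactual answers is right depends on what the supply actually is, i.e. on the territory, not just on how we drew the arrows.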
Note that all the counterfactual queries in this example are physically grounded—they are properties of the territory, not the map. We can actually go swap the resistor in a circuit and see what happens. It is a mistake here to think of “the territory” as just the resistor by itself; the supply is a critical determinant of the counterfactual behavior, so it needs to be included in order to talk about causality.
Of course, there’s still the question of how we decide which counterfactuals to support. That is mainly a property of the map, so far as I can tell, but there’s a big catch: some sets of counterfactual queries will require keeping around far less information than others. A given territory supports “natural” classes of counterfactual queries, which require relatively little information to yield accurate predictions to the whole query class. In this context, the lumped circuit abstraction is one such example: we keep around just high-level summaries of the electrical properties of each component, and we can answer a whole class of queries about voltage or current measurements. Conversely, if we had a few queries about the readings from a voltage probe, a few queries about the mass of various circuit components, and a few queries about the number of protons in a wire mod 3… these all require completely different information to answer. It’s not a natural class of queries.
So natural classes of queries imply natural abstract models, possibly including natural causal models. There will still be some choice in which queries we care about, and what information is actually available will play a role in that choice (i.e. even if we cared about number of protons mod 3, we have no way to get that information).
I have not yet formulated all this enough to be highly confident, but I think in this case the voltage → current model is a natural abstraction when we have a voltage supply, and vice versa for a current supply. The “correct” model, in each case, can correctly predict behavior of the resistor and knob counterfactuals (among others), without any additional information. The “incorrect” model cannot. (I could be missing some other class of counterfactuals which are easily answered by the “incorrect” models without additional information, which is the main reason I’m not entirely sure of the conclusion.)
Thanks for bringing up this question and example, it’s been useful to talk through and I’ll likely re-use it later.
Something about the “science is fragile” argument feels off to me. Perhaps it’s that I’m not really thinking about RCTs; I’m looking at Archimedes, Newton, and Feynman, and going “surely there’s something small that could have been tweaked about culture beforehand to make some of this low-hanging scientific fruit get grabbed earlier by a bunch of decent thinkers, rather than everything needing to wait for lone geniuses”.
I’d propose that there’s a massive qualitative difference between black-box results (like RCTs) and gears-level model-building (like Archimedes, Newton, and Feynman). The latter are where basically all of the big gains are, and it does seem like society is under-invested in building gears-level models. One possible economic reason for the under-investment is that gears-level models have very low depreciation rates, so they pay off over a very long timescale.
Those are all reasonable questions to ask and points to raise, and I’m not going to go to bat defending any of the suggestions I made off the top of my head when writing the original question. The point of the original question was to see if anybody out there had publications asking/answering the sort of questions you pose, and it looks like the answer is “no”.
For some of these questions, as you argue, it’s possible that the lack of literature is because there really isn’t anything interesting to be found. But at least some of these questions would be interesting to have an answer to regardless of what the answer is—e.g. your example “Is the body’s normal state of operation one of scarcity or non-scarcity?”.
More generally, a better analogy than the dog picture would be the periodic table. At first glance (and certainly before the development of quantum theory) an argument could be made that it doesn’t tell us anything we can’t figure out without it. But it did hint at what questions to ask—e.g. undiscovered elements and their properties. If insulin acts as a price signal (even without a rationing role) or if the body’s fat stores are governed by an internally-represented discount rate, then that immediately suggests that a variety of different cell types would look at those signals to determine their behavior—possibly cell types and behaviors not yet examined. It also predicts that cells which convert one resource into another would examine the corresponding price signals, and adjust production based on relative prices. Like the periodic table, these ideas suggest relationships to look for.
Thanks, I had considered adding something at the top but didn’t actually do that. Will add it now.
There are two separate lenses through which I view the idea of competitive markets as backpropagation.
First, it’s an example of the real meat of economics. Many people—including economists—think of economics as studying human markets and exchange. But the theory of economics is, to a large extent, general theory of distributed optimization. When we understand on a gut level that “price = derivative”, and markets are just implementing backprop, it makes a lot more sense that things like markets would show up in other fields—e.g. AI or biology.
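Here’s a minimal numerical sketch of the “price = derivative” idea (the value function and numbers are mine, chosen purely for illustration): a competitive buyer expands use of a good until marginal value equals price, so at the chosen quantity the market price equals the derivative of downstream value.

```python
def downstream_value(q):
    # Toy concave value function for q units of some input good;
    # its derivative (marginal value) is 10 - 2q.
    return 10.0 * q - q * q

def quantity_demanded(price):
    # A competitive buyer expands use until marginal value equals price:
    # solve 10 - 2q = price for q.
    return (10.0 - price) / 2.0

price = 4.0
q = quantity_demanded(price)  # q = 3.0

# "price = derivative": at the chosen quantity, the market price equals
# the derivative of downstream value (checked by finite difference).
h = 1e-6
marginal = (downstream_value(q + h) - downstream_value(q - h)) / (2 * h)
assert abs(marginal - price) < 1e-6
```

That derivative-propagating role is the sense in which prices in a supply chain behave like the gradients backprop passes between nodes.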
Second, competitive markets as backpropagation is a great example of the power of viewing the world in terms of DAGs (or circuits). I first got into the habit of thinking about computation in terms of DAGs/circuits when playing with generalizations of backpropagation, and the equivalence of markets and backprop was one of the first interesting results I stumbled on. Since then, I’ve relied more and more on DAGs (with symmetry) as my go-to model of general computation.
The most interesting dangling thread on this post is, of course, the extension to out-of-equilibrium markets. I had originally promised a post on that topic, but I no longer intend to ever write that post: I have generally decided to avoid publishing anything at all related to novel optimization algorithms. I doubt that simulating markets would be a game-changer for optimization in AI training, but it’s still a good idea to declare optimization off-limits in general, so that absence of an obvious publication never becomes a hint that there’s something interesting to be found.
A more subtle dangling thread is the generalization of the idea to more complex supply chains, in which one intermediate can be used in multiple ways. Although I’ve left the original footnote in the article, I no longer think I know how to handle non-tree-shaped cases. I am quite confident that there is a way to integrate this into the model, since it must hold for “price=derivative” to hold, and I’m sure the economists would have noticed by now if “price=derivative” failed in nontrivial supply chain models. That said, it’s nontrivial to integrate into the model: if a program reads the same value in two places, then both reads see the full value, whereas if two processes consume the same supply-chain intermediate, they typically consume different amounts which add up to the amount produced. Translating a program into a supply chain, or vice versa, thus presumably requires some nontrivial transformations in general.
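The mismatch at a fan-out can be made concrete with a toy snippet (my own illustration, not from the post): in a computation DAG every reader sees the full value and backprop sums the downstream gradients, while a supply-chain intermediate gets split, with the consumed quantities summing to the amount produced.

```python
# Program DAG: a value read in two places is copied, not divided.
x = 3.0
y1 = 2.0 * x           # both consumers see the full value of x
y2 = 5.0 * x
# Backprop handles the fan-out by *summing* downstream gradients:
dL_dy1, dL_dy2 = 1.0, 1.0
dL_dx = 2.0 * dL_dy1 + 5.0 * dL_dy2
assert dL_dx == 7.0

# Supply chain: an intermediate used by two processes is *split*,
# and the consumed quantities add up to the amount produced.
produced = 3.0
used_by_process_1 = 1.0
used_by_process_2 = produced - used_by_process_1
assert used_by_process_1 + used_by_process_2 == produced
```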
That’s not necessary for all results. It would be relevant to some—e.g. monetary economics (obviously), budget constraints, and anything where the role of money as an incentive is crucial. But it’s not needed for e.g. much of price theory, which is the main sort of application I imagine. Indeed, if we look at Glen Weyl’s definition of price theory, it immediately sounds like it would be applicable to many problems in biology.
(Also, I suspect one could work around the absence of a budget constraint by directly observing the consumption function.)
UPDATE Dec 2019: Based on Cinelli & Pearl’s response to the OP (& associated paper), it does indeed look like all the relevant information can be integrated into a DAG model.
Over the course of this thread, I came to the impression that an unnecessary focus on identifiability was the main root problem with the OP. Now it looks like that was probably wrong. However, based on the Cinelli & Pearl paper, it does look like causal DAGs + Bayesian probability (or even non-Bayesian probability, for the example at hand) are all we need for this use-case.
That’s a really nice piece. It shows how to formulate the relevant background knowledge in the graphical language, how to derive the intuitive results which Huitfeldt et al predicted, and how the structural formulation correctly predicts when the intuitive results fail. Well done.
In case people are confused by this: I believe the analogy is that people think of economics as a theory of human markets, even though the math is largely about general properties of distributed optimization.
See my response to leggi’s comment, with the dog picture, as well as to Daniel V.
You should check out the explanation of insulin/glucose regulation in my review of Design Principles of Biological Circuits.
economic models = made up by humans
physiology = millennia of chance and adaptions under various pressures, creating immensely complex systems that just blow my mind.
Again, read that review. A central point of the book is that evolved systems repeatedly converge on similar patterns, because those are the patterns which work well. Economics, on the other hand, is largely a mathematical discipline studying general properties of things-which-work-well for optimization. Through that lens, it makes a lot of sense to apply economic models to biological systems.
I would suggest researching the connections between hunger/intake/digestion/absorption/blood glucose/beta cells/insulin/cell uptake/cell demand/exertion/renal excretion of glucose/polyuria/polydipsia—it’s complex.
Biological models have much to teach us. If we ask the questions and notice the connections rather than trying to make them fit with our explanation.
Consider this picture:
There’s a dog in the picture. Once you see the dog, there’s still a lot going on in the picture, but the whole thing makes a lot more sense.
That’s what theory is about. It’s not “trying to make [reality] fit our explanation”, it’s about noticing hidden structure. That’s the sort of value I expect economic theory would provide in physiology.
The structure of insulin/glucose regulation is the structure of a market. Yes, the system is complicated, and there’s a lot of moving pieces—that’s true in markets too. But over long timescales, the pieces are interacting primarily through a price signal—insulin—and that’s the key characteristic which makes it a market, regardless of whatever else is going on locally. That’s a useful piece of hidden structure.
What is the difference between a generic “signal” and a “price signal”? What is a “price” in physiology?
A price signal would need a few properties:
- It would be paired with some physiological resource—it represents the price of something.
- It would be nonlocal, like e.g. a hormone—the point of prices is to coordinate in a distributed system.
- Local systems would increase/decrease their consumption of the resource in response to a low/high price signal, and vice versa for production.
- Some mechanism would make the price signal high when the resource is scarce, and vice versa, so that the amount of resource supplied matches the amount demanded.
Insulin is a good example: it acts as an (inverse) price signal for glucose. It’s a hormone, and many cell types throughout the body increase/decrease their glucose consumption in response to insulin levels. The beta cells in the pancreas act as a market maker, setting insulin levels so that glucose supply matches demand over the long run.
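That “market maker” story can be sketched as a toy feedback loop (all constants and the structure are invented for illustration; this is not a physiological model): the signal integrates the glucose error, uptake rises with the signal, and consumption is driven to match supply over the long run.

```python
def simulate(steps=5000, supply=1.0, target=5.0):
    glucose, insulin = 6.0, 0.5
    consumption = 0.0
    for _ in range(steps):
        # "Market maker": raise the (inverse) price signal when glucose
        # is abundant, lower it when glucose is scarce.
        insulin = max(insulin + 0.01 * (glucose - target), 0.0)
        # Consumers: cells take up more glucose at higher insulin.
        consumption = 0.1 * glucose + 0.5 * insulin
        # Net flow into the blood: supply minus uptake.
        glucose += 0.1 * (supply - consumption)
    return glucose, insulin, consumption

glucose, insulin, consumption = simulate()
# Over the long run, consumption is driven to match supply.
assert abs(consumption - 1.0) < 0.01 and abs(glucose - 5.0) < 0.01
```

The point of the sketch is just the structure: many consumers respond to one nonlocal signal, and one subsystem adjusts that signal to balance supply and demand.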
Another question is which basic assumptions embraced in economics can reasonably apply to the units of analysis in physiology (cells, etc.). Economists already have a hard enough time validating assumptions for humans.
This is a point which I think is severely under-appreciated both in and out of economics: it is often far easier to apply economic models to systems in biology than to humans. Humans have complicated, opaque decision-making procedures. We can’t observe those procedures directly, and have to make indirect predictions about their effects. Cells have decision-making procedures which we can directly observe and model, and people have already built many of those models.
Conversely, many of the debatable assumptions economics makes about humans would have directly-observable effects in biological systems. If subsystems’ behavior can’t be described by utility functions, then that will directly result in resources being consumed to produce things which are then destroyed without being used. To the extent that such energy waste is minimized, the system’s behavior can be approximated by utilities. Another example: if subsystems’ revealed preferences aren’t concave, then that will directly result in instabilities in physiological behavior.
It seems like a lot of examples of virtue signalling require sacrificing intelligence, but sacrificing virtue seems like a less common requirement for signalling intelligence. So one possible model would be that, rather than a Pareto frontier on which the two trade off symmetrically, intelligent decisions are an input which is destructively consumed to produce virtue signals—like trees are consumed to produce paper.
Could you expand a bit on why you expect a trade-off between intelligence/virtue signalling, as opposed to two independent axes? I can sort of see a case where intelligence is the “cost” part of “costly virtue signalling”, and virtue is the “cost” part of “costly intelligence signalling”, like the examples in toxoplasma of rage. On the other hand, looking at those examples of the dangers of runaway IQ signalling, they generally don’t seem to trade off against virtue.