Some general comments:
Overcoming blindness
You mention above that Pearl’s ontology ‘has blinded us to the obvious next question’. I am very sympathetic to research programmes that try to overcome such blindness; this is the kind of research I have been doing myself recently. The main type of blindness I have been trying to combat is blindness to the complex types of self-reference and indirect representation that can be present inside online machine learning agents. Specifically, in my recent work I have tried to add a less blind viewpoint by modifying and extending Pearl’s causal graphs, so that you end up with a two-causal-diagram model of agency and machine learning. These extensions may be of interest to you, especially in relation to problems of embeddedness, but the main point I want to make here is a methodological one.
What I found, somewhat to my surprise, is that I did not need to develop the full mathematical equivalent of all of Pearl’s machinery in order to shed more light on the problems I wanted to investigate. For example, the idea of d-separation is fundamental to the type of thing that Pearl does with causal graphs: it is central to clarifying problems of experimental design and interpretation in medical experiments. But I found that this concept was irrelevant to my aims. Above, you have a table of how concepts like d-separation map to the mathematics developed in your talk. My methodological suggestion here is that you probably do not want to focus on defining mathematical equivalents for all of Pearl’s machinery; instead, it will be a sign of de-blinding progress if you define new machinery that is largely orthogonal to it.
While I have been looking at blindness to problems of indirection, your part two subtitle suggests you are looking at blindness with respect to the problem of ‘time’ instead. However, my general feeling is that you are addressing yet another type of blindness, both in this talk and in ‘Cartesian Frames’: you are working to shed more light on the process that creates a causal model, be it a Pearlian or semi-Pearlian model, the process that generates the nodes and the arrows/relations between these nodes.
The mechanical generation of correct (or at least performant) causal models from observational data is, I believe, a whole (emerging?) subfield of ML. I have not read much of the literature in this field, but here is one recent paper that may serve as an entry point.
How I can interpret factoring graphically
Part of your approach is to convert Pearl’s partly graphical math into a different, non-graphical formalism you are more comfortable with. That being said, I will now construct a graphical analogy to the operation of factoring you define.
You define factoring as taking a set $S$ and creating a set of factors (sets) $B=\{b_1,b_2,\cdots,b_n\}$, such that (in my words) every $s\in S$ can be mapped to an equivalent tuple $(bb_1,bb_2,\cdots,bb_n)$, where $bb_1\in b_1$, etc.
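To make this bijection requirement concrete, here is a minimal Python sketch of my own (not code from the talk; I choose to represent each factor as a map from elements of $S$ to that factor’s values) that checks whether a candidate list of factors is a valid factoring:

```python
from itertools import product

def is_factoring(S, factors):
    """Check that s -> (f_1(s), ..., f_n(s)) is a bijection between S
    and the Cartesian product of the factors' value sets."""
    value_sets = [set(f.values()) for f in factors]
    tuples = {tuple(f[s] for f in factors) for s in S}
    injective = len(tuples) == len(S)                 # no two s collide
    surjective = tuples == set(product(*value_sets))  # every tuple is hit
    return injective and surjective

# Example: S = {0, 1, 2, 3} factored into its low bit and high bit.
S = {0, 1, 2, 3}
low_bit = {s: s % 2 for s in S}
high_bit = {s: s // 2 for s in S}
print(is_factoring(S, [low_bit, high_bit]))  # True: a valid factoring

# Non-example: using the same factor twice loses information.
print(is_factoring(S, [low_bit, low_bit]))   # False
```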
Graphically, I can depict $S$ as a causal graph with just a single node, a node $S$ representing a random variable that takes values in $S$. The factoring $B$ would be an $n$-node graph where each node $b_i$ represents a random variable taking values from $b_i$. So I can imagine factorization as an operation that splits a single graph node $S$ into many nodes $b_1,b_2,\cdots,b_n$.
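Continuing the toy example above, the split itself can be pictured as replacing a one-node graph with an $n$-node graph (my own sketch, representing a graph as a dict from node names to value sets and parent lists):

```python
# Before the split: a causal graph with the single node S.
graph_before = {"S": {"values": {0, 1, 2, 3}, "parents": []}}

# After the split: one node per factor, each carrying that factor's
# value set. No arrows are drawn in yet; which arrows to add between
# the b_i is exactly the question discussed below.
graph_after = {
    "b1": {"values": {0, 1}, "parents": []},  # the low bit
    "b2": {"values": {0, 1}, "parents": []},  # the high bit
}
```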
In terms of mainstream practice in experimental design, this splitting operation replaces a single observable with several sub-observables. Where you depart from normal practice is that you require the splitting operation to create a full bijection: this kind of constraint is applied much more loosely in normal practice. It feels to me that you are after some kind of no-loss-of-information criterion in defining partitioning as you do. The criterion you apply seems unnecessarily strict to me, however, though it does create a fun mathematical sequence.
In any case, if a single node $S$ splits into $n$ nodes $b_1,\cdots,b_n$, we can wonder how we should picture the arrows between these nodes $b_1,\cdots,b_n$ that need to be drawn in after the split. It seems to me that this is a key question you are trying to answer: how does the split create arrows, or other relations that are almost but not entirely like Pearl’s causal arrows? My own visual picture here is that, in the most general case, the split creates a fully connected directed graph: each node $b_i$ has an arrow to every other node $b_j$. This is a model representation compatible with the theory that all the observables represented by the $b_i$ nodes are dependent on each other. We might then transform this fully connected graph into a DAG, one that is still compatible with the observed statistical relations, by deleting certain arrows, and potentially by adding unobserved nodes with arrows emerging from them. (Trivial example: drawing an arrow $b_i\rightarrow b_j$ is equivalent to stating a theory that maybe $b_i$ is not statistically independent of $b_j$. If I can disprove that theory, I can remove the arrow.)
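As a toy illustration of this pruning step, here is a sketch of my own (using a pairwise chi-squared test as a stand-in for whatever independence test one prefers; a real constraint-based discovery algorithm would also test independence conditional on sets of other variables):

```python
import numpy as np
from itertools import permutations
from scipy.stats import chi2_contingency

def prune_edges(samples, alpha=0.05):
    """Start from the fully connected directed graph over the columns
    of `samples`; delete the arrow i -> j whenever a pairwise test
    fails to reject independence of columns i and j."""
    n_vars = samples.shape[1]
    edges = set(permutations(range(n_vars), 2))  # fully connected graph
    for i, j in sorted(edges):
        # Contingency table of columns i and j.
        _, xi = np.unique(samples[:, i], return_inverse=True)
        _, yi = np.unique(samples[:, j], return_inverse=True)
        table = np.zeros((xi.max() + 1, yi.max() + 1))
        np.add.at(table, (xi, yi), 1)
        _, p, _, _ = chi2_contingency(table)
        if p > alpha:              # independence not disproved:
            edges.discard((i, j))  # remove the arrow i -> j
    return edges

# Toy data: b1 is a noisy copy of b0; b2 is independent of both.
rng = np.random.default_rng(0)
b0 = rng.integers(0, 2, 2000)
b1 = (b0 + (rng.random(2000) < 0.1)) % 2
b2 = rng.integers(0, 2, 2000)
print(prune_edges(np.column_stack([b0, b1, b2])))
# Only the arrows between columns 0 and 1 survive, in both directions:
# pairwise statistics alone cannot orient them.
```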
This transformation process typically allows many different candidate DAGs to be created, all of them compatible with the observational data. Pearl also teaches that we may design and run experiments with causal interventions in order to generate additional data that can eliminate many of these candidate DAGs.
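A minimal sketch of that last point, under a toy structural model of my own: pairwise statistics cannot orient the arrow between two correlated variables, but an intervention can.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_world(n, do_b0=None):
    """Toy structural model in which b0 causes b1. The candidate DAGs
    b0 -> b1 and b1 -> b0 fit the observational data equally well;
    an intervention on b0 separates them."""
    b0 = rng.integers(0, 2, n) if do_b0 is None else np.full(n, do_b0)
    b1 = (b0 + (rng.random(n) < 0.1)) % 2  # noisy copy of b0
    return b0, b1

# Under do(b0 = 0) versus do(b0 = 1), the distribution of b1 shifts,
# which is what we expect if b0 -> b1 but not if b1 -> b0.
_, b1_do0 = sample_world(5000, do_b0=0)
_, b1_do1 = sample_world(5000, do_b0=1)
print(b1_do0.mean(), b1_do1.mean())  # roughly 0.1 versus 0.9
```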