Maybe I’m explaining it badly? I’m trying to point to the Judea Pearl thing in my own words. The claim is not that causality “just is” conditional independence relationships. (Pearl repeatedly explicitly disclaims that causal concepts are different from statistical concepts and require stronger assumptions.)
Partly it’s explaining it badly. In addition to the points listed above, there are also issues like focusing entirely on rung 2 causality (interventions) while disregarding rung 3 causality (counterfactuals), which is arguably the truer kind of causality.
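To make the rung 2 vs. rung 3 distinction concrete, here is a toy sketch (my own made-up model, not from the thread): a tiny structural causal model where the interventional query P(Y | do(X)) and the counterfactual query "what would Y have been, had X been different, given what we actually observed?" give different answers. The counterfactual is computed by Pearl's abduction–action–prediction recipe: condition the exogenous noise on the observation, then rerun the mechanism under the intervention.

```python
import random

# Toy SCM (my own illustration): exogenous noise U_X, U_Y;
# mechanisms X := U_X and Y := X XOR U_Y.
def sample_noise(rng):
    return {"U_X": rng.random() < 0.5, "U_Y": rng.random() < 0.3}

def mechanism(noise, do_x=None):
    x = noise["U_X"] if do_x is None else do_x
    y = x != noise["U_Y"]  # XOR
    return x, y

rng = random.Random(0)
samples = [sample_noise(rng) for _ in range(10_000)]

# Rung 2 (intervention): P(Y=1 | do(X=0)) -- set X by fiat, keep the
# noise distribution as-is.
p_y_do_x0 = sum(mechanism(n, do_x=False)[1] for n in samples) / len(samples)

# Rung 3 (counterfactual): we *observed* X=1, Y=1.  Abduction: keep only
# noise settings consistent with that observation (here, U_Y must be 0);
# action: force X=0; prediction: rerun the mechanism.
consistent = [n for n in samples if mechanism(n) == (True, True)]
p_cf = sum(mechanism(n, do_x=False)[1] for n in consistent) / len(consistent)

print(p_y_do_x0)  # ≈ P(U_Y=1) = 0.3: the interventional answer
print(p_cf)       # 0.0: the counterfactual answer, pinned down by the observation
```

The two queries use the same graph but different machinery: rung 2 only needs the interventional distribution, while rung 3 needs the full structural equations plus the posterior over noise, which is exactly the extra content that a purely rung-2 treatment throws away.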
Do you have an issue with the graph formalism itself (as an explanation of the underlying reality of how causality and counterfactuals work), separate from practical concerns about how one would learn a particular graph?
I assume that here we are understanding the graph formalism sufficiently broadly as to include e.g. differential equations, as otherwise there’s already a problem right there. And in the same vein, for most problems both DAGs and differential equations are too rigid/vector-spacey to work, and we probably need new formalisms that can better handle systems with a varying structure of variables.
Regardless, I don’t think the question of how one would learn a particular graph is merely a practical concern; it’s the core part. Not just learning the edges between the vertices, but also selecting the variables that are supposed to feature in the graphs. In fact I suspect that once we have a good understanding of representation learning, we will see that causal structure learning follows mostly from the representations we choose, because the things that make a certain function interesting as a feature tend to be the causal effects it has.
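A minimal sketch of the edge-learning part (my own toy example, not from the thread): conditional independence patterns do pin down some structure. In the collider X → Z ← Y, X and Y are marginally independent but become dependent once you condition on Z, the opposite of what a chain or fork produces, so this orientation is in principle learnable from data alone.

```python
import random

# Collider: Z is caused by both X and Y (here Z = X XOR Y).
rng = random.Random(1)
data = []
for _ in range(50_000):
    x = rng.random() < 0.5
    y = rng.random() < 0.5
    z = x != y
    data.append((x, y, z))

def p(event, given=lambda r: True):
    rows = [r for r in data if given(r)]
    return sum(event(r) for r in rows) / len(rows)

# Marginally: P(X=1 | Y=1) == P(X=1), i.e. X and Y are independent.
p_marginal = p(lambda r: r[0], lambda r: r[1])

# Conditioning on the collider Z makes them perfectly (anti-)dependent:
# given Y=1 and Z=1, X must be 0.
p_given_z = p(lambda r: r[0], lambda r: r[1] and r[2])

print(p_marginal)  # ≈ 0.5
print(p_given_z)   # 0.0
```

Of course, this only works once someone has already decided that X, Y, Z are the right variables, which is the variable-selection/representation problem above.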
As far as I know, most of the focus of the causal inference literature is on effect size estimation. That’s probably important too, but it’s not really the hard part that the OP is asking about. The literature places only slight focus on causal structure learning, where the typical advice seems to be to have human experts specify the structure by hand, and as far as I can tell it has no answer at all to representation learning. (Instead, John Wentworth seems to be the hero who is working on a solid theory for this.)
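For concreteness, here is a sketch of what I mean by effect size estimation (backdoor adjustment on a toy confounded model of my own, not from the thread): a confounder C drives both treatment T and outcome Y, so the naive observed difference overstates the effect, while stratifying on C and averaging over P(C) recovers it. Note that this whole exercise presupposes the structure (C confounds T and Y) is already known.

```python
import random

# Ground truth: T raises P(Y) by 0.2; C raises both P(T) and P(Y).
rng = random.Random(2)
rows = []
for _ in range(100_000):
    c = rng.random() < 0.5
    t = rng.random() < (0.8 if c else 0.2)        # C -> T
    y = rng.random() < (0.3 + 0.2 * t + 0.4 * c)  # T -> Y and C -> Y
    rows.append((c, t, y))

def mean_y(pred):
    sel = [y for c, t, y in rows if pred(c, t)]
    return sum(sel) / len(sel)

# Naive contrast: confounded, biased upward.
naive = mean_y(lambda c, t: t) - mean_y(lambda c, t: not t)

# Backdoor adjustment: within-stratum contrasts, weighted by P(C).
p_c = sum(c for c, t, y in rows) / len(rows)
adjusted = sum(
    (mean_y(lambda c, t, cv=cv: c == cv and t)
     - mean_y(lambda c, t, cv=cv: c == cv and not t)) * w
    for cv, w in [(True, p_c), (False, 1 - p_c)]
)

print(naive)     # ≈ 0.44: inflated by the confounder
print(adjusted)  # ≈ 0.20: the true effect of T
```

The estimation step is well developed; it is the earlier steps (which variables, which edges) that the literature mostly hands off to human experts.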