I’m not an expert on causal inference, and I’m having some trouble grasping what you mean by “data-generating mechanism”. Intuitively, I would think that data is generated by a joint probability distribution, but you say that the “data-generating mechanism” is something different than the joint probability distribution and it actually generates it.
To test my understanding, I’ll try to reformulate this in the language of standard (non-causal) probability theory:
The “data-generating mechanism” is a stochastic process that jointly generates all your observed data. It can have inputs (that is, it can be a channel), it can be non-stationary, and so on.
The data that is generated can be broken into individual samples (e.g. patients). Under certain assumptions, these samples can be treated as independent draws from a memory-less, input-less probability distribution, which is what you call the “joint probability distribution” (it is joint on the variables within each sample, but not on the samples themselves).
We are interested in estimating the “data-generating mechanism” from the observed data. Since this is a difficult task, we break it into two sub-tasks: biostatistics, which consists in estimating the per-sample joint probability distribution, and epidemiology, which consists in estimating the whole stochastic process from this per-sample distribution.
Is my summary correct?
Thanks for the comment! I am not confident in answering whether your summary is correct or not, partly because it looks like we come from different backgrounds which use different languages to describe the same things.
The point I am trying to make is that the joint distribution of A, B and C only consists of information such as “in 15% of the population, A=1, B=1 and C=1; in 5% of the population, A=0, B=1 and C=1”, etc.
If the data were generated by a joint distribution, that would be something like a single algorithm that just says “Assign ‘A=1, B=1, C=1’ with probability 0.15 and assign ‘A=0, B=1, C=1’ with probability 0.05”, etc.
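To make the “single algorithm” picture concrete, here is a minimal sketch in Python. Only the two probabilities quoted above (0.15 and 0.05) come from this discussion; the other six entries of the table, and the names `JOINT` and `sample_joint`, are made up for illustration so that the table sums to 1.

```python
import random

# Joint distribution over (A, B, C).
# Only the (1, 1, 1) and (0, 1, 1) cells come from the discussion;
# the remaining six values are hypothetical, chosen so the table sums to 1.
JOINT = {
    (1, 1, 1): 0.15,
    (0, 1, 1): 0.05,
    (1, 1, 0): 0.10,
    (0, 1, 0): 0.10,
    (1, 0, 1): 0.20,
    (0, 0, 1): 0.10,
    (1, 0, 0): 0.05,
    (0, 0, 0): 0.25,
}


def sample_joint(rng):
    """Draw one (A, B, C) triple in a single step, straight from the table."""
    outcomes = list(JOINT)
    weights = list(JOINT.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


rng = random.Random(0)
patients = [sample_joint(rng) for _ in range(5)]
print(patients)
```

Note that this sampler has no notion of A being produced before B or C: all three values come out of one lookup, so there is nothing in it that says what would happen to B or C if A were set by hand.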
However, for causal inference, it is necessary to model the world as if the joint distribution is generated by three separate algorithms: One for A, one for B and one for C. There are many possible sets of such algorithms that will result in the same joint distribution.
We will therefore need to set up the problem so that we explicitly state the order in which the variables are generated, and stipulate that the input can consist only of variables from the past.
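The claim that many sets of algorithms give the same joint distribution can be sketched with a toy example. To keep it short, it uses only two binary variables, A and B, rather than three, and all the numbers and function names are hypothetical. In one model A is generated first and B takes A as input; in the other, B is generated first and A takes B as input. Exact fractions are used so the two joint tables can be compared for equality.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
# Hypothetical conditional probabilities P(child = 1 | parent value).
P_CHILD_GIVEN_PARENT = {1: Fraction(4, 5), 0: Fraction(1, 5)}


def bern(p, x):
    """Probability that a Bernoulli(p) variable takes the value x."""
    return p if x == 1 else 1 - p


def joint_a_then_b():
    """Model 1: an algorithm for A runs first, then B's algorithm reads A."""
    return {
        (a, b): bern(HALF, a) * bern(P_CHILD_GIVEN_PARENT[a], b)
        for a, b in product((0, 1), repeat=2)
    }


def joint_b_then_a():
    """Model 2: an algorithm for B runs first, then A's algorithm reads B."""
    return {
        (a, b): bern(HALF, b) * bern(P_CHILD_GIVEN_PARENT[b], a)
        for a, b in product((0, 1), repeat=2)
    }


def p_b1_if_a_set_model1(a):
    """Model 1 with A set by hand: B's algorithm still reads A."""
    return P_CHILD_GIVEN_PARENT[a]


def p_b1_if_a_set_model2(a):
    """Model 2 with A set by hand: B was generated earlier and ignores A."""
    return HALF


print(joint_a_then_b() == joint_b_then_a())
print(p_b1_if_a_set_model1(1), p_b1_if_a_set_model2(1))
```

Both models produce the same table (0.4 for A=B and 0.1 for each A≠B cell), so no amount of observed data would tell them apart. They disagree only about what happens when A is set by hand, which the causal inference literature calls an intervention: P(B=1) is 4/5 in the first model and 1/2 in the second. That is why the order in which the variables are generated has to be stated explicitly.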