Mostly the essay is careful not to flatly say that a node value X_1 is a function of a node value X_2.
Sometimes X_1 is a random function of X_2 (note the qualifier “random”); sometimes it is a function of X_2 and a random value Y_1, where Y_1 does not have its own node (and so does not increase the size of the graph). The exception, of course, is the alternative approach, where the nodes are divided into random ones and those that are deterministic functions of other node values.
In my example, I would not say the tar node value was a function of the smokes node value; it would be a function of the smokes node value and an extra random variable. In the alternative approach the extra random variable needs its own node. In the main approach it doesn’t: you can simply assume that every node may have extra random variables influencing it without representing them on the graph.
“Mostly the essay is careful not to flatly say that a node value X_1 is a function of a node value X_2.”
I’m not sure where you’re getting that. Here is how he describes the dependence among nodes under the “first approach”:
Motivated by the above discussion, one way we could define causal influence would be to require that X_j be a function of its parents:
X_j = f_j(X_{\mbox{pa}(j)}),
where f_j(\cdot) is some function. In fact, we’ll allow a slightly more general notion of causal influence, allowing X_j to not just be a deterministic function of the parents, but a random function. We do this by requiring that X_j be expressible in the form:
X_j = f_j(X_{\mbox{pa}(j)}, Y_{j,1}, Y_{j,2}, \ldots),
where f_j is a function, and Y_{j,\cdot} is a collection of random variables such that: (a) the Y_{j,\cdot} are independent of one another for different values of j; and (b) for each j, Y_{j,\cdot} is independent of all variables X_k, except when X_k is X_j itself, or a descendant of X_j.
So, unless you have some exogenous Y-nodes, X_j will be a deterministic function of its parent X-nodes. The only way he introduces randomness is by introducing the Y-nodes. My question is, how is that any different from the “alternative approach” that he discusses later?
Having the random variables Y_{j,\cdot} is not the same as there being Y-nodes. Nodes would be part of the graph structure, and so would be visible when you look at the graph.
The only difference is whether the Y-values require their own nodes.
I see. Thanks. I was thrown off because he’d already said that he would “overload” the notation for random variables, using it also to represent nodes or sets of nodes. But what you say makes sense.
I’m not sure what the real difference is, though. The graph is just a way to depict dependencies among random variables. If you’re already working with a collection of random variables with given dependencies, the graph is just, well, a graphical way to represent what you’re already dealing with. Am I right, then, in thinking that the only difference between the two “approaches” is whether you bother to create this auxiliary graphical structure to represent what you’re doing, instead of just working directly with the random variables X_i, Y_{i,j}, and their dependency functions f_i?
It’s easier for humans to think in terms of pictures. But if you were programming a computer to reason causally this way, wouldn’t you implement the two “approaches” in essentially the same way?
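To make the question concrete, here is a rough Python sketch of what I have in mind, using the smokes/tar example. Everything in it is made up for illustration and is not from the essay: the probabilities 0.3, 0.9 and 0.1, the single uniform noise variable Y_tar, and the names f_tar, sample_main, sample_alt, GRAPH_MAIN and GRAPH_ALT. The same function f_tar is used in both versions; the only thing that changes is whether Y_tar appears as a node in the graph.

```python
import random

# Hypothetical structural function: tar = f_tar(smokes, y),
# where y stands for the extra random input Y_tar, uniform on [0, 1).
def f_tar(smokes, y):
    threshold = 0.9 if smokes else 0.1  # assumed numbers, for illustration only
    return 1 if y < threshold else 0

# Main approach: Y_tar influences tar but is not drawn as a node.
GRAPH_MAIN = {"smokes": [], "tar": ["smokes"]}

# Alternative approach: Y_tar is an explicit random root node,
# and tar is a deterministic function of its parent nodes.
GRAPH_ALT = {"smokes": [], "Y_tar": [], "tar": ["smokes", "Y_tar"]}

def sample_main(rng):
    """Sample the graph nodes; the noise is drawn inline and then discarded."""
    smokes = 1 if rng.random() < 0.3 else 0
    tar = f_tar(smokes, rng.random())
    return {"smokes": smokes, "tar": tar}

def sample_alt(rng):
    """Sample the graph nodes; the noise is stored as its own node value."""
    nodes = {}
    nodes["smokes"] = 1 if rng.random() < 0.3 else 0
    nodes["Y_tar"] = rng.random()
    nodes["tar"] = f_tar(nodes["smokes"], nodes["Y_tar"])
    return nodes

if __name__ == "__main__":
    main_draws = [sample_main(random.Random(seed)) for seed in range(5)]
    alt_draws = [sample_alt(random.Random(seed)) for seed in range(5)]
    for m, a in zip(main_draws, alt_draws):
        # With the same seed, smokes and tar agree; only the bookkeeping differs.
        print(m, a)
```

With the same seed, both samplers consume the random draws in the same order, so the smokes and tar values coincide. The two versions differ only in the dictionary describing the graph and in whether the value of Y_tar is kept.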
If you have a specified causal system you could represent it either way, yes.
Speculating on another reason he may have made the distinction: he often posed problems with specified causal graphs but unspecified functions. So he may have meant that in problems like these, with the alternative approach you can easily specify some node values as being deterministic functions of other node values, whereas with the main approach you can’t (since a specified graph rules out further random influences in the alternative approach but not in the main one).