To save space it is enough to note that P factors as
P(A=a,B=b,C=c) = F1(a) F2(b,a,c) F3(c)
for some real-valued functions F1, F2, and F3. In other words, P is represented by the Markov network A - B - C. The directions on the edges were not essential.
Can’t the directions be recovered automatically from that expression, though? That is, discarding the directions from the notation of conditional probabilities doesn’t actually discard them.
The reconstruction algorithm would label every function argument as “primary” or “secondary”, begin with no arguments labelled, and repeatedly do this:
For every function with no primary variable and exactly one unlabelled variable, label that variable as primary and all of its occurrences as arguments to other functions as secondary.
When all arguments are labelled, make a graph of the variables with an arrow from X to Y whenever X and Y occur as arguments to the same function, X as secondary and Y as primary. If the functions F1 F2 etc. originally came from a Bayesian network, won’t this recover that precise network?
If the original graph was A ← B → C, the expression would have been F1(a,b) F2(b) F3(c,b).
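The labelling procedure described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `recover_dag` and the dict-of-tuples input format are made up for the example, and the factors are processed one at a time rather than all at once in each round.

```python
def recover_dag(factors):
    """Recover directed edges from factor argument lists.

    `factors` maps a (hypothetical) factor name to the tuple of variables
    it takes, e.g. {"F1": ("a", "b"), "F2": ("b",), "F3": ("c", "b")}.
    Returns a set of (parent, child) pairs.
    """
    primary = {}      # factor name -> the variable labelled primary in it
    labelled = set()  # variables that are primary in some factor
                      # (and therefore secondary everywhere else)
    progress = True
    while progress:
        progress = False
        for name, args in factors.items():
            if name in primary:
                continue  # this factor already has its primary variable
            unlabelled = {v for v in args if v not in labelled}
            if len(unlabelled) == 1:
                (var,) = unlabelled
                primary[name] = var
                labelled.add(var)
                progress = True

    if len(primary) != len(factors):
        # Some factor never reached "exactly one unlabelled variable".
        raise ValueError("labelling got stuck; factors do not look like "
                         "the conditional tables of a Bayesian network")

    # Arrow from each secondary argument to the primary one in the same factor.
    return {(x, primary[name])
            for name, args in factors.items()
            for x in args if x != primary[name]}


# The A <- B -> C example, written as F1(a,b) F2(b) F3(c,b):
print(sorted(recover_dag({"F1": ("a", "b"), "F2": ("b",), "F3": ("c", "b")})))
# [('b', 'a'), ('b', 'c')]
```

On the first expression, F1(a) F2(b,a,c) F3(c), the same sketch returns arrows from a to b and from c to b.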
I think this is right: if you know that the factors were learned by fitting them to a Bayesian network, you can recover what that network must have been. And you can go even further: if you only have a joint distribution, you can use the techniques of the original article to see which Bayesian networks could be consistent with it.
But there is a separate question about why we are interested in Bayesian networks in the first place. SilasBarta seemed to claim that you are naturally led to them if you are interested in representing probability distributions efficiently. But for that purpose (I claim), you only need the idea of factors, not the directed graph structure. E.g. a probability distribution which fits the (equivalent) Bayesian networks A → B → C or A ← B ← C or A ← B → C can be efficiently represented as F1(a,b) F2(b,c). You would not think of representing it as F1(a) F2(a,b) F3(b,c) unless you were already interested in causality.
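To spell out why two factors are enough for all three networks, here is one way of grouping the conditional probabilities into F1 and F2. The particular grouping is my own choice for illustration; others work equally well (the snippet assumes the amsmath package).

```latex
\begin{align*}
A \to B \to C:\quad
  & P(a)\,P(b \mid a)\,P(c \mid b)
    = \underbrace{P(a)\,P(b \mid a)}_{F_1(a,b)}\;
      \underbrace{P(c \mid b)}_{F_2(b,c)} \\
A \leftarrow B \leftarrow C:\quad
  & P(c)\,P(b \mid c)\,P(a \mid b)
    = \underbrace{P(a \mid b)}_{F_1(a,b)}\;
      \underbrace{P(b \mid c)\,P(c)}_{F_2(b,c)} \\
A \leftarrow B \to C:\quad
  & P(b)\,P(a \mid b)\,P(c \mid b)
    = \underbrace{P(b)\,P(a \mid b)}_{F_1(a,b)}\;
      \underbrace{P(c \mid b)}_{F_2(b,c)}
\end{align*}
```

In each case the product has the same two-factor shape, and nothing in F1(a,b) F2(b,c) records which of the three directed graphs it came from.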