Then how do you choose between the three directed representatives? Is there some connotation, some coming-apart of concepts that would become apparent after generalization, or did you just pick X ← Y → X because it’s symmetric?
Good question. The three directed graphical representations directly expand to different but equivalent expressions, and sometimes we want to think in terms of one of those expressions over another.
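To make "different but equivalent" concrete, here is a minimal numpy sketch. Everything in it is a placeholder: the shapes are arbitrary, and it uses a generic three-node chain over X1, Y, X2 with distinct endpoints (rather than the two-copies-of-X case) to keep the bookkeeping simple. It builds a joint from the fork orientation X1←Y→X2, then checks that the expansions for the other two orientations reproduce the same joint.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(a, axis):
    """Normalize along `axis`, turning a nonnegative array into a conditional."""
    return a / a.sum(axis=axis, keepdims=True)

# Build a joint P[Y, X1, X2] satisfying "X1 independent of X2 given Y"
# by construction, via the fork orientation X1 <- Y -> X2:
P_y = normalize(rng.random(3), axis=0)                 # P[Y]
P_x1_given_y = normalize(rng.random((3, 2)), axis=1)   # P[X1|Y]
P_x2_given_y = normalize(rng.random((3, 4)), axis=1)   # P[X2|Y]
P = P_y[:, None, None] * P_x1_given_y[:, :, None] * P_x2_given_y[:, None, :]
# axes of P: (Y, X1, X2)

# Chain orientation X1 -> Y -> X2 expands to P[X1] P[Y|X1] P[X2|Y]:
P_x1 = P.sum(axis=(0, 2))                              # P[X1]
P_y_given_x1 = normalize(P.sum(axis=2), axis=0)        # P[Y|X1]
P_chain = P_x1[None, :, None] * P_y_given_x1[:, :, None] * P_x2_given_y[:, None, :]

# Reversed chain X1 <- Y <- X2 expands to P[X2] P[Y|X2] P[X1|Y]:
P_x2 = P.sum(axis=(0, 1))                              # P[X2]
P_y_given_x2 = normalize(P.sum(axis=1), axis=0)        # P[Y|X2]
P_rev = P_x2[None, None, :] * P_y_given_x2[:, None, :] * P_x1_given_y[:, :, None]

# All three directed representatives expand to the same joint:
assert np.allclose(P, P_chain) and np.allclose(P, P_rev)
```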
The most common place this comes up for us: sometimes we want to think about a latent Γ in terms of its defining conditional distribution P[Γ|X]. When doing that, we try to write diagrams with Γ downstream, e.g. X1→X2→Γ. Other times, we want to think of Γ in terms of the distribution P[X|Γ]; when thinking that way, we try to write diagrams with Γ upstream, e.g. Γ→X1→X2.
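A quick sketch of why that orientation convention pays off (again with placeholder shapes, and with G standing in for Γ): when Γ is downstream, the defining conditional P[Γ|X2] is literally one of the factors in the diagram's expansion, whereas the upstream orientation of the same chain only gives you P[X2|Γ] and P[Γ], so recovering P[Γ|X2] takes an extra Bayes inversion.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

# Build a joint satisfying the chain X1 -> X2 -> G (G standing in for the latent):
P_x1 = normalize(rng.random(2), axis=0)                # P[X1]
P_x2_given_x1 = normalize(rng.random((2, 3)), axis=1)  # P[X2|X1]
P_g_given_x2 = normalize(rng.random((3, 4)), axis=1)   # P[G|X2], the defining conditional
P = P_x1[:, None, None] * P_x2_given_x1[:, :, None] * P_g_given_x2[None, :, :]
# axes of P: (X1, X2, G)

# Downstream orientation X1 -> X2 -> G: P[G|X2] is read straight off the
# expansion above; it is one of the three factors.

# Upstream orientation of the same chain, G -> X2 -> X1, expands to
# P[G] P[X2|G] P[X1|X2].  Recovering P[G|X2] from those factors takes Bayes:
P_g = P.sum(axis=(0, 1))                               # P[G]
P_x2_given_g = P.sum(axis=0) / P_g[None, :]            # P[X2|G]
P_x2 = P.sum(axis=(0, 2))                              # P[X2]
P_g_given_x2_bayes = P_x2_given_g * P_g[None, :] / P_x2[:, None]

assert np.allclose(P_g_given_x2, P_g_given_x2_bayes)   # same quantity, more steps
```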
In the case of X←Y→X, the natural information-theoretic expression for its error is H(X|Y), or equivalently I(X;X|Y). Both of those condition on Y, so we want Y "upstream" in order to express those quantities most directly. You could derive the same error from X←Y←X or X→Y→X, but if you write out those D_KL errors and try to reduce them to H(X|Y), you'll probably find that it takes more steps.
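And here is a numerical check of that error claim. It assumes the convention (from context) that a diagram's error is the D_KL from the true joint to the diagram's factorization, and it reads the two X nodes in X←Y→X as copies of the same variable, so the "true" joint over (X1, X2, Y) puts all its mass on X1 = X2. The distribution itself is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)

# A placeholder joint P[X, Y] over small discrete variables.
P_xy = rng.random((3, 4))
P_xy /= P_xy.sum()
P_y = P_xy.sum(axis=0)
P_x_given_y = P_xy / P_y[None, :]

# Direct computation of H(X|Y).
H_x_given_y = -(P_xy * np.log(P_x_given_y)).sum()

# Error of X <- Y -> X: D_KL from the true joint over (X1, X2, Y), which is
# supported on X1 == X2, to the diagram's factorization P[Y] P[X1|Y] P[X2|Y].
dkl = 0.0
for x in range(3):
    for y in range(4):
        p = P_xy[x, y]                       # true mass at (x1=x, x2=x, y)
        q = P_y[y] * P_x_given_y[x, y] ** 2  # diagram's mass at the same point
        dkl += p * np.log(p / q)
# Points with x1 != x2 carry zero true mass, so they contribute nothing.

assert np.isclose(dkl, H_x_given_y)  # the diagram's error is exactly H(X|Y)
```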
If the expressions cease to be equivalent in some natural generalization of this setting, then I recommend that you try to find a proof there, because the proof space should be narrower and thus easier to search.