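I think what is going on here is that both the geometric expectation $\mathbb{G}$ and the geometric derivative $\partial^*$ are of the form $\exp \circ f \circ \log$, with $f = \mathbb{E}$ and $f = \partial$, respectively. Let's define the star operator as $f^* = \exp \circ f \circ \log$. Then $(f \circ g)^* = \exp \circ f \circ (\log \circ \exp) \circ g \circ \log = f^* \circ g^*$, by associativity of function composition and the fact that $\log \circ \exp$ is the identity. Further, if $f$ and $g$ commute, then so do $f^*$ and $g^*$:

$$f^* \circ g^* = (f \circ g)^* = (g \circ f)^* = g^* \circ f^*$$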
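A small Python sketch of the star operator on ordinary real functions, assuming natural $\log$ and $\exp$. The maps `f`, `g` and the point `t` are arbitrary illustrative choices, not taken from the comment. Conjugating a scaling $u \mapsto ku$ gives $t \mapsto t^k$, so two commuting scalings stay commuting after starring:

```python
import math
from typing import Callable

RealFn = Callable[[float], float]

def compose(f: RealFn, g: RealFn) -> RealFn:
    """Ordinary function composition: (f o g)(t) = f(g(t))."""
    return lambda t: f(g(t))

def star(f: RealFn) -> RealFn:
    """Conjugate f by exp/log: f* = exp o f o log (defined for t > 0)."""
    return lambda t: math.exp(f(math.log(t)))

# Illustrative commuting maps: scaling by 2 and scaling by 3.
f = lambda u: 2.0 * u
g = lambda u: 3.0 * u

# Under the star operator, scaling by k becomes t -> t**k.
f_star, g_star = star(f), star(g)

t = 1.7  # illustrative positive input
# (f o g)* agrees with f* o g*, because the inner log o exp cancels.
print(math.isclose(star(compose(f, g))(t), compose(f_star, g_star)(t)))  # True
# f and g commute, so f* and g* commute as well.
print(math.isclose(compose(f_star, g_star)(t), compose(g_star, f_star)(t)))  # True
```

Swapping `g` for a non-commuting map such as `lambda u: u + 1` keeps the first check true (the composition rule does not need commutativity) but makes the second one false.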
So the commutativity of the geometric expectation and the geometric derivative falls directly out of their representations as $\mathbb{E}^*$ and $\partial^*$, respectively, by the commutativity of $\mathbb{E}$ and $\partial$, as long as they are over different variables.
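To make the commutativity claim concrete, here is a hedged numerical sketch. It assumes a finite distribution `p` over $x$, approximates $\partial/\partial y$ by a central finite difference, and uses an arbitrary positive test function `f`; the names `star`, `expect_x`, and `deriv_y` are illustrative. It first checks that the starred expectation $\exp \circ \mathbb{E}_x \circ \log$ matches the product form $\prod_x f(x, y)^{p(x)}$ of the geometric expectation, then checks that the starred expectation over $x$ and the starred derivative in $y$ commute:

```python
import math
from typing import Callable, Dict

Func = Callable[[float, float], float]  # positive-valued function of (x, y)
Op = Callable[[Func], Func]             # operator on such functions

def star(op: Op) -> Op:
    """Lift an operator on log-space functions to exp o op o log (pointwise)."""
    def lifted(f: Func) -> Func:
        g = op(lambda x, y: math.log(f(x, y)))
        return lambda x, y: math.exp(g(x, y))
    return lifted

def expect_x(p: Dict[float, float]) -> Op:
    """Arithmetic expectation over x under a finite distribution p (value -> prob).
    The result ignores its x argument, so it is still a function of (x, y)."""
    def op(f: Func) -> Func:
        return lambda x, y: sum(px * f(xv, y) for xv, px in p.items())
    return op

def deriv_y(h: float = 1e-5) -> Op:
    """Central finite-difference approximation of d/dy."""
    def op(f: Func) -> Func:
        return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)
    return op

p = {0.0: 0.2, 1.0: 0.5, 2.0: 0.3}   # illustrative distribution over x
geo_expect = star(expect_x(p))       # exp(E_x[log f])
geo_deriv = star(deriv_y())          # exp(d/dy log f)

f = lambda x, y: (1 + x * x) * (2 + math.sin(x * y))  # illustrative, always > 0

y0 = 0.4
# The starred expectation equals the product form prod_x f(x, y)**p(x).
prod_form = math.prod(f(xv, y0) ** px for xv, px in p.items())
print(math.isclose(geo_expect(f)(0.0, y0), prod_form))  # True

# Expectation over x and derivative in y act on different variables, so they commute.
a = geo_expect(geo_deriv(f))(0.0, y0)
b = geo_deriv(geo_expect(f))(0.0, y0)
print(math.isclose(a, b, rel_tol=1e-6))  # True
```

The finite difference is only an approximation of $\partial$, but it is linear, so the two orders agree up to floating-point rounding rather than up to discretization error.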
We can also derive what happens when the expectation and gradient are over the same variables: . First, notice that , so . Also .
Now let’s expand the composition of the gradient and expectation. , using the log-derivative trick. So .
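The log-derivative trick referred to here is the standard identity $\nabla_\theta \mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}[f(x)\, \nabla_\theta \log p_\theta(x)]$ for $f$ not depending on $\theta$. A minimal numerical check of that identity, assuming a softmax-parameterized categorical distribution (the parameterization, `theta`, and `f_vals` are illustrative assumptions, not from the comment), compared against central finite differences:

```python
import math
from typing import List

def softmax(theta: List[float]) -> List[float]:
    """Numerically stable softmax: p_theta(j) proportional to exp(theta_j)."""
    m = max(theta)
    w = [math.exp(t - m) for t in theta]
    z = sum(w)
    return [wi / z for wi in w]

def expectation(theta: List[float], f_vals: List[float]) -> float:
    """E_{x ~ p_theta}[f(x)], where f_vals[x] = f(x) does not depend on theta."""
    return sum(px * fx for px, fx in zip(softmax(theta), f_vals))

def grad_log_p(theta: List[float], x: int) -> List[float]:
    """Score: d/dtheta_j log p_theta(x) = [j == x] - p_theta(j) for a softmax."""
    p = softmax(theta)
    return [(1.0 if j == x else 0.0) - p[j] for j in range(len(theta))]

def log_derivative_grad(theta: List[float], f_vals: List[float]) -> List[float]:
    """Log-derivative trick: grad E[f] = E[f(x) * grad log p_theta(x)]."""
    p = softmax(theta)
    grad = [0.0] * len(theta)
    for x, (px, fx) in enumerate(zip(p, f_vals)):
        for j, s in enumerate(grad_log_p(theta, x)):
            grad[j] += px * fx * s
    return grad

def finite_diff_grad(theta: List[float], f_vals: List[float], h: float = 1e-6) -> List[float]:
    """Central finite differences on E[f], for comparison."""
    grad = []
    for j in range(len(theta)):
        up = [t + (h if k == j else 0.0) for k, t in enumerate(theta)]
        dn = [t - (h if k == j else 0.0) for k, t in enumerate(theta)]
        grad.append((expectation(up, f_vals) - expectation(dn, f_vals)) / (2 * h))
    return grad

theta = [0.3, -1.2, 0.8]   # illustrative parameters
f_vals = [2.0, 5.0, -1.0]  # illustrative values f(0), f(1), f(2)
exact = log_derivative_grad(theta, f_vals)
approx = finite_diff_grad(theta, f_vals)
print(all(math.isclose(a, b, rel_tol=1e-5, abs_tol=1e-8) for a, b in zip(exact, approx)))  # True
```

Unlike the different-variables case, the gradient here acts on $p_\theta$ itself, which is why the score term $\nabla_\theta \log p_\theta(x)$ appears inside the expectation.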
Therefore, .
Writing it out, we have .
I take back the part about pi and update determining the causal structure, because many causal diagrams are consistent with the same poly diagram.