Briefly Extending Differential Optimization to Distributions

I’ve done some work on a definition of optimization which applies to “trajectories” in deterministic, differentiable models. What happens when we try and introduce uncertainty?

Suppose we have the following system consisting of three variables: the past $P$, the future $F$, and some agent $A$. The agent “acts” on the system to push the value of $F$ 80% of the way towards being zero. We can think of this as follows: $A = -0.8P$, so $F = P + A = 0.2P$. Under these circumstances, $\frac{dF}{dP} = 0.2$ (whereas with $A$ held fixed we would have $\frac{dF}{dP} = 1$), which means our optimization function gives: $\mathrm{Opt} = \log_2\frac{1}{0.2} = \log_2 5 \approx 2.32$ bits.
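To make this concrete, here is a quick numerical sketch (assuming, as in the deterministic definition, that we compare the future’s sensitivity to the past with and without the agent responding):

```python
import math

# Toy model: the agent pushes F 80% of the way towards zero.
def agent(p):
    return -0.8 * p

def future(p, a):
    return p + a

eps, p0 = 1e-6, 1.0

# Sensitivity of F to P when the agent responds to P: dF/dP = 0.2.
dF_with_agent = (future(p0 + eps, agent(p0 + eps)) - future(p0, agent(p0))) / eps

# Sensitivity when the agent's action is held fixed: dF/dP = 1.
dF_agent_fixed = (future(p0 + eps, agent(p0)) - future(p0, agent(p0))) / eps

print(math.log2(dF_agent_fixed / dF_with_agent))  # ~2.32 bits, i.e. log2(5)
```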

What if we instead consider a normal distribution over $P$? This must be parameterized by a mean $\mu$ and a standard deviation $\sigma$. Our formulae now look like this:

$$P \sim \mathcal{N}(\mu,\ \sigma)$$

$$A = -0.8P \sim \mathcal{N}(-0.8\mu,\ 0.8\sigma)$$

$$F = P + A = 0.2P \sim \mathcal{N}(0.2\mu,\ 0.2\sigma)$$
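These closed forms are easy to sanity-check by sampling (a sketch; the values of $\mu$ and $\sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0

P = rng.normal(mu, sigma, size=1_000_000)
A = -0.8 * P
F = P + A

print(A.mean(), A.std())  # ~(-0.8*mu, 0.8*sigma)
print(F.mean(), F.std())  # ~( 0.2*mu, 0.2*sigma)
```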
So what does it look like for $A$ to “not depend” on $P$? We could just “pick” some value for $A$, but this seems like cheating. What if we set up a new model, in which $F$ still depends on $P$ and $A$, but $A$ depends on an independent copy $P'$ instead of $P$? We can allow $P$ and $P'$ to have the same distributions as before:

$$P \sim \mathcal{N}(\mu,\ \sigma) \qquad P' \sim \mathcal{N}(\mu,\ \sigma)$$

$$A = -0.8P' \sim \mathcal{N}(-0.8\mu,\ 0.8\sigma)$$

$$F_{\mathrm{ind}} = P + A = P - 0.8P'$$
Calculating the distribution of $F_{\mathrm{ind}}$ is a bit more difficult. We can think of it as adding two uncorrelated normal distributions together. For normal distributions this just means adding the means and variances together. Our distributions have means $\mu$ and $-0.8\mu$, and variances $\sigma^2$ and $0.64\sigma^2$. Therefore we get a new distribution with mean $0.2\mu$ and variance $1.64\sigma^2$. This gives a standard deviation of $\sqrt{1.64}\,\sigma \approx 1.28\sigma$.
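Again this is easy to check by sampling; the only difference from before is that the agent now reacts to the independent copy $P'$ (a sketch with the same arbitrary $\mu$ and $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
n = 1_000_000

P = rng.normal(mu, sigma, n)
P_prime = rng.normal(mu, sigma, n)  # independent copy of P
F_ind = P - 0.8 * P_prime           # the agent reacts to P', not P

print(F_ind.mean())  # ~0.2*mu
print(F_ind.std())   # ~sqrt(1.64)*sigma, i.e. about 1.28*sigma
```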

What’s the entropy of a normal distribution? Well, it’s difficult to say properly, since entropy is poorly-defined on continuous variables. If one takes the limiting density of discrete points one gets $H = \log_2\!\left(\sigma\sqrt{2\pi e}\right) + \log_2 N$, where $N$ goes to infinity. This is a problem unless we happen to be subtracting one entropy from another, so that the $\log_2 N$ terms cancel. So let’s do that.

$$H(F) - H(F_{\mathrm{ind}}) = \log_2\!\left(0.2\sigma\sqrt{2\pi e}\right) - \log_2\!\left(1.28\sigma\sqrt{2\pi e}\right) = \log_2\frac{0.2}{1.28} \approx -2.68$$

$$H(F_{\mathrm{ind}}) - H(F) = \log_2\frac{1.28}{0.2} = \log_2 6.4 \approx 2.68$$
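One way to see the divergent $\log_2 N$ term cancel numerically is to discretize both variables at the same bin width and subtract the resulting discrete entropies; the difference should be roughly independent of the width. A sketch:

```python
import numpy as np

def discrete_entropy(samples, bin_width):
    # Shannon entropy (in bits) of the samples binned at the given width.
    bins = np.arange(samples.min(), samples.max() + bin_width, bin_width)
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
mu, sigma, n = 3.0, 2.0, 1_000_000

P = rng.normal(mu, sigma, n)
P_prime = rng.normal(mu, sigma, n)
F = 0.2 * P                  # original model
F_ind = P - 0.8 * P_prime    # independent-copy model

for w in (0.05, 0.01):
    print(w, discrete_entropy(F_ind, w) - discrete_entropy(F, w))
    # ~log2(1.28/0.2) ≈ 2.68 at both widths: the log2(N) terms cancel
```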
Ok, so we got the sign wrong the first time. Never mind. But there is another issue: this is higher than our previous value of $\log_2 5 \approx 2.32$. This is because we’re double-counting the variance from $P$: we get the variance from both $P$ and $P'$ in $F_{\mathrm{ind}}$. We can correct this by changing the object of study from $H(F_{\mathrm{ind}})$ to the conditional entropy $H(F_{\mathrm{ind}} \mid P')$. This works exactly like you’d expect: it gives a weighted average of the value of $H(F_{\mathrm{ind}})$ for all possible values of $P'$. In this case it is trivial: for any fixed value of $P'$ we get $\sigma_{F_{\mathrm{ind}}} = \sigma$. So let’s take a look:

$$\mathrm{Opt} = H(F_{\mathrm{ind}} \mid P') - H(F)$$

$$= \left(\log_2\!\left(\sigma\sqrt{2\pi e}\right) + \log_2 N\right) - \left(\log_2\!\left(0.2\sigma\sqrt{2\pi e}\right) + \log_2 N\right)$$

$$= \log_2\frac{1}{0.2} = \log_2 5 \approx 2.32$$

This matches the value we got in the deterministic case.
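The same numbers fall out of the closed-form Gaussian entropy, since conditioning on $P'$ just removes its contribution to the variance (the $\log_2 N$ terms cancel in each difference, so differential entropies suffice):

```python
import math

def normal_entropy_bits(sigma):
    # Differential entropy of a normal distribution, in bits (mean is irrelevant).
    return math.log2(sigma * math.sqrt(2 * math.pi * math.e))

sigma = 2.0
H_F = normal_entropy_bits(0.2 * sigma)                    # original model
H_F_ind = normal_entropy_bits(math.sqrt(1.64) * sigma)    # independent-copy model
H_F_ind_given_Pp = normal_entropy_bits(sigma)             # P' fixed: only P's variance remains

print(H_F_ind - H_F)           # ~2.68: double-counts the variance from P
print(H_F_ind_given_Pp - H_F)  # ~2.32 = log2(5): matches the deterministic value
```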
In any Bayes-ish net-ish model, if we can get an agent’s behaviour in the following form:

*A network with nodes $P$, $A$, and $F$. There are arrows from $P$ to $F$, from $P$ to $A$, and from $A$ to $F$.*

We can make the following transformation, and get $\mathrm{Opt} = H(F_{\mathrm{ind}} \mid P') - H(F)$.

*The network from above is shown. An arrow points from it to a new network with nodes $P$, $P'$, $A$, and $F$. There are arrows from $P$ to $F$, from $P'$ to $A$, and from $A$ to $F$.*
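As a sketch of that general recipe (with hypothetical helper names, and assuming histogram entropies are accurate enough for the model at hand): sample the original net to get $H(F)$, then average entropies over fixed draws of $P'$ for the conditional term.

```python
import numpy as np

def discrete_entropy(samples, bin_width=0.01):
    bins = np.arange(samples.min(), samples.max() + bin_width, bin_width)
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

def opt_bits(sample_past, agent, combine, n=200_000, m=50, seed=0):
    # Estimate Opt = H(F_ind | P') - H(F) for a net P -> A -> F.
    # sample_past(rng, n) draws P; agent(p) gives A; combine(p, a) gives F.
    rng = np.random.default_rng(seed)
    P = sample_past(rng, n)
    H_F = discrete_entropy(combine(P, agent(P)))  # original model

    # H(F_ind | P'): average the entropy of F over fixed draws of P'.
    H_cond = np.mean([discrete_entropy(combine(P, agent(np.full(n, pp))))
                      for pp in sample_past(rng, m)])
    return H_cond - H_F

# The running example recovers ~log2(5) ≈ 2.32 bits.
print(opt_bits(lambda rng, n: rng.normal(3.0, 2.0, n),
               agent=lambda p: -0.8 * p,
               combine=lambda p, a: p + a))
```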

I will think more about whether this extension is properly valid. One limitation is that we cannot have multiple sets of arrows into and out of $A$, since this would mess with the splitting of $P$ into $P$ and $P'$.
