Defining Optimization in a Deeper Way Part 4

In the last post I introduced a potential measure for optimization, and applied it to a very simple system. In this post I will show how it applies to some more complex systems. My five takeaways so far are:

We can recover an intuitive measure of optimization
Even around a stable equilibrium, $O p (A; n, m)$ can be negative
Our measures throw up issues in some cases
Our measures are very messy in chaotic environments
$O p$ seems to be defined even in chaotic systems

It’s good to be precise with our language, so let’s be precise. Remember our model system which looks like this:

In this network, each node is represented by a real number. We’ll use superscript notation to notate the value of a node: $w^{n}$ is the value of node $n$ in the world $W$ .

The heart of this is a quantity I’ll call $C o m p$ , which is:

$C o m p (A; n, m) = lim x^{m} \to w^{n} [\frac{x^{m} - w^{m}}{y^{m} - w^{m}}]$

Which is equivalent to.

$C o m p (A : n, m) = \frac{\partial s^{m}}{\partial s^{n}} {∣ ∣}_{A varies} / \frac{\partial s^{m}}{\partial s^{n}} {∣ ∣}_{A constant}$

( $s^{n}$ is the generic version of $w^{n}$ , $x^{n}$ , $y^{n}$ )

Our current measure for optimization is the following value:

$O p (A; n, m) = lim x^{m} \to w^{n} [- log | C o m p (A : n, m) |]$

$O p$ is positive when the nodes in $A$ are doing something optimizer-ish towards the node $m$ . This corresponds when $C o m p$ is < 1. We can understand this as when $A$ is allowed to vary with respect to changes in $s^{n}$ , the change that propagates forwards to $s^{m}$ is smaller.

$O p$ is negative when the nodes in $A$ are doing something like “amplification” of the variance in $m$ . Specifically, we refer to $A$ optimizing $m$ with respect to $n$ around the specific trajectory $W$ , by an amount of nats equal to $O p (A; n, m)$ . We’ll investigate this measure in a few different systems.

A Better Thermostat Model

Our old thermostat was not a particularly good model of a thermostat. Realistically a thermostat cannot apply infinite heating or cooling to a system. For a better model let’s consider the function

$T h e r m (θ, p; s^{T}) ⎧ ⎪ ⎨ ⎪ ⎩ \begin{matrix} p & θ \leq s^{T} \frac{p}{θ} s^{T} & - θ < s^{T} < θ - p & s^{T} \leq - θ \end{matrix}$

Now imagine we redefine our continuous thermostat like this:

$s_{t + δ t}^{T} = s_{t}^{R} - d$

$s_{t + δ t}^{R} = s_{t}^{R} - δ t \times T h e r m (θ, p; s_{t}^{T})$

Within the narrow “basin” of $- θ \leq s^{T} \leq θ$ , it behaves like before. But outside the change in temperature over time is constant. This looks like the following:

When we look at our optimizing measure, we can see that in while $s^{R}$ remains in the linear decreasing region, $O p = 0$ . It only increases when $s^{R}$ reaches the exponentially decreasing region.

Now we might want to ask ourselves another question, for what values of $s_{0}^{R}$ is $O p (T; s_{0}^{R}, s_{t}^{R})$ positive for a given value of $t$ , say $t = 10$ ? Let’s set $p = 1$ , $θ = 1$ and the initial $s_{0}^{T} = 0$ . The graph of this looks like the following:

Every initial $s^{R}$ which has a trajectory which leads into the “optimizing region” between the temperatures of 24 and 26 is optimized a bit. The maximum $O p$ values are trajectories which start in this region.

Point 1: We can Recover an Intuitive Measure of Optimization

What we might want to do is measure the “amount” of optimization in this region, between the points $s_{0}^{R} = 5$ and $s_{0}^{R} = 45$ , with respect to $s_{10}^{R}$ . If we choose this measure to be the integral of $1 - C o m p$ , we get some nice properties.

It (almost) no longer depends on $θ$ , but depends linearly on $p$ .

$θ = 1$ , $p = 1$ gives an integral of $19.961$

$θ = 0.5$ , $p = 1$ gives an integral of $19.611$

$θ = 1$ , $p = 0.5$ gives an integral of $9.980$

As $θ \to 0$ , our integral remains (pretty much) the same. This is good because it means we can assign some “opimizing power” to a thermostat which acts in the “standard” way, i.e. applying a change of $+ p$ each time unit to the temperature if it’s below the set point, and a change of $- p$ each time unit if it’s above the set point. And it’s no coincidence that that power is equal to $2 p t$ .

Let’s take a step back to consider what we’ve done here. If we consider the following differential equation:

$\frac{d T}{d t} = {\begin{matrix} - p & T > T_{s e t} 0 & T = T_{s e t} p & T < T_{s e t} \end{matrix}$

It certainly looks like $T$ values are being compressed about $T_{s e t}$ by $2 p$ per time unit, but that requires us to do a somewhat awkward manoeuvre: We have to equivocate our metric of the space of $T$ at $t = 10$ with our metric of the space of $T$ at $t = 0$ . For temperatures this can be done in a natural way, but this doesn’t necessarily extend to other systems. It also doesn’t stack up well with systems which naturally compress themselves along some sort of axis, for example water going into a plughole.

We’ve managed to recreate this using what I consider to be a much more flexible, well-defined, and natural measure. This is a good sign for our measure.

The Lorenz System

This is a famed system defined by the differential equations:

$\frac{d a}{d t} = σ (a - b)$

$\frac{d b}{d t} = a (ρ - c) - b$

$\frac{d c}{d t} = a b - β c$

(I have made the notational change from the “standard” $x, y, z \to a, b, c$ in order to avoid collision with my own notation)

Which can fairly easily and relatively accurately be converted to discreet time. We’ll keep $σ = 10, β = \frac{8}{3}$ as constant values. For values of $ρ < 1$ we have a single stable equilibrium point. For values $1 < ρ < 24.74$ we get three stable equilibria, and for values $ρ > 24.74$ we have a chaotic system. We’ll investigate the first and third cases.

The most natural choices for $A$ are all of any one of the $a$ $b$ or $c$ values. We could also equally validly choose $A$ to be a pair of them, although this might cause some issues. A reasonable choice for $n$ would be the initial value of either of the two $a$ , $b$ , or $c$ which aren’t chosen for $A$ .

Point 2: Even Around a Stable Equilibrium, $O p (A; n, m)$ an be Negative

Let’s choose $ρ = 0.8$ , which means we have a single stable point at $a = 0, b = 0, c = 0$ . Here are plots for the choice of $a$ as the set $A$ , and $s_{0}^{b}$ as the axis along which to measure optimization. (So we’re changing the value of $s_{0}^{b}$ and looking at how future values of $s_{t}^{b}$ and $s_{t}^{c}$ change, depending on whether or not values of $s_{t}^{a}$ are allowed to change)

Due to my poor matplotlib abilities, those all look like one graph. This indicates that we are not in the chaotic region of the Lorenz system. The variables $a$ , $b$ , and $c$ approach zero in all cases.

As we can see, difference $y_{t}^{b} - w_{t}^{b}$ is greater than the difference $x_{t}^{b} - w_{t}^{b}$ . The mathematics of this are difficult to interpret meaningfully, so I’ll settle with the idea that changes in $a$ , $b$ , and $c$ in some way compound on one another over time, even as all three approach zero. When we plot values for $O p$ we get this:

The values for $O p (a; s_{0}^{b}, s^{b})$ and $O p (a; s_{0}^{b}, s^{c})$ are negative, as expected. This is actually really important! It’s important that our measure captures the fact that even though the future is being “compressed” — in the sense that future values of $a$ , $b$ , and $c$ approach zero as $t \to \infty$ — it’s not necessarily the case that these variables (which are the only variables in the system) are optimizing each other.

Point 3: Our Measures Throw Up Issues in Some Cases

Now what about variation along the axis $s_{0}^{c}$ ?

We run into a bit of an issue! For a small chunk of time, the differences $x^{c} - w^{c}$ and $y^{c} - w^{c}$ have different signs. This causes $O p$ to be complex valued, whoops!

Point 4: Our Measures are Very Messy in Chaotic Environments

When when choose $ρ = 28$ , it’s a different story. Here we are with $a$ as $A$ , $s_{0}^{b}$ as the axis of optimization:

Now the variations are huge! And they’re wild and fluctuating.

Huge variations across everything. This is basically what it means to have a chaotic system. But interestingly there is a trend towards $O p$ becoming negative in most cases, which should tell us something, namely that these things are spreading one another out.

What happens if we define $A$ as $a, t \leq 10$ ? This means that for $s_{t}^{a}$ values with $t > 10$ we allow a difference between the $w^{a}$ and $y^{a}$ values. We get graphs that look like this:

This is actually a good sign. Since $A$ only has a finite amount of influence, we’d expect that it can only de-optimize $b$ and $c$ by a finite degree into the future.

Point 5: $O p$ Seems to be Defined Even in Chaotic Systems

It’s also worth noting that we’re only looking at an approximation of $O p$ here. What happens when we reduce the $δ b = x_{0}^{b} - w_{0}^{b}$ by some amount? In our other cases we get the same answer. Let’s just consider the effect on $s^{b}$ .

Works for a shorter simulation, what about a longer one?

This seems to be working mostly fine.

Conclusions and Next Steps

Looks like our system is working reasonably well. I’d like to apply it to some even more complex models but I don’t particularly know which ones to use yet! I’d also like to look at landscapes of $O p$ and $C o m p$ values for the Lorenz system, the same way I looked at landscapes of the thermostat system. The aim is to be able to apply this analysis to a neural network.