Followup to: Anatomy of Multiversal Utility Functions: Tegmark Level IV

Outline: In the previous post, I discussed the properties of utility functions in the extremely general setting of the Tegmark level IV multiverse. In the current post, I am going to show how the discovery of a theory of physics allows the agent to perform a certain approximation in its decision theory. I’m doing this with an eye towards analyzing decision theory and utility calculus in universes governed by realistic physical theories (quantum mechanics, general relativity, eternal inflation...).

A Naive Approach

Previously, we have used the following expression for the expected utility:

[1] $E [U] = \int_{X} U (x) d μ (x)$
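As a concrete, finite stand-in for [1], here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the formalism: the list `HYPOTHESES` (a handful of made-up "programs" given by a length in bits and a finite binary history), the `utility` function, and the choice to weight each hypothesis by $2^{-\text{length}}$ without normalizing.

```python
from fractions import Fraction

# Hypothetical toy stand-in for the space X: each entry is
# (program length in bits, finite binary history the program outputs).
HYPOTHESES = [
    (2, "0101"),
    (3, "1111"),
    (3, "0011"),
    (4, "1000"),
]


def utility(history: str) -> Fraction:
    """Illustrative utility: the fraction of 1s in the history."""
    return Fraction(history.count("1"), len(history))


def expected_utility(hypotheses, u) -> Fraction:
    """Discrete analogue of [1]: sum of U(x) weighted by 2^-length (unnormalized)."""
    return sum(Fraction(1, 2 ** length) * u(x) for length, x in hypotheses)


print(expected_utility(HYPOTHESES, utility))  # prints 21/64
```

Note that nothing in this computation mentions a theory of physics: the sum simply runs over every hypothesis in the list.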

Since the integral is over the entire “level IV multiverse” (the space of binary sequences), [1] makes no reference to a specific theory of physics. On the other hand, a realistic agent is usually expected to use its observations to form theories about the universe it inhabits, subsequently optimizing its action with respect to the theory.

Since this process crucially depends on observations, we need to make the role of observations explicit. Because we assume the agent uses some version of UDT, we are not supposed to update on observations; instead, we evaluate the logical conditional expectation values

[2] $v_{A, U} (π) = E_{l o g} [\int_{X} U (x) d μ (x) ∣ \forall i \in I : A (i) = π (i)]$

Here $A$ is the agent, $π : I \to O$ is a potential policy for the agent (a mapping from sensory inputs to actions) and $E_{l o g}$ is the expectation value with respect to logical uncertainty.

Now suppose $A$ made observations $τ$ leading it to postulate a physical theory $T$. For the sake of simplicity, we suppose $A$ is only deciding its actions in the universes in which observations $τ$ were made¹. Thus, we assume that the input space factors as $I = I_{p a s t} \times I_{f u t u r e}$ and we’re only interested in inputs in the set $τ \times I_{f u t u r e}$. This simplification leads to replacing [2] by

[3] $v_{A, U}^{f u t u r e} (π) = E_{l o g} [\int_{X} U (x) d μ (x) ∣ \forall i \in I_{f u t u r e} : A (τ \times i) = π (i)]$

where $π : I_{f u t u r e} \to O$ is a “partial” policy referring to the $τ$-universe only.
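To make the shape of [3] concrete, here is a toy Python sketch that scores whole partial policies rather than updating on inputs one at a time. It is a heavy simplification under stated assumptions: the logical conditional expectation is replaced by directly substituting the policy into a few hand-written universes, and the names `FUTURE_INPUTS`, `ACTIONS` and `UNIVERSES` (with their weights) are all hypothetical.

```python
from itertools import product

FUTURE_INPUTS = ("hot", "cold")  # hypothetical I_future
ACTIONS = (0, 1)                 # hypothetical O


def universe_one(policy):
    """Toy universe rewarding action 1 on input 'hot'."""
    return 1.0 if policy["hot"] == 1 else 0.0


def universe_two(policy):
    """Toy universe rewarding action 0 on input 'cold'."""
    return 1.0 if policy["cold"] == 0 else 0.0


def universe_three(policy):
    """Toy universe whose utility does not depend on the agent at all."""
    return 0.5


# (weight, utility as a function of the policy); the weights are made up.
UNIVERSES = [(0.5, universe_one), (0.25, universe_two), (0.25, universe_three)]


def value(policy) -> float:
    """Stand-in for v(pi): weighted utility given that the agent follows pi."""
    return sum(weight * u(policy) for weight, u in UNIVERSES)


def all_policies():
    """Enumerate every map from FUTURE_INPUTS to ACTIONS."""
    for actions in product(ACTIONS, repeat=len(FUTURE_INPUTS)):
        yield dict(zip(FUTURE_INPUTS, actions))


best = max(all_policies(), key=value)
print(best, value(best))
```

The third universe contributes a policy-independent constant, so it has no effect on which policy wins.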

The discovery of $T$ allows $A$ to perform a certain approximation of [3]. A naive guess at the form of this approximation is

[4′] $v_{A, U}^{f u t u r e} (π) \approx w_{A, U}^{f u t u r e} E_{l o g} [\int_{X} U (x) d ν_{T} (x) ∣ \forall i \in I_{f u t u r e} : A (τ \times i) = π (i)]$

Here, $w_{A, U}^{f u t u r e}$ is a constant representing the contributions of the universes in which $T$ is not valid (whose logical-uncertainty correlation with $A$ we neglect) and $ν_{T}$ is a measure on $X$ corresponding to $T$.

Now, physical theories in the real world often specify time evolution equations without saying anything about the initial conditions. Such theories are “incomplete” from the point of view of the current formalism. To complete such a theory we need a measure on the space of initial conditions: a “cosmology”. A simple example of a “complete” theory $T$: a cellular automaton with deterministic (or probabilistic) evolution rules and a measure on the space of initial conditions (e.g. set each cell to an independently random state).
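The cellular automaton example can be sketched in a few lines of Python. The sketch below assumes an elementary (one-dimensional, two-state) automaton with periodic boundary, uses rule 110 purely as an arbitrary choice, takes i.i.d. fair coin flips as the "cosmology", and estimates the integral of an illustrative utility against the resulting toy $ν_{T}$ by Monte Carlo. The function names and parameters are mine, not part of the formalism.

```python
import random


def step(cells, rule=110):
    """One deterministic update of an elementary cellular automaton (periodic boundary)."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def sample_history(width, steps, rng, rule=110):
    """Draw one history: random initial row (the 'cosmology') plus rule-based evolution."""
    row = [rng.randint(0, 1) for _ in range(width)]
    history = [row]
    for _ in range(steps):
        row = step(row, rule)
        history.append(row)
    return history


def live_fraction(history):
    """Illustrative utility: fraction of live cells in the final row."""
    return sum(history[-1]) / len(history[-1])


def estimate_expected_utility(u, samples, width=16, steps=8, seed=0):
    """Monte Carlo estimate of the integral of U against the toy measure."""
    rng = random.Random(seed)
    return sum(u(sample_history(width, steps, rng)) for _ in range(samples)) / samples


print(estimate_expected_utility(live_fraction, samples=200))
```

The evolution rule alone (`step`) is the "incomplete" theory; it only becomes a measure on histories once `sample_history` supplies a distribution over initial rows.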

However, [4′] is in fact not a valid approximation of [3]. This is because the use of $ν_{T}$ fixes the ontology: $ν_{T}$ treats binary sequences as encoding the universe in a way natural for $T$ whereas dominant² contributions to [3] come from binary sequences which encode the universe in a way natural for $U$ .

Ontology Decoupling

Allow me a small digression to discuss a desideratum for logical uncertainty. Consider an expression of the form $E_{l o g} [2 x]$ where $x$ is a mathematical constant with some complex definition, e.g. $π$ or the Euler-Mascheroni constant $γ$. From the point of view of an agent with bounded computing resources, $x$ is a random variable rather than a constant (since its value is not precisely known). Now, in usual probability theory we are allowed to use identities such as $E_{l o g} [2 x] = 2 E_{l o g} [x]$. In the case of logical uncertainty, the identity is less obvious since the operation of multiplying by 2 has non-vanishing computing cost. However, since this cost is very small we expect to have the approximate identity $E_{l o g} [2 x] \approx 2 E_{l o g} [x]$.
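A crude way to see why the identity is only approximate is the following Python toy. It is an assumption-laden caricature, not a theory of logical uncertainty: the "expectation" is modeled as a point estimate produced within a step budget, $γ$ is estimated by the partial sum $H_{n} - \ln n$, and the doubling is charged one step of that budget (the `doubling_cost` parameter is hypothetical).

```python
import math


def estimate_gamma(budget: int) -> float:
    """Budget-limited estimate of the Euler-Mascheroni constant: H_n - ln(n)."""
    return sum(1.0 / k for k in range(1, budget + 1)) - math.log(budget)


def expect_x(budget: int) -> float:
    """Toy stand-in for E_log[x]: spend the whole budget estimating x."""
    return estimate_gamma(budget)


def expect_2x(budget: int, doubling_cost: int = 1) -> float:
    """Toy stand-in for E_log[2x]: the doubling itself uses up part of the budget."""
    return 2.0 * estimate_gamma(budget - doubling_cost)


budget = 1000
gap = abs(expect_2x(budget) - 2.0 * expect_x(budget))
print(gap)  # small but nonzero
```

In this toy the gap shrinks as the budget grows, which matches the intuition that a cheap operation can be pulled out of the expectation at negligible cost.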

Consider a set $Δ$ of programs computing functions $X \to X$ containing the identity program. Then, the properties of the Solomonoff measure give us the approximation

[5] $\int_{X} U (x) d μ (x) \approx \sum_{f \in Δ} 2^{- | f |} \int_{X} U (f (x)) d μ_{Δ} (x)$

Here $μ_{Δ}$ is the restriction of $μ$ to hypotheses which don’t decompose as applying some program in $Δ$ to another hypothesis and $| f |$ is the length of the program $f$ .
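One way to read [5], which I state here as my own gloss, is that a hypothesis of the form "apply $f$ to the output of $q$" has length roughly $| f | + | q |$, so $μ$ is approximately $\sum_{f \in Δ} 2^{- | f |}$ times the pushforward of $μ_{Δ}$ by $f$. The Python toy below builds a measure exactly this way and checks that both sides of [5] then agree. The set `DELTA`, the base hypotheses `BASE`, the program lengths and the utility are all hypothetical; in the toy the lengths add exactly, which is why the check is an equality rather than an approximation.

```python
from fractions import Fraction


def identity(x: str) -> str:
    return x


def invert(x: str) -> str:
    """Flip every bit of the history."""
    return "".join("1" if b == "0" else "0" for b in x)


# Hypothetical Delta: (program length in bits, function X -> X).
DELTA = [(1, identity), (2, invert)]

# Hypothetical support of mu_Delta: (program length in bits, history).
BASE = [(2, "0001"), (3, "0111")]


def utility(x: str) -> Fraction:
    """Illustrative utility: fraction of 1s in the history."""
    return Fraction(x.count("1"), len(x))


def lhs() -> Fraction:
    """Integrate U against mu, built by composing each f in Delta with each base hypothesis."""
    composite = [(lf + lx, f(x)) for lf, f in DELTA for lx, x in BASE]
    return sum(Fraction(1, 2 ** length) * utility(x) for length, x in composite)


def rhs() -> Fraction:
    """Right-hand side of [5]: sum over f of 2^-|f| times the integral of U(f(x)) against mu_Delta."""
    return sum(
        Fraction(1, 2 ** lf)
        * sum(Fraction(1, 2 ** lx) * utility(f(x)) for lx, x in BASE)
        for lf, f in DELTA
    )


print(lhs(), rhs())  # prints 17/128 17/128
```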

Applying [5] to [3] we get

$v_{A, U}^{f u t u r e} (π) \approx E_{l o g} [\sum_{f \in Δ} 2^{- | f |} \int_{X} U (f (x)) d μ_{Δ} (x) ∣ ψ_{A}^{f u t u r e} (π)]$

Here $ψ_{A}^{f u t u r e} (π)$ is a shorthand notation for $\forall i \in I_{f u t u r e} : A (τ \times i) = π (i)$. Now, according to the discussion above, if we choose $Δ$ to be a set of sufficiently cheap programs³ we can make the further approximation

$v_{A, U}^{f u t u r e} (π) \approx \sum_{f \in Δ} 2^{- | f |} E_{l o g} [\int_{X} U (f (x)) d μ_{Δ} (x) ∣ ψ_{A}^{f u t u r e} (π)]$

If we also assume $Δ$ to be sufficiently large, it becomes plausible to use the approximation

[4] $v_{A, U}^{f u t u r e} (π) \approx w_{A, U}^{f u t u r e} \sum_{f \in Δ} 2^{- | f |} E_{l o g} [\int_{X} U (f (x)) d ν_{T} (x) ∣ ψ_{A}^{f u t u r e} (π)]$

The ontology problem disappears since $f$ bridges between the ontologies of $T$ and $U$ . For example, if $T$ describes the Game of Life and $U$ describes glider maximization in the Game of Life, but the two are defined using different encodings of Game of Life histories, the term corresponding to the re-encoding $f$ will be dominant² in [4].
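The role of the bridging term can be illustrated with a much smaller toy than the Game of Life. In the Python sketch below, the theory writes one bit per cell, the utility function expects two bits per cell, and $Δ$ contains the identity and a re-encoding program. All names, encodings and program lengths are assumptions chosen for illustration; the logical expectation is again replaced by direct substitution of the action.

```python
from fractions import Fraction


def history_under_theory(action: int) -> str:
    """Toy T: one bit per cell ('1' = live); the agent's action sets all 4 cells."""
    return "1111" if action == 1 else "0000"


def identity(x: str) -> str:
    return x


def reencode(x: str) -> str:
    """Bridge from T's one-bit cells to U's two-bit cells ('10' live, '01' dead)."""
    return "".join("10" if b == "1" else "01" for b in x)


def utility(x: str) -> int:
    """Toy U: number of live cells, reading x as two-bit cells; 0 if x is malformed."""
    if len(x) % 2:
        return 0
    pairs = [x[i:i + 2] for i in range(0, len(x), 2)]
    if any(p not in ("10", "01") for p in pairs):
        return 0
    return pairs.count("10")


# Hypothetical Delta: name -> (program length in bits, function).
DELTA = {"identity": (1, identity), "reencode": (5, reencode)}


def term(name: str, action: int) -> Fraction:
    """One summand of the toy analogue of [4] for a given action."""
    length, f = DELTA[name]
    return Fraction(1, 2 ** length) * utility(f(history_under_theory(action)))


def policy_dependence(name: str) -> Fraction:
    """How much the summand for this f changes across the agent's actions."""
    values = [term(name, a) for a in (0, 1)]
    return max(values) - min(values)


for name in DELTA:
    print(name, policy_dependence(name))
```

Although the identity program is shorter and so carries a larger prefactor, its term does not vary with the action, whereas the `reencode` term does. That is the sense of "dominant" in footnote 2.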

Stay Tuned

The formalism developed in this post does not yet cover the entire content of a physical theory. Realistic physical theories not only describe the universe in terms of an arbitrary ontology but also explain how this ontology relates to the “classical” world we experience. In other words, a physical theory comes with an explanation of the embedding of the agent in the universe (a phenomenological bridge). This will be addressed in the next post, where I explain the Cartesian approximation: the approximation that decouples the agent from the rest of the universe.

Subsequent posts will apply this formalism to quantum mechanics and eternal inflation to understand utility calculus in Tegmark levels III and II respectively.

¹ As opposed to a fully fledged UDT agent which has to simultaneously consider its behavior in all universes.

² By “dominant” I mean dominant in dependence on the policy $π$ rather than absolutely.

³ They have to be cheap enough to take the entire sum out of the expectation value rather than only the $2^{- | f |}$ factor in a single term. This condition depends on the amount of computing resources available to our agent, which is an implicit parameter of the logical-uncertainty expectation values $E_{l o g}$ .

The Role of Physics in UDT: Part I
