There are many confused discussions of anthropic reasoning, both on LW and in surprisingly mainstream literature. In this article I will discuss UDASSA, a framework for anthropic reasoning due to Wei Dai. This framework has serious shortcomings, but at present it is the only one I know which produces reasonable answers to reasonable questions; at the moment it is the only framework which I would feel comfortable using to make a real decision.
I will discuss 3 problems:
1. In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains). How do you assign a measure to the copies of yourself when the uniform distribution is unavailable? Do you rule out spatially or temporally infinite universes for this reason?
2. Naive anthropics ignore the substrate on which a simulation is running and count how many instances of a simulated experience exist (or how many distinct versions of that experience exist). These beliefs are inconsistent with basic intuitions about conscious experience, so we have to abandon something intuitive.
3. The Born probabilities seem mysterious. They can be explained (as well as any law of physics can be explained) by UDASSA.
Why Anthropic Reasoning?
When I am trying to act in my own self-interest, I do not know with certainty the consequences of any particular decision. I compare probability distributions over outcomes: an action may lead to one outcome with probability 1⁄2, and a different outcome with probability 1⁄2. My brain has preferences between probability distributions built into it.
My brain is not built with the machinery to decide between different universes each of which contains many simulations I care about. My brain can’t even really grasp the notion of different copies of me, except by first converting to the language of probability distributions. If I am facing the prospect of being copied, the only way I can grapple with it is by reasoning “I have a 50% chance of remaining me, and a 50% chance of becoming my copy.” After thinking in this way, I can hope to intelligently trade off one copy’s preferences against the other’s using the same machinery which allows me to make decisions with uncertain outcomes.
In order to perform this reasoning in general, I need a better framework for anthropic reasoning. What I want is a probability distribution over all possible experiences (or “observer-moments”), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.
I am going to leave many questions unresolved. I don’t understand continuity of experience or identity, so I am simply not going to try to be selfish (I don’t know how). I don’t understand what constitutes conscious experience, so I am not going to try to explain it. I have to rely on a complexity prior, which involves an unacceptable arbitrary choice of a notion of complexity.
The Absolute Self-Selection Assumption
A thinker using Solomonoff induction searches for the simplest explanation for its own experiences. It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.
As humans using Solomonoff induction, we go on to argue that this external lawful universe is real, and that our conscious experience is a consequence of the existence of certain substructure in that universe. The absolute self-selection assumption discards this additional step. Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.
By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that universe.
This requires specifying a notion of complexity. I will choose a universal distribution over strings for now, to mimic conventional Solomonoff induction as closely as possible (and because I know nothing better). The resulting theory is called UDASSA, for Universal Distribution + ASSA (the absolute self-selection assumption).
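One way to write the resulting weighting, as a sketch: fix a prefix-free universal machine $M$ (this is the arbitrary choice of complexity notion mentioned above), and give each experience $e$ the total weight of all programs $p$ whose output picks it out. The symbols $\mu$, $M$, $p$ and $e$ are notation introduced here for illustration, not the author's.

$$\mu(e) \;=\; \sum_{p \,:\, M(p) \text{ picks out } e} 2^{-\lvert p \rvert}, \qquad \sum_{e} \mu(e) \;\le\; 1$$

The sum is dominated by the shortest such programs, and because the programs form a prefix-free set, the Kraft inequality keeps the total weight at most 1.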
Recovering Intuitive Anthropics
Suppose I create a perfect copy of myself. Intuitively, I would like to weight the two copies equally. Similarly, my anthropic notion of “probability of an experience” should match up with my intuitive notion of probability. Fortunately, UDASSA recovers intuitive anthropics in intuitive situations.
The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe. If there are two copies of me in the universe, then the experience of each can be described in the same way: (U, x1) and (U, x2) are descriptions of approximately equal complexity, so I weight the experience of each copy equally. The total experience of my copies is weighted twice as much as the total experience of an uncopied individual.
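A toy version of this bookkeeping, assuming (hypothetically) that each description is summarized only by its length in bits and weighted by 2 to the minus that length, as in the universal distribution. The bit counts are made up for illustration:

```python
from fractions import Fraction

def measure(description_lengths):
    """Weight of one experience: the sum of 2**-n over the lengths n (in bits)
    of every description that picks it out."""
    return sum(Fraction(1, 2 ** n) for n in description_lengths)

# Hypothetical sizes: U_BITS to describe the universe, X_BITS to locate an observer in it.
U_BITS, X_BITS = 40, 20

uncopied = measure([U_BITS + X_BITS])   # a single pair (U, x)
copy_1 = measure([U_BITS + X_BITS])     # (U, x1)
copy_2 = measure([U_BITS + X_BITS])     # (U, x2): same complexity as (U, x1)
both_copies = copy_1 + copy_2

print(copy_1 == copy_2)        # True: each copy gets equal weight
print(both_copies / uncopied)  # 2: the copies together weigh twice one uncopied individual
```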
Part of x is a description of how to navigate the randomness of the universe. For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.
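A minimal sketch of why this recovers ordinary coin-flip probabilities, assuming (as a simplification) that the locator x costs a fixed number of bits plus exactly one bit per observed flip. The function name and bit counts are hypothetical:

```python
from fractions import Fraction
from itertools import product

def history_probabilities(num_flips, locator_bits):
    """Normalized weight of each observed sequence of coin flips, assuming the
    locator x costs locator_bits plus exactly one bit per flip outcome."""
    weights = {
        "".join(history): Fraction(1, 2 ** (locator_bits + num_flips))
        for history in product("HT", repeat=num_flips)
    }
    total = sum(weights.values())
    return {history: w / total for history, w in weights.items()}

probs = history_probabilities(num_flips=3, locator_bits=60)
print(len(probs), probs["HHT"])   # 8 1/8
```

Every history ends up with the weight a fair coin would give it, which is the sense in which the anthropic and intuitive notions of probability agree here.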
Problem #1: Infinite Cosmologies
Modern physics is consistent with infinite universes. An infinite universe contains infinitely many observers (infinitely many of which share all of your experiences so far), and it is no longer sensible to talk about the “uniform distribution” over all of them. You could imagine taking a limit over larger and larger volumes, but there is no particular reason to suspect such a limit would converge in a meaningful sense. One solution that has been suggested is to choose an arbitrary but very large volume of spacetime, and to use a uniform distribution over observers within it. Another solution is to conclude that infinite universes can’t exist. Both of these solutions are unsatisfactory.
UDASSA provides a different solution. The probability of an experience decreases exponentially with the complexity of specifying it. Just existing in an infinite universe with a short description does not guarantee that you yourself have a short description; you need to specify a position within that infinite universe. For example, if your experiences occur 34908172349823478132239471230912349726323948123123991230 steps after some naturally specified time 0, then the (somewhat lengthy) description of that time is necessary to describe your experiences. Observers farther from time 0 need ever longer descriptions, and because descriptions are drawn from a prefix-free code, their weights sum to at most 1 (the Kraft inequality). Thus the total measure of all observer-moments within a universe is finite.
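A toy calculation of the finiteness claim. It assumes, purely for illustration, that an observer's position is written with the Elias gamma code (a standard prefix-free code for positive integers) rather than by a real universal machine; the function names are hypothetical:

```python
from fractions import Fraction

def elias_gamma_length(n):
    """Length in bits of the Elias gamma code for a positive integer n.
    The code is prefix-free: no codeword is a prefix of another."""
    return 2 * (n.bit_length() - 1) + 1

def total_measure(max_position):
    """Sum of 2**-len(code(n)) over observer positions n = 1..max_position."""
    return sum(Fraction(1, 2 ** elias_gamma_length(n)) for n in range(1, max_position + 1))

# Partial sums approach 1 but never exceed it (Kraft inequality).
for m in (4, 8, 12):
    print(2**m - 1, total_measure(2**m - 1))   # equals 1 - 2**-m

# The time in the text needs a long codeword, so its weight is tiny.
t = 34908172349823478132239471230912349726323948123123991230
print(elias_gamma_length(t), "bits")
```

Infinitely many positions exist, but the weights of ever more distant ones shrink fast enough that the total stays bounded.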
Problem #2: Splitting Simulations
Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn’t.
In the first case, we have to accept that some computer simulations count for more, even if they are running the same simulation (or we have to de-duplicate the set of all experiences, which leads to serious problems with Boltzmann brains). In this case, we are faced with the problem of comparing different substrates, and it seems impossible not to make arbitrary choices.
In the second case, we have to accept that the operation of dividing the 2 atom thick computer has moral value, which is even worse. Where exactly does the transition occur? What if each layer of the 2 atom thick computer can run independently before splitting? Is physical contact really significant? What about computers that aren’t physically coherent? What if two 1 atom thick computers periodically synchronize themselves and self-destruct if they aren’t synchronized: does this synchronization effectively destroy one of the copies? I know of no way to accept this possibility without extremely counter-intuitive consequences.
UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because there are twice as many equally short ways to specify them. For each description pointing to the simulation on a 1 atom thick computer, there are two descriptions of equal complexity pointing to the simulation on the 2 atom thick computer: one pointing to each layer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn’t change.
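A minimal sketch of this counting argument, under the simplifying (hypothetical) assumption that every pointer to a layer has exactly the same length, so only the number of pointers matters:

```python
from fractions import Fraction

def experience_weight(layers_per_computer, pointer_bits):
    """Total weight of one simulated experience. Each layer of each computer
    running it can be pointed at by its own description of pointer_bits bits
    (a simplifying assumption: every pointer is equally complex)."""
    num_pointers = sum(layers_per_computer)
    return num_pointers * Fraction(1, 2 ** pointer_bits)

thin = experience_weight([1], pointer_bits=50)            # one 1-atom-thick computer
thick = experience_weight([2], pointer_bits=50)           # one 2-atom-thick computer
after_split = experience_weight([1, 1], pointer_bits=50)  # the thick one, split in two

print(thick / thin)          # 2
print(thick == after_split)  # True: splitting changes nothing
```

Because the weight attaches to descriptions rather than to physical objects, the split is a non-event, and no threshold for "when does it become two computers" is needed.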
Problem #3: The Born Probabilities
A quantum mechanical state can be described as a linear combination of “classical” configurations. For some reason we appear to experience ourselves as being in one of these classical configurations with probability proportional to the squared magnitude of that configuration’s coefficient. These probabilities are called the Born probabilities, and are sometimes described either as a serious problem for MWI or as an unresolved mystery of the universe.
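In symbols (notation introduced here for illustration), writing the universal state as a superposition of classical configurations $c$ with complex coefficients $a_c$, the Born rule reads:

$$\lvert \psi \rangle = \sum_c a_c \lvert c \rangle, \qquad P(c) = \frac{\lvert a_c \rvert^2}{\sum_{c'} \lvert a_{c'} \rvert^2}$$

For a normalized state the denominator is 1, so $P(c) = \lvert a_c \rvert^2 = \lvert \langle c \vert \psi \rangle \rvert^2$.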
What happens if we apply UDASSA to a quantum universe? For one, the existence of an observer within the universe doesn’t say anything about conscious experience. We need to specify an algorithm for extracting a description of that observer from a description of the universe.
Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to the squared magnitude of its inner product with the universal wavefunction.
Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to the magnitude of its inner product with the universal wavefunction.
Using either A or B, we can describe a single experience by specifying a random seed, and picking out that experience within the classical configuration output by A or B using that random seed. If this is the shortest explanation of an experience, the probability of an experience is proportional to the number of random seeds which produce classical configurations containing it.
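A toy sketch of algorithms A and B on a hypothetical two-configuration state, using the seed of a pseudorandom generator as the "random seed" in the description. The state, function name, and seed range are illustrative assumptions; a real universe would have astronomically many configurations:

```python
import random
from collections import Counter

def run_selection(amplitudes, power, seed):
    """Toy version of algorithms A (power=2) and B (power=1): pick one classical
    configuration from {configuration: complex amplitude} with probability
    proportional to abs(amplitude) ** power. The seed plays the role of the
    random seed that helps pick out an experience."""
    rng = random.Random(seed)
    configs = list(amplitudes)
    weights = [abs(amplitudes[c]) ** power for c in configs]
    return rng.choices(configs, weights=weights, k=1)[0]   # choices normalizes the weights

state = {"up": 0.6, "down": 0.8j}   # hypothetical normalized state: 0.36 + 0.64 = 1
seeds = range(20_000)

# Weight of an experience = fraction of seeds whose output contains it.
frac_a = Counter(run_selection(state, 2, s) for s in seeds)["up"] / len(seeds)
frac_b = Counter(run_selection(state, 1, s) for s in seeds)["up"] / len(seeds)
print(f"A: {frac_a:.3f} (Born weight 0.36)   B: {frac_b:.3f} (0.6 / 1.4, about 0.429)")
```

Counting seeds recovers each algorithm's weights, so the two algorithms assign different measures to the same experiences.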
The universe as we know it is typical for an output of A but completely improbable as an output of B. For example, the observed behavior of stars is consistent with almost all observations weighted according to algorithm A, but with almost no observations weighted according to algorithm B. Algorithm A constitutes an immensely better description of our experiences, in the same sense that quantum mechanics constitutes an immensely better description of our experiences than classical physics.
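To get a feel for how lopsided this comparison becomes, here is a deliberately simplified stand-in for the stars example: repeated two-outcome measurements with hypothetical amplitude magnitudes 0.6 and 0.8. It computes how much more probable a record that looks typical under A is under A than under B:

```python
import math

def log_binomial(k, n, p):
    """Natural-log probability of exactly k 'up' outcomes in n independent trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

a_up, a_down = 0.6, 0.8                      # hypothetical amplitude magnitudes
p_a = a_up**2 / (a_up**2 + a_down**2)        # algorithm A: 0.36
p_b = a_up / (a_up + a_down)                 # algorithm B: about 0.429

n = 10_000
k = round(n * p_a)                           # a record that looks typical under A
log_ratio = log_binomial(k, n, p_a) - log_binomial(k, n, p_b)
print(f"this record is about e^{log_ratio:.0f} times likelier under A than under B")
```

The log ratio grows linearly with the number of observations, so with the enormous number of observations behind real physics, B becomes a hopeless description.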
You could also imagine an algorithm C, which uses the same selection as algorithm B to point to the Everett branch containing a physicist about to do an experiment, but then uses algorithm A to describe the experiences of the physicist after doing that experiment. This is a horribly complex way to specify an experience, however, for exactly the same reason that a Solomonoff inductor places very low probability on the laws of physics suddenly changing for just this one experiment.
Of course this leaves open the question “why the Born probabilities and not some other rule?” Algorithm B is a valid way of specifying observers, though they would look exactly as foreign as observers with different laws of physics. (As a justification for the Born rule, Wei Dai has suggested that the structures specified by algorithm B are not even self-aware.) The fact that we are described by algorithm A rather than B is no more or less mysterious than the fact that the laws of physics are like so instead of some other way.
In the same way that we can retroactively justify our laws of physics by appealing to their elegance and simplicity (in a sense we don’t yet really understand), I suspect that we can justify selection according to algorithm A rather than algorithm B. In an infinite universe, algorithm B may not even work (because the sum of the magnitudes of the inner products of the universal wavefunction with the classical configurations can be infinite), and even in a finite universe algorithm B necessarily involves the additional step of normalizing the probability distribution or else producing nonsense. Moreover, algorithm A is a nicer mathematical object than algorithm B when the evolution of the wavefunction is unitary, since unitary evolution preserves the sum of the squared magnitudes but not the sum of the magnitudes. So the same considerations that suggest elegant laws of physics suggest algorithm A over B (or some other alternative).
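A small numerical check of both technical claims, under illustrative assumptions: a 2x2 Hadamard matrix stands in for unitary evolution, and amplitudes proportional to 1/n stand in for a hypothetical state over ever more configurations. Neither is meant as a model of the actual universe:

```python
import math

def matvec(matrix, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in matrix]

h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]                 # a simple unitary matrix

state = [1 + 0j, 0j]
history = []
for step in range(3):
    l2 = sum(abs(a) ** 2 for a in state)     # total weight seen by algorithm A
    l1 = sum(abs(a) for a in state)          # total weight seen by algorithm B
    history.append((l2, l1))
    print(step, round(l2, 6), round(l1, 6))
    state = matvec(hadamard, state)

# Amplitudes proportional to 1/n over ever more configurations: the squared
# magnitudes converge (to pi**2 / 6 before normalizing), the magnitudes do not.
N = 1_000_000
print(sum(1 / n**2 for n in range(1, N + 1)), sum(1 / n for n in range(1, N + 1)))
```

A's total weight stays at 1 through every step, while B's total drifts to the square root of 2 and back, so B needs renormalizing after each step; in the 1/n example, B's total grows without bound.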
Note that this is not the core of my explanation of the Born probabilities; in UDASSA, choosing a selection procedure is just as important as describing the universe, and so some explicit sort of observer selection is a necessary part of the laws of physics. We predict the Born rule to hold in the future because it has held in the past, just like we expect the laws of physics to hold in the future because they have held in the past.
In summary, if you use Solomonoff induction to predict what you will see next based on everything you have seen so far, your predictions about the future will be consistent with the Born probabilities. You only get in trouble when you use Solomonoff induction to predict what the universe contains, and then get bogged down in the question “Given that the universe contains all of these observers, which one should I expect to be me?”