A Possible Decision Theory for Many Worlds Living
Hey LessWrong! I may have gone in over my head as I am not well-versed in decision theory literature, but I tentatively believe I have a new decision theory for decision-making in a MWI universe. Let me know what you think!
Originally posted at: https://www.evanward.org/a-decision-theory-for-many-worlds-living/
Here, I describe a decision theory that I believe applies to Many-Worlds living that combines principles of quantum mechanical randomness, evolutionary theory, and choice-worthiness. Until someone comes up with a better term for it, I will refer to it as Random Evolutionary Choice-worthy Many-worlds Decisions Theory, or RECMDT.
If the Many World’s Interpretation (MWI) of quantum mechanics is true, does that have any ethical implications? Should we behave any differently in order to maximize ethical outcomes? This is an extremely important question that I’m not aware has been satisfactorily answered. If MWI is true and if we can affect the distribution of worlds through our actions, it means that our actions have super-exponentially more impact on ethically relevant phenomena. I take ethically relevant phenomena to be certain fundamental physics operations responsible for the suffering and well-being associated with the minds of conscious creatures.
We ought to make decisions probabilistically based on sources of entropy which correspond with the splitting of worlds (e.g. particle decay) and the comparative choice-worthiness of different courses of action (CoA). By choice-worthiness, I mean a combination of the subjective degree of normative uncertainty and expected utility of a CoA. I will go into determining choice-worthiness in another post.
If one CoA is twice as choice-worthy as another, then I argue that we should commit to doing that CoA with 2:1 odds or 66% of the time based on radioactive particle decay.
Under a single unfolding of history, the traditional view is that we should choose whichever CoA available to us which has the highest choice-worthiness. When presented with a binary decision, the thought is that we should choose the most choice-worthy option given the sum of evidence every single time. However, the fact that a decision is subjectively choice-worthy does not mean it is guaranteed to actually be the right decision—it could actually move us towards worse possible worlds. If we think we are living in a single unfolding of history but are actually living under MWI, then a significant subset of the 3↑↑↑3+ (but a finite number) of existing worlds end up converging on similar futures, which are by no means destined to be good.
However, if we are living in a reality of constantly splitting worlds, I assert that it is in everyone’s best interest to increase the variance of outcomes in order to more quickly move towards either a utopia or extinction. This essentially increases evolutionary selection pressure that child worlds experience so that they either more quickly become devoid of conscious life or more quickly converge on worlds that are utopian.
As a rough analogy, imagine having a planet covered with trillions of identical, simple microbes. You want them to evolve towards intelligent life that experiences much more well-being. You could leave these trillions of microbes alone and allow them to slowly incur gene edits so that some of their descendants drift towards more intelligent/evolved creatures. However, if you had the option, why not just increase the rate of the gene edits, by say, UV exposure? This will surely push up the timeline for intelligence and well-being and allow a greater magnitude of well-being to take place. Each world under MWI is like a microbe, and we might as well increase the variance, and thus, evolutionary selection pressure in order to help utopias happen as soon and as abundantly as possible.
What this Theory Isn’t
A key component of this decision heuristic is not maximizing chaos and treating different CoAs equally, but choosing CoAs relative to their choice-worthiness. For example, in a utopian world with, somehow, 99% of the proper CoAs figured out, only in 1 out of 100 child worlds must a less choice worthy course of action be taken. In other words, once we get confident in particular CoA, we can take that action the majority of the time. After all, the goal isn’t for 1 world to end up hyper-utopian, but to maximize utility over all worlds.
If we wanted just a single world to end up hyper utopian, then we want to act in as many possible ways based on the results of true sources of entropy. It would be ideal to come up with any and flip a (quantum) coin and go off its results like Two-Face. Again, the goal is to maximize utility over all worlds, so we only want to explore paths in proportion to the odds that we think a particular path is optimal.
Is it Incrementally Useful?
A key component of most useful decision theories is that they are useful insofar as they are followed. As long as MWI is true, each time RECMDT is deliberately adhered to, it is supposed to increase the variance of child worlds. Following this rule just once, depending on the likelihood of worlds becoming utopian relative to the probability of them being full of suffering, likely ensures many future utopias will exist.
While RECMDT should increase the variance and selection pressure on any child worlds of worlds that implement it, we do not know enough about the likelihood and magnitude of suffering at an astronomical level to guarantee that the worlds that remain full of life will overwhelmingly tend to be net-positive in subjective well-being. It could be possible that worlds with net-suffering are very stable and do not tend to approach extinction. The merit of RECMDT may largely rest on the landscape of energy-efficiency of suffering as opposed to well-being. If suffering is very energy inefficient compared to well-being, then that is good evidence in favor of this theory. I will write more about the implications of the energy-efficiency of suffering soon.
Is RECMDT Safer if Applied Only with Particular Mindsets?
One way to hedge against astronomically bad outcomes may be to only employ RECMDT when one fully understands and is committed to ensuring that survivability remains dependent on well-being. This works because following this decision theory essentially increases the variance of child worlds like using birdshot instead of a slug. If one employs this heuristic only while having a firm belief and commitment to a strong heuristic to reduce the probability of net-suffering worlds, then it seems that yourself in child worlds will also have this belief and be prepared to act on it. You can also only employ RECMDT while you believe in your ability to take massive-action on behalf of your belief that survivability should remain dependant on well-being. Whenever you feel unable to carry out this value, you should perhaps not act to increase the variance of child worlds because you will not be prepared to deal with the worst-case scenarios in those child worlds.
Evidence against applying RECMDT only when one holds certain values strongly, however, is all the Nth-order effects of our actions. For decisions that have extremely localized effects where one’s beliefs dominate the ultimate outcome, the plausible value of RECMDT over not applying it is rather small.
For decision with many Nth order effects, such as deciding which job to take (which, for example, has many unpredictable effects on the economy), it seems that one cannot control for the majority of the effects of one’s actions after an initial decision is made. The ultimate effects likely rest on features of our universe (e.g. the nature of human market economies in our local group of many-worlds) that one’s particular belief has little influence over. In other words, for many decisions, one can affect the world once, but they cannot control the Nth order effects through acting a second time. Thus, while certain mindsets are useful to hold dearly regardless of whether one employs RECMDT, it seems that it is not generally useful for one to not employ RECMDT if they are not holding any particular mindsets.
Converting Radioactive Decay to Random Bit Strings
In order to implement this decision theory, agents much require access to a true source of entropy—pseudo-random number generators will NOT work. There are a variety of ways to implement this, such as by having an array of Geiger counters surrounding a radioactive isotope and looking at which groups of sensors get triggered first in order to yield a decision. However, I suspect one of the cheapest and most reliably random sensors would be built to implement the following algorithm from HotBits:
Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again, to avoid the risk of inducing bias due to the resolution of our clock.from HotBits
Converting Random Bit Strings to Choices
We have a means above to generate truly random bit strings that should differ between child worlds. The next question is how do we convert these bit strings to choices regarding which CoA we will execute? This depends on the number of CoAs we were considering and the specific ratios that we arrived at for comparative choice-worthiness. We simply need to determine the least common multiple of all the individual odds of each CoA, and acquire a bit string that is long enough that its representation as a binary number is higher than the least common multiple. From there, we can use a simple preconceived encoding scheme to have the base 2 number encoded in the bit string select for a particular course of action.
For example, in a scenario where one CoA is 4x as choice-worthy as another, we need a random number that represents the digits 0 to 4 equally. Drawing the number 4 can mean we must do the less-choice worthy CoA, and drawing 0-3 can mean we do the more choice-worth CoA. We need at least 3 random bits in order to do this. Since 2^3 is 8 and there is no way to divide the states 5, 6, 7 equally to the states 0, 1, 2, 3, and 4, we cannot use this bit string if it is over 4, and must acquire another one until we acquire a number under 4. Once we select a bitstring with a number below our least-common-multiple, we can use the value of the bit string to select our course of action.
The above selection method prevents us from having to make any rounding errors, and it shouldn’t take that many bits to implement as any given bit string of the proper length always has over a 50% chance of working out. Other encoding schemes introduce rounding errors, which only detract from the uncertainty of our choice-worthiness calculations.
What Does Application Look Like?
I think everyone with solid choice-worthy calibrating ability should have access to truly random bits to choose courses of action from.
Importantly, the time of the production of these random bits is relevant. A one-year-old random bitstring captured from radioactivity is just as random as one captured 5 seconds ago, but employing the latter is key for ensuring the maximum number of recent sister universes make different decisions.
Thus, people need access to recently created bit strings. These could be from a portable, personal Gieger counter, but it could also be from a centralized Gieger counter in say, the middle of the United States. The location does not matter as much as the recency of bit production. Importantly, however, bit strings should not ever be reused as this is not as random as using new bit strings as whatever made you decide to reuse them is non-random information.
Can We Really Affect the Distribution of Other Worlds through Our Actions?
One may think that since everything is quantum mechanics including our brains, can we really affect the distribution of child worlds from our intentions and decisions? This raises the classic problem of free will and our place in a deterministic universe. I think the simplest question to ask is: do our choices have an effect on ethically-relevant phenomena? If the answer is no, then why should we care about decision theory in general? I think it’s useful to think of the answer as yes.
What if Many Worlds Isn’t True?
If MWI isn’t true, then RECMDT optimizes for worlds that will not exist at the potential cost to our own. This may seem to be incredibly dangerous and costly. However, as long as people make accurate choice-worthiness comparisons between different CoAs, then I will actually argue that adhering to RECMDT is not that risky. After all, choice-worthiness is distinct from expected-utility.
It would be a waste to have people, in a binary choice of actions with one having 9x more expected-utility than the other, choose the action with less expected-utility even 10% of the time. However, it seems best, even in a single unfolding of history, that where we are morally uncertain, we should actually cycle through actions based on our moral uncertainty via relative choice-worthiness.
By always acting to maximize choice-worthiness, we risk not capturing any value at all through our actions. While I agree that we should maximize expected-utility in both one shot and iterative scenarios alike and be risk neutral assuming we adequately defined our utility function, I think that given the fundamental uncertainty at play in a normative uncertainty assessment, it is risk neutral to probabilistically decide to implement different CoAs relative to their comparative choice-worthiness. Importantly, this is only the ideal method if the CoAs are mutually exclusive—if they are not, one might as well optimize for both moral frameworks.
Hence, while I think RECMDT is true, I also think that even if MWI is proven false, a decision theory exists which combines randomness and relative choice-worthiness. Perhaps we can call this Random Choice-worthy Decision Theory, or RCDT.
Thanks for reading. Let me know what you think of this!