I think I can explain why we might expect a UDT agent to avoid these problems. You’re probably already familiar with the argument at this level, but I haven’t seen it written up anywhere yet.
First, we’ll describe (informally) a UDT agent as a mathematical object. The preferences of the agent are built in (so there is no reward channel, which lets us avoid preference solipsism). It also has models of every possible universe, along with an understanding of its own mathematical structure. To make a decision given a certain input, it scans each universe model for structures that are logically dependent on its output, predicts what will happen in each universe for each particular output, and then chooses the output that maximizes its preferences.
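The decision procedure above can be sketched in code. This is only a toy illustration: the “universe models” here are just Python functions that take the agent’s chosen output and return an outcome, standing in for the (much harder) notion of logical dependence, and all the names and example universes are made up.

```python
def udt_decision(possible_outputs, universe_models, utility):
    """Pick the output whose predicted consequences maximize utility.

    possible_outputs: candidate outputs the agent can emit.
    universe_models:  toy stand-ins for universes, each mapping the
                      agent's output to that universe's outcome.
    utility:          the agent's built-in preferences over a list of
                      outcomes (one outcome per universe).
    """
    best_output, best_value = None, float("-inf")
    for output in possible_outputs:
        # Predict what happens in every universe if the agent emits `output`.
        outcomes = [model(output) for model in universe_models]
        value = utility(outcomes)
        if value > best_value:
            best_output, best_value = output, value
    return best_output

# Hypothetical example: two universes that reward different outputs.
universes = [lambda o: 10 if o == "a" else 0,
             lambda o: 3 if o == "b" else 1]
print(udt_decision(["a", "b"], universes, sum))  # prints "a" (10 + 1 > 0 + 3)
```

The key point the sketch captures is that the agent never updates on observations inside the loop; it just picks the output with the best consequences summed across all of its models.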
Now let’s see why it won’t have the immortality problem. Suppose the agent is considering an output string corresponding to an anvil experiment. After running the predictions of this output in its models, it will realize that it would lose a significant amount of structure that is logically dependent on it. So unless it has very strange preferences, it will mark this outcome as low utility and consider better options.
Similarly, the agent will also notice that some outputs correspond to having more structures that are logically dependent on it. For example, an output that built a faster version of a UDT agent would allow more things to be affected by future outputs. In other words, it would be able to self-improve.
To actually implement a UDT agent with these preferences, we just need to create something (most likely an appropriately programmed computer) that is logically dependent on this mathematical object to a sufficiently high degree. This, of course, is the hard part, but I don’t see any reason why a faithful implementation would suddenly have these specific problems again.
Another nice feature of UDT (one that is sometimes treated as a bug) is that it is extremely flexible in how you can choose the utility function. Maybe you Just Don’t Care about worlds that don’t follow the Born probabilities; in that case, just ignore anything that happens in such a universe in your utility function. I interpret this as meaning that UDT is a framework decision theory that could be used regardless of what the answers (or maybe just preferences) to anthropics, induction, or other such questions end up being.
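This flexibility is easy to make concrete. Below is a toy sketch, with made-up names: the predicate `follows_born_rule` stands in for whatever test you care about, and worlds failing it simply contribute nothing to total utility.

```python
def make_utility(follows_born_rule, world_value):
    """Build a utility function that only counts worlds passing a test."""
    def utility(worlds):
        # Worlds failing the test are simply ignored, as described above.
        return sum(world_value(w) for w in worlds if follows_born_rule(w))
    return utility

# Hypothetical worlds: the second one violates the test, so it is not counted,
# no matter how much value it would otherwise contribute.
u = make_utility(lambda w: w["born"], lambda w: w["value"])
worlds = [{"born": True, "value": 5}, {"born": False, "value": 100}]
print(u(worlds))  # prints 5
```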
Oh, and if anyone notices something I got wrong, or that I seem to be missing, please let me know—I want to understand UDT better :)
Apologies if this is a stupid question—I am not an expert—but how do we know what “level of reality” to have our UDT-agent model its world-models with? That is, if we program the agent to produce and scan universe-models consisting of unsliced representations of quark and lepton configurations, what happens if we discover that quarks and leptons are composed of more elementary particles yet?
Wei Dai has suggested that the default setting for a decision theory should be Tegmark’s Level 4 Multiverse, where all mathematical structures exist in reality. So a quark-lepton universe and a string theory universe would both be counted among the possible universes, assuming they are mathematically consistent.
Of course, this makes it difficult to specify the utility function.
Yeah, your explanation sounds right.

To elaborate on “the preferences of the agent are built in”: it means the agent is coded with a description of a large but fixed mathematical formula with no free variables, and wants the value of that formula to be as high as possible. That doesn’t make much sense in simple cases like “I want the value of 2+2 to be as high as possible”, but it works in more complicated cases where the formula contains instances of the agent itself, which is possible by quining.
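Quining is the standard trick by which a program (or formula) can contain a complete description of itself. The classic demonstration, as a toy illustration, is a function that returns its own source code:

```python
def quine():
    # The string `s` is a template for this function's own source; filling
    # the template with (a repr of) itself reproduces the source exactly.
    s = 'def quine():\n    s = %r\n    return s %% s'
    return s % s

print(quine())  # prints the source of quine() itself
```

The same self-reference trick lets the agent’s utility formula contain instances of the agent’s own mathematical structure without any infinite regress.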
To elaborate on why “scanning each universe model for structures that will be logically dependent on its output” doesn’t need bridging laws, let’s note that it can be viewed as theorem proving. The agent might look for easily provable theorems of the form “if my mathematical structure has a certain input-output map, then this particular universe model returns a certain value”. Or it could use some kind of approximate logical reasoning, but in any case it wouldn’t need explicit bridging laws.
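The theorem-proving view can be sketched as brute-force substitution. In this toy illustration (all names are made up), a universe model is a program with the agent’s policy embedded in it, and we establish conditionals of the form “if my input-output map is f, then this universe returns v” simply by substituting each candidate map and evaluating; no bridging laws appear anywhere.

```python
def universe(agent_policy):
    # A toy world: it poses a question to the embedded copy of the agent,
    # and its value depends on the agent's answer.
    answer = agent_policy("left or right?")
    return 7 if answer == "left" else 2

def provable_consequences(universe, candidate_policies):
    """Return the 'theorems': for each candidate input-output map, the value
    the universe model provably takes if the agent implements that map."""
    return {name: universe(policy)
            for name, policy in candidate_policies.items()}

policies = {"go_left": lambda q: "left", "go_right": lambda q: "right"}
print(provable_consequences(universe, policies))
# prints {'go_left': 7, 'go_right': 2}
```

A real agent would of course prove such conditionals rather than evaluate the universe directly, but the logical content of the “theorems” is the same.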