I think UDT2 also correctly solves Gary’s Agent-Simulates-Predictor problem and my “two more challenging Newcomb variants”. (I’ll skip the details unless someone asks.) I applied Gary’s trick of converting multi-agent problems into Newcomb variants to come up with two more single-agent problems that UDT1 (and perhaps Nesov’s formulation of UDT as well) does badly on.
I’m curious about both of these, as well as who Gary is.
The differences between this and UDT2:
This is something we can define precisely, whereas UDT2 isn’t precisely defined.
Rather than being totally updateless, this is just mostly updateless, with the parameter $f$ determining how updateless it is.
I don’t think there’s a problem this gets right which we’d expect UDT2 to get wrong.
If we’re using the version of logical induction where the belief jumps to 100% as soon as something gets proved, then a weighty trader who believes crossing the bridge is good will just get knocked out immediately if the theorem prover starts proving that crossing is bad (which helps that step inside the Löbian proof go through). (I’d be surprised if the analysis turns out much different for the kind of LI which merely rapidly comes to believe things which get proved, but I can see how that distinction might block the proof.) But certainly it would be good to check this more thoroughly.
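To make the “knocked out immediately” dynamic concrete, here is a toy numerical illustration (not a real logical inductor, and not from the original discussion): a trader who bets heavily that crossing is good loses almost all its wealth the instant the theorem prover settles the claim at 0. The `settle` helper and the 90%/100-unit numbers are invented for illustration.

```python
# Toy illustration of the jump-to-proved-value version of logical induction:
# the price of "crossing the bridge is good" snaps to 0 as soon as the
# theorem prover proves crossing is bad, wiping out a weighty trader.

def settle(price_paid: float, stake: float, outcome: float) -> float:
    """Profit for a trader who bought `stake` units of a claim at
    `price_paid`, when the claim settles at `outcome` (0 or 1)."""
    return stake * (outcome - price_paid)

# A weighty trader buys heavily at 90% confidence that crossing is good.
trader_wealth = 100.0
price, stake = 0.9, 100.0

# The theorem prover proves "crossing is bad"; the claim settles at 0
# immediately rather than merely rapidly.
trader_wealth += settle(price, stake, outcome=0.0)

print(trader_wealth)  # the trader is knocked out, left with 10.0
```

In the “merely rapidly” version of LI the price would fall over several steps instead of jumping, which is the distinction that might block the Löbian proof.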
Without reading closely, this seems very close to UDT2. Is there a problem that this gets right which UDT2 gets wrong (or for which there is ambiguity about the specification of UDT2)?
Without thinking too carefully, I don’t believe the troll bridge argument. We have to be super careful about “sufficiently large,” and about Löb’s theorem. To see whether the proof goes through, it seems instructive to consider the case where a trader with 90% of the initial mass really wants to cross the bridge. What happens when they try?
This seems very similar to what I named “UDT2” on the decision theory mailing list.
I never really got why UDT2 wasn’t just a special case of UDT1, in which the set of outputs was restricted to outputs of the form “Turn into program T at time t”. (This was Vladimir Nesov’s immediate response on the mailing list.) I suppose that there should also be a corresponding “UDT2.1”, in which the agent instead chooses among all input-output maps mapping inputs to outputs of the form “Turn into program T at time t”.
replacing vague as crap TDT with less-vague UDT and UDT2
As I’ve figured out while writing the last few posts, TDT hasn’t been explained well, but it is a genuinely formalizable theory. (You’ll have to trust me until Part III or check the decision-theory mailing list.) But it’s a different theory from ADT and UDT, and the latter ones are preferable.
This does seem to be the “obvious” next step in the UDT approach. I proposed something similar as “UDT2” in a 2011 post to the “decision theory workshop” mailing list, and others have made similar proposals.
But there is a problem with having to choose how much time/computing resources to give to the initial decision process. If you give it too little, then its logical probabilities might be very noisy and you could end up with a terrible decision, but if you give it too much, then it could update on too many logical facts and lose on acausal bargaining problems. With multiple AI builders, UDT2 seems to imply a costly arms-race situation where each has an incentive to give their initial decision process less time (than would otherwise be optimal) so that their AI could commit faster (and hopefully be logically updated upon by other AIs) and also avoid updating on other AIs’ commitments.
I’d like to avoid this but don’t know how. I’m also sympathetic to Nesov’s (and others such as Gary Drescher’s) sentiment that maybe there is a better approach to the problems that UDT is trying to solve, but I don’t know what that is either.
Would you rather be tortured for 3^^^3 years, or have a dust speck in your eye?
If I use UDT2 can I choose ‘both’?
A logically updateless agent will, when it increases expected utility, commit to carrying out a certain policy early in logical time, so that predictors with at least a small amount of compute will know that the agent will carry out this policy. See also: UDT2, Policy selection solves most problems.
ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it? (I had an idea that I called UDT2 which I think does better on it than UDT1.1 but it’s not as elegant as I hoped.) Defining such problem classes may be useful for talking about the technical properties of specific decision theories, but that doesn’t seem to be what you’re trying to do here. The only other motivation I can think of is finding a way to justify not solving certain problems, but I don’t think that makes sense in the case of ASP.
As far as I am aware this crap isn’t in development. It isn’t the highest research priority so the other SingInst researchers haven’t been working on it much and Eliezer himself is mostly focused on writing a rationality book. Other things like decision theory are being worked on—which has involved replacing vague as crap TDT with less-vague UDT and UDT2.
I would like to see more work published on CEV. The most recent I am familiar with is this.
I think this might be a situation where people tend to leave the debate and move on to something else when they seem to have found a satisfactory position.
Well, not exactly. I came up with UDASSA originally but found it not entirely satisfactory, so I moved on to something that eventually came to be called UDT. I wrote down my reasons in “against UD+ASSA” and under Paul’s post.
Perhaps it would be good to have this history be more readily available to people looking for solutions to anthropic reasoning though, if you guys have suggestions on how to do that.
We should look at this problem and think, “I want to output A or B, but in such a way that has the side effect that the other copy of me outputs B or A respectively.” S could search through functions, considering each function’s output on input 1 and the side effects of that output. S might decide to run the UDT 1.1 algorithm, which would have the desired result.
This seems very similar to what I named “UDT2” on the decision theory mailing list. Here’s how I described it:
How to formulate UDT2 more precisely is not entirely clear yet. Assuming the existence of a math intuition module which runs continuously to refine its logical uncertainties, one idea is to periodically interrupt it, and during the interrupt, ask it about the logical consequences of statements of the form “S, upon input X, becomes T at time t” for all programs T and t being the time at the end of the current interrupt. At the end of the interrupt, return T(X) for the T that has the highest expected utility according to the math intuition module’s “beliefs”. (One of these Ts should be equivalent to “let the math intuition module run for another period and ask again later”.)
So aside from the unfortunate terminology, I think you’re probably going in the right direction.
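The interrupt-based procedure quoted above can be sketched in code. Everything here is a hedged toy model, not a spec: the `MathIntuition` class, its sharpening-noise behavior, and the specific utilities are all invented stand-ins for the hypothetical math intuition module.

```python
# Toy sketch of the UDT2 loop described above: periodically interrupt the
# math intuition module and ask about "S, on input X, becomes T at time t",
# where one option T is equivalent to "keep refining and ask again later".

KEEP_THINKING = "ask again later"

class MathIntuition:
    """Stand-in for the math intuition module: its logical uncertainty
    (modeled here as a pessimistic noise penalty) shrinks each period."""
    def __init__(self, true_utils):
        self.true_utils = true_utils
        self.noise = 1.0
    def refine(self):
        self.noise /= 2  # beliefs get sharper with more compute
    def expected_utility(self, T, t):
        if T is KEEP_THINKING:
            return 0.6   # fixed (invented) value of deliberating longer
        return self.true_utils[T] - self.noise

def udt2(X, module, candidates, max_periods=10):
    for t in range(max_periods):
        module.refine()
        options = candidates + [KEEP_THINKING]
        best = max(options, key=lambda T: module.expected_utility(T, t))
        if best is not KEEP_THINKING:
            return best(X)  # S becomes T at time t and runs it on input X
    return None

# Usage: the agent deliberates until one successor program's estimated
# utility rises above the value of continuing to think.
one_box = lambda X: "one-box"
two_box = lambda X: "two-box"
m = MathIntuition({one_box: 1.0, two_box: 0.1})
print(udt2("newcomb", m, [one_box, two_box]))  # prints "one-box"
```

Note how the time/compute trade-off from the arms-race worry shows up here directly: the `expected_utility` assigned to `KEEP_THINKING` is exactly the knob that decides how early the agent commits.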
Suppose you’re currently running a decision theory that would “take the whole pie” in this situation. Now what if Omega first informed you of the setup without telling you what the millionth digit of pi is, and gave you a chance to self-modify? And suppose you don’t have enough computing power to compute the digit yourself at this point. Doesn’t it seem right to self-modify into someone who would give control of the universe to the staples maximizer, since that gives you 1/2 “logical” probability of 10^20 paperclips instead of 1/2 “logical” probability of 10^10 paperclips? What is wrong with this reasoning? And if it is wrong, both UDT1 and UDT2 are wrong, since UDT1 would self-modify and UDT2 would give control to the staples maximizer without having to self-modify, so what’s the right decision theory?
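The arithmetic behind the self-modification argument is simple enough to spell out. This just restates the numbers in the comment above, assuming a 1/2 “logical” probability for each parity of the unknown digit.

```python
# Before computing the millionth digit of pi, the agent treats it as
# logically 50/50, so compare expected paperclips under each policy.

p = 0.5  # "logical" probability of the favorable digit

ev_keep_control = p * 10**10   # take the whole pie: 1/2 chance of 10^10
ev_cede_control = p * 10**20   # cede to the staples maximizer: 1/2 of 10^20

print(ev_cede_control > ev_keep_control)  # prints True
```

So on expected value, pre-committing to cede control wins by a factor of 10^10, which is exactly why UDT1 would self-modify here.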
When a ceedeetee agent met a 9-bot, she would reason causally: “Well, the other agent is going to name 9, so I had better name 1 if I want any payoff at all!”
How does a ceedeetee agent tell what kind of opponent they’re facing, and what prevents ceedeetee agents from evolving to or deciding to hide such externally visible differences?
Depending on such details, there are situations where TDT/UDT/FDT seemingly does worse than CDT. See this example (a variant of 2TDT-1CDT) from cousin_it:
Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob’s life will be much happier than Alice’s, right?
More tangentially, the demand game is also one that UDT 1.x loses to a human, because of “unintentional simulation”.
UDT can update in that way in practice (you need that, to avoid Dutch Books). It simply doesn’t have a position on the anthropic probability itself, only on the behaviour under evidence update.
UDT may indeed be an aspect of bridging laws. The reason I’m not willing to call it a full solution is as follows:
1) Actually, the current version of UDT that I write down as an equation involves maximizing over maps from sensory sequences to actions. If there’s a version of UDT that maximizes over something else, let me know.
2) We could say that it ought to be obvious to the math intuition module that choosing a map R := S->A ought to logically imply that R^ = S^->A for simple isomorphisms over sensory experience for isomorphic reductive hypotheses, thereby eliminating a possible degree of freedom in the bridging laws. I agree in principle. We don’t actually have that math intuition module. This is a problem with all logical decision theories, yes, but that is a problem.
3) Aspects of the problem like “What prior space of universes?” aren’t solved by saying “UDT”. Nor, “How exactly do you identify processes computationally isomorphic to yourself inside that universe?” Nor, “How do you manipulate a map which is smaller than the territory where you don’t reason about objects by simulating out the actual atoms?” Nor very much of, “How do I modify myself given that I’m made of parts?”
There’s an aspect of UDT that plausibly answers one particular aspect of “How do we do naturalized induction?”, especially a particular aspect of how we write bridging laws, and that’s exciting, but it doesn’t answer what I think of as the entire problem, including the problem of the prior over universes, multilevel reasoning about physical laws and high-level objects, the self-referential aspects of the reasoning, updating in cases where there’s no predetermined Cartesian boundary of what constitutes the senses, etc.
UDT doesn’t handle non-base-level maximization vantage points (previously “epistemic vantage points”) for blackmail—you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn’t realize you’re only blackmailing it because you’re simulating it being blackmailable. As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don’t actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).
I expect the ultimate theory to look more like “TDT modded to handle UDT’s class of problems and blackmail and anything else we end up throwing at it” than “UDT modded to be naturalistic and etc”, but I could be wrong—others have different intuitions about this.
UDT doesn’t give us conceptual tools for dealing with multiagent coordination problems.
I think there’s no best player of multiplayer games. Or rather, choosing the best player depends on what other players exist in the world, and that goes all the way down (describing the theory of choosing the best player also depends on what other players exist, and so on).
Of course that doesn’t mean UDT is the best we can do. We cannot solve the whole problem, but UDT carves out a chunk, and we can and should try to carve out a bigger chunk.
For me the most productive way has been to come up with crisp toy problems and try to solve them. (Like ASP, or my tiling agents formulation.) Your post makes many interesting points; I’d love to see crisp toy problems for each of them!
UDT is totally supposed to smoke on the smoking lesion problem. That’s kinda the whole point of TDT, UDT, and all the other theories in the family.
It seems to me that your high-stakes predictor case is adequately explained by residual uncertainty about the scenario setup and whether Omega actually predicts you perfectly, which will yield two-boxing by TDT in this case as well. Literal, absolute epistemic certainty will lead to one-boxing, but this is a degree of certainty so great that we find it difficult to stipulate even in our imaginations.
I ought to steal that “stick of chewing gum vs. a million children” to use on anyone who claims that the word of the Bible is certain, but I don’t think I’ve ever met anyone in person who said that.
UDT is basically the bare definition of reflective consistency: it is a non-solution, just a statement of the problem in constructive form. UDT says that you should think exactly the same way as the “original” you thinks, which guarantees that the original you won’t be disappointed in your decisions (reflective consistency). It only looks good in comparison to other theories that fail this particular requirement, but which are otherwise much more meaningful in their domains of application.
TDT fails reflective consistency in general, but offers a correct solution in a domain that is larger than those of other practically useful decision theories, while retaining their expressivity/efficiency (i.e. updating on graphical models).