2) The key problem in Drescher’s(?) Counterfactual Mugging is that after you actually see the coinflip, your posterior probability of “coin comes up heads” is no longer 0.5 - so if you compute the answer after seeing the coin, the answer is not the reflectively consistent one. I still don’t know how to handle this—it’s not in the class of problems to which my TDT corresponds.
Please note that the problem persists if we deal in a non-quantum coin, like an unknown binary digit of pi.
I thought the answer Vladimir Nesov already posted solved Counterfactual Mugging for a quantum coin?
Basically, all the local decisions come from the same computation that would be performed to set the most general precommitment for all possible states of the world. The expected utility maximization is defined only once, on the global state space, and then the actual actions only retrieve the global solution, given encountered observations. The observations don’t change the state space over which the expected utility optimization is defined (and don’t change the optimal global solution or preference order on the global solutions), only what the decisions in a given (counterfactual) branch can affect. Since the global precommitment is the only thing that defines the local agents’ decisions, the “commitment” part can be dropped, and the agents’ actions can just be defined to follow the resulting preference order.
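A minimal sketch of that "optimize once, globally" structure, with made-up payoffs and observation names (this only illustrates the shape of the computation, not anyone's actual formalism): expected utility is taken over whole policies, i.e. maps from observation to action, and each branch then merely looks up its entry.

```python
from itertools import product

OBSERVATIONS = ["asked_to_pay", "told_coin_was_heads"]   # the two branches of the mugging
ACTIONS = ["refuse", "comply"]
P = {"heads": 0.5, "tails": 0.5}   # probabilities fixed once, at "design time"

def world_utility(policy, coin):
    """Payoff of one possible world, given the whole policy (made-up numbers)."""
    if coin == "tails":   # Omega asks for $100
        return -100 if policy["asked_to_pay"] == "comply" else 0
    # heads: Omega pays $10,000 iff it predicts the policy complies when asked
    return 10_000 if policy["asked_to_pay"] == "comply" else 0

def policy_value(policy):
    return sum(P[coin] * world_utility(policy, coin) for coin in P)

# The optimization ranges over whole observation->action maps; each branch then
# just looks up its entry instead of re-optimizing with updated probabilities.
policies = [dict(zip(OBSERVATIONS, acts))
            for acts in product(ACTIONS, repeat=len(OBSERVATIONS))]
best = max(policies, key=policy_value)
assert best["asked_to_pay"] == "comply"
```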
In this solution, there is no belief updating; there is just decision theory. (All probabilities are “timestamped” to the beliefs of the agent’s creator when the agent was created.) This means that the use of Bayesian belief updating with expected utility maximization may be just an approximation that is only relevant in special situations which meet certain independence assumptions around the agent’s actions. In the more general Newcomb-like family of situations, computationally efficient decision algorithms might use a family of approximations more general than Bayesian updating.
There would, for example, be no such thing as “posterior probability of ‘coin comes up heads’” or “probability that you are a Boltzmann brain”; there would only be a fraction of importance-measure that brains with your decision algorithm could affect. As Vladimir Nesov commented:
Agents self-consistent under reflection are counterfactual zombies, indifferent to whether they are real or not.
Anna and I noticed this possible decision rule around four months before Vladimir posted it (with “possible observations” replaced by “partial histories of sense data and actions”, and also some implications about how to use limited computing power on “only what the decisions in a given (counterfactual) branch can affect” while still computing predicted decisions on one’s other counterfactual branches well enough to coordinate with them). But we didn’t write it up to a polished state, partly because it didn’t seem to us like the central insight in the area. Mostly, that was because this decision rule doesn’t explain how to think about any logical paradoxes of self-reference, such as algorithms that refer to each other’s output. It also doesn’t explain how to think about logical uncertainty, such as the parity of the trillionth digit of pi, because the policy optimization is assumed to be logically omniscient. But maybe we were wrong about how central it was.
It looks like the uncertainty about your own actions in other possible worlds is entirely analogous to uncertainty about mathematical facts: in both cases, the answer lies in the denotation of the structure you already have at hand, so it doesn’t seem like the question about your own actions should be treated differently from any other logical question.
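A small illustration of the point (the policy and the observation names here are hypothetical):

```python
def my_policy(observation: str) -> str:
    # the agent's fixed source code (hypothetical)
    return "pay" if observation == "asked_to_pay" else "accept_payout"

# From inside the "asked_to_pay" branch, what this agent does in the branch it is
# not in is a purely logical fact about the source above, on a par with the parity
# of some far-off digit of pi: determined by what is already at hand, not by any
# further observation.
assert my_policy("offered_payout") == "accept_payout"
```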
(The following is moderately raw material and runs a risk of being nonsense; I don’t understand it well enough.)
One perspective that wasn’t mentioned, and that I suspect may be important, is treating interaction between different processes (or agents) as working by the same mechanism as common partial histories between alternative versions of the same agent. If you can have logical knowledge about your own actions in other possible states that grow, in time and in possibilities, from your current structure, the same treatment can be given to possible states of the signal you send out, in either time direction, that is, to the consequences of actions and observations. One step further: any knowledge (properly defined) you have about something else gives the same power of mutual coordination with that something as the common partial history gives to alternative or at-different-times versions of yourself.
This problem seems deeply connected to logic and theoretical computer science, in particular models of concurrency.
By the way, you say “partial histories of sense data and actions”. I’ve tried considering this problem with time-reversible dynamics; it adds a lot of elegance, and there actions are not part of the history but rather something that is removed from it. The agent’s state doesn’t accumulate from actions and observations; instead it is added to by observations and taken away from by actions. The point at which something counts as an observation or action, rather than as part of the agent’s state, is itself rather arbitrary, and both can be seen as ways of shifting the boundary of what is considered part of the agent. (There is nothing agent-specific here; it’s really about processes in general.)
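A toy rendering of that picture, assuming a bare-bones bit-level model (an illustration only, not a formal definition):

```python
def observe(state: tuple, bit: int) -> tuple:
    # an observation appends information to the agent's state
    return state + (bit,)

def act(state: tuple):
    # an action emits the last bit of the state into the environment,
    # removing it from the agent: exactly the time-reverse of observe
    return state[:-1], state[-1]

s = observe(observe((), 1), 0)         # state (1, 0) after two observations
s, emitted = act(s)                    # acting takes a bit away from the agent
assert observe(s, emitted) == (1, 0)   # act then re-observe is the identity: reversible
```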
Everything you said sounds correct, except the last bit, which is just unclear to me. I’d welcome a demonstration (or formal definition) some day:
By the way, you say “partial histories of sense data and actions”. I’ve tried considering this problem with time-reversible dynamics; it adds a lot of elegance, and there actions are not part of the history but rather something that is removed from it. The agent’s state doesn’t accumulate from actions and observations; instead it is added to by observations and taken away from by actions. The point at which something counts as an observation or action, rather than as part of the agent’s state, is itself rather arbitrary, and both can be seen as ways of shifting the boundary of what is considered part of the agent. (There is nothing agent-specific here; it’s really about processes in general.)
Just curious, did you get the name “ambient control” from ambient calculi?
(It’s strange that I can use the language of possibility like that!)
Edit: I first saw this problem in Nesov’s post. Are you sure Drescher talks about it in his book? I can’t find it.
The solution I came up with is that the AI doesn’t do Bayesian updating. No matter what input it sees, it keeps using the original probabilities. Did you read this part, and if so, does it fail to explain my solution?
Note that Bayesian updating is not done explicitly in this decision theory. When the decision algorithm receives input X, it may determine that a subset of programs it has preferences about never calls it with X and are also logically independent of its output, and therefore it can safely ignore them when computing the consequences of a choice. There is no need to set the probabilities of those programs to 0 and renormalize.
ETA: I think I actually got the idea from Nesov: http://lesswrong.com/lw/14a/thomas_c_schellings_strategy_of_conflict/zrx
Note that Bayesian updating is not done explicitly in this decision theory. When the decision algorithm receives input X, it may determine that a subset of programs it has preferences about never calls it with X
That’s odd, I remember reading through the whole post, but my eyes must have skipped that part. Probably lack of sleep.
I was recently talking over a notion similar but not identical to this with Nick Bostrom. It shares with this idea the property of completely ruling out all epistemic anthropic reasoning even to the extent of concluding that you’re probably not a Boltzmann brain. I may post on it now that you’ve let the cat loose on “decide for all correlated copies of yourself”.
The four main things to be verified are (a) whether this works with reasoning about impossible possible worlds, say if the coinflip is a digit of pi, (b) that the obvious way of extending it to probabilistic hypotheses (namely separating the causal mechanism into deterministic and uncorrelated probabilistic parts a la Pearl) actually works, (c) that there are no even more startling consequences not yet observed, and (d) that you can actually formally say when and how to make a decision that correlates to a copy of yourself in a world that a classical Bayesian would call “ruled out” (with the obvious idea being to assume similarity only with possible computations that have received the same inputs you do, and then being similar in your own branch to the computation depended on by Omega in the Counterfactual Mugging—I have to think about this further and maybe write it out formally to check if it works, though).
Further reflecting, it looks to me like there may be an argument which forces Wei Dai’s “updateless” decision theory, very much akin to the argument that I originally used to pin down my timeless decision theory—if you expect to face Counterfactual Muggings, this is the reflectively consistent behavior; a simple-seeming algorithm has been presented which generates it, so unless an even simpler algorithm can be found, we may have to accept it.
The face-value interpretation of this algorithm is a huge bullet to bite even by my standards—it amounts to (depending on your viewpoint) accepting the Self-Indication Assumption or rejecting anthropic reasoning entirely. If a coin is flipped, and on tails you will see a red room, and on heads a googolplex copies of you will be created in green rooms and one copy in a red room, and you wake up and find yourself in a red room, you would assign (behave as if you assigned) 50% posterior probability that the coin had come up tails. In fact it’s not yet clear to me how to interpret the behavior of this algorithm in any epistemic terms.
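A minimal sketch of why the behavior comes out that way, with a hypothetical even-odds bet offered only to the red-room copy:

```python
P_HEADS = 0.5
GREEN_COPIES_ON_HEADS = 10 ** 100   # stand-in for a googolplex; note it never enters the sum

def value(red_room_copy_bets_on_tails: bool) -> float:
    # World-by-world value of the policy "the red-room copy takes an even-odds
    # $1 bet on tails". Only red-room copies are offered the bet, so the
    # green-room copies contribute nothing either way.
    stake = 1.0
    on_heads = -stake if red_room_copy_bets_on_tails else 0.0
    on_tails = +stake if red_room_copy_bets_on_tails else 0.0
    return P_HEADS * on_heads + (1 - P_HEADS) * on_tails

# Indifference at even odds: the algorithm behaves as if P(tails | red room) = 0.5,
# no matter how many green-room copies exist on heads.
assert value(True) == value(False) == 0.0
```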
To give credit where it’s due, I’d only been talking with Nick Bostrom about this dilemma arising from altruistic timeless decision theorists caring about copies of themselves; the idea of applying the same line of reasoning to all probability updates including over impossible worlds, and using this to solve Drescher’s(?) Counterfactual Mugging, had not occurred to me at all.
Wei Dai, you may have solved one of the open problems I named, with consequences that currently seem highly startling. Congratulations again.
Credit for the no-update solution to Counterfactual Mugging really belongs to Nesov, and he came up with the problem in the first place as well, not Drescher. (Unless you can find a mention of it in Drescher’s book, I’m going to assume you misremembered.)
I will take credit for understanding what he was talking about and reformulating the solution in a way that’s easier to understand. :)
Nesov, you might want to reconsider your writing style, or something… maybe put your ideas into longer posts instead of scattered comments and try to leave smaller inferential gaps. You obviously have really good ideas, but often a person almost has to have the same idea already before they can understand you.
My book discusses a similar scenario: the dual-simulation version of Newcomb’s Problem (section 6.3), in the case where the large box is empty (no $1M) and (I argue) it’s still rational to forfeit the $1K. Nesov’s version nicely streamlines the scenario.
Just to elaborate a bit, Nesov’s scenario and mine share the following features:
In both cases, we argue that an agent should forfeit a smaller sum for the sake of a larger reward that would have been obtained (counterfactually contingently on that forfeiture) if a random event had turned out differently than in fact it did (and than the agent knows it did).
We both argue for using the original coin-flip probability distribution (i.e., not-updating, if I’ve understood that idea correctly) for purposes of this decision, and indeed in general, even in mundane scenarios.
We both note that the forfeiture decision is easier to justify if the coin-toss was quantum under MWI, because then the original probability distribution corresponds to a real physical distribution of amplitude in configuration-space.
Nesov’s scenario improves on mine in several ways. He eliminates some unnecessary complications (he uses one simulation instead of two, and just tells the agent what the coin-toss was, whereas my scenario requires the agent to deduce that). So he makes the point more clearly, succinctly and dramatically. Even more importantly, his analysis (along with Yudkowsky, Dai, and others here) is more formal than my ad hoc argument (if you’ve looked at Good and Real, you can tell that formalism is not my forte :)).
I too have been striving for a more formal foundation, but it’s been elusive. So I’m quite pleased and encouraged to find a community here that’s making good progress focusing on a similar set of problems from a compatible vantage point.
I’m quite pleased and encouraged to find a community here that’s making good progress focusing on a similar set of problems from a compatible vantage point.
And I think I speak for everyone when I say we’re glad you’ve started posting here! Your book was suggested as required rationalist reading. It certainly opened my eyes, and I was planning to write a review and summary so people could more quickly understand its insights.
(And not to be a suck-up, but I was actually at a group meeting the other day where the ice-breaker question was, “If you could spend a day with any living person, who would it be?” I said Gary Drescher. Sadly, no one had heard the name.)
I won’t be able to contribute much to these discussions for a while, unfortunately. I don’t have a firm enough grasp of Pearlean causality and need to read up more on that and Newcomb-like problems (halfway through your book’s handling of it).
I think you’d find me anticlimactic. :) But I do appreciate the kind words.
Being in a transitional period from sputtering nonsense to thinking in math, I don’t feel right writing anything up (publicly) until I understand it well enough. But I can’t help making occasional comments. Well, maybe that’s the wrong mode as well.
I guess there’s a tradeoff between writing too early and wasting your and other people’s time, and writing too late and wasting opportunities to clear up other people’s confusion earlier and have them work in the same direction.
And on the same note: was my comment about state networks understandable? What do you think about it? I’d appreciate it if people who have sufficient background to understand a given comment in principle, but who are unable to do so due to an insufficiently clear or incomplete explanation, spoke up about that fact.
Another point that may help: if you’re presenting a complex idea, you need to provide some motivation for the reader to try to understand it. In your mind, that idea is linked to many others, and together they form a somewhat coherent whole. But if you just describe the idea in isolation as math, either in equations or in words, the reader has no idea why they should try to understand it, except that you think it might be important for them to understand it. Perhaps because you’re so good at thinking in math, you seriously underestimate the amount of effort involved when others try it.
I think that’s the main reason to write in longer form. If you try to describe ideas individually, you have to either waste a lot of time motivating each one separately and explaining how it fits in with the other ideas, or risk having nobody try seriously to understand you. If you describe the system as a whole, you can skip a lot of that and achieve an economy of scale.
Yeah, and math is very helpful as an explanation tool, because people can reconstruct the abstract concepts written in formulas correctly on the first try, even if math seems unnecessary for a particular point. There’s an illusion of transparency in informal explanation, which is even worse when you know that a formal explanation couldn’t have failed.
I didn’t understand it on my first try. I’ll have another go at it later and let you know.
Hmm… I’ve been talking about the no-updating approach to decision-making for months, and Counterfactual Mugging was constructed specifically to show where it applies well, in a way that sounds on the surface opposite to “play to win”.
The idea itself doesn’t seem like anything new, just a way of applying standard expected utility maximization, not to individual decisions, but to the choice of a strategy as a whole, or of the agent’s source code.
From the point of view of the agent, everything it can ever come to know results from computations it runs with its own source code, computations that take interaction with the environment into account. If the choice of strategy doesn’t depend on particular observations, on context-specific knowledge about the environment, then the only uncertainty that remains is uncertainty about what the agent itself is going to do (compute) according to the selected strategy. In simple situations, that uncertainty disappears altogether. In more realistic situations, uncertainty results from there being a huge number of possible contexts in which the agent could operate, so that when the agent has to calculate its action in each such context, it can’t know for sure what it’s going to calculate in the other contexts, even though that information is required for the expected utility calculation. That’s logical uncertainty.
Re: The idea itself doesn’t seem like anything new [...]
That was my overwhelming impression.
Wei Dai’s theory does seem to imply this, and the conclusions don’t startle me much, but I’d really like a longer post with a clearer explanation.
I was recently talking over a notion similar but not identical to this with Nick Bostrom. It shares with this idea the property of completely ruling out all epistemic anthropic reasoning even to the extent of concluding that you’re probably not a Boltzmann brain. I may post on it now that you’ve let the cat loose on “decide for all correlated copies of yourself”.
That reminds me, I actually had a similar idea back in 2001, and posted it on everything-list. I recall thinking at the time something like “This is a really alien way of reasoning and making decisions, and probably nobody will be able to practice it even if it works.”
Notice that which instances of the agent (making the choice) are possible in general depends on what choice it makes.
Consider what is accessible if you trace the history of the agent along counterfactuals. Let’s say time is discrete, and at each moment the agent is in a certain state. Going forwards in time, you include both options for the agent’s state after receiving a binary observation from the environment, and conversely, going backwards, you include both options for the agent’s state before each option for a binary action the agent could have made to arrive at the current state (action and observation are dual under time-reversal in a reversible deterministic world dynamic). Iterating these operations, you construct a “state network” of accessible agent states. (You include the states arrived at by “zig-zag” as well: first a step to the past, then a step to the future along an observation other than the one that led to the original state from which the tracing began—and you arrive at a counterfactual state in the usual sense—but these time-forward and time-backward steps can be repeated any number of times.)
Now, the set of all possible states of the agent becomes divided into equivalence classes of states belonging to the same state networks. If the agent belongs to one of the state networks, it couldn’t be in any other state network (in the generalized sense of “couldn’t”). But which states belong to which network depends on the agent’s algorithm. In fact, the choice of the algorithm is equivalent to the choice of networks that cover the state set. I’m not really sure what to do with this construction, or whether the structure of the networks other than the network that contains the current state should matter. From the principle that observations shouldn’t influence the choice of strategy, the other state networks should matter just as well, but then again they are not even counterfactual...
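A rough sketch of the closure operation as described, with stand-in transition tables (the real construction would distinguish observation-successors from action-predecessors in a reversible dynamic; here both are just abstract step relations):

```python
from collections import deque

def state_network(start, forward_steps, backward_steps):
    # Close {start} under time-forward steps (branching over possible observations)
    # and time-backward steps (branching over states from which an action could
    # have led here); the result is one "state network".
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for t in forward_steps(s) | backward_steps(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

# Toy step relations (pure stand-ins). Which states land in the same network is
# fixed by these tables, i.e. by the agent's algorithm. Note the zig-zag: from
# "b" we step back to "a", then forward to the counterfactual sibling "c".
FWD = {"a": {"b", "c"}, "b": set(), "c": set(), "d": {"e"}, "e": set()}
BWD = {"b": {"a"}, "c": {"a"}, "a": set(), "e": {"d"}, "d": set()}

assert state_network("b", FWD.__getitem__, BWD.__getitem__) == {"a", "b", "c"}
assert state_network("e", FWD.__getitem__, BWD.__getitem__) == {"d", "e"}
```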
Action and observation are not “intuitively” dual; to my first thought they are invariant under time reversal. Action is a state transition of the environment, and observation is a state transition of the agent.
I can see how the duality can be suggested by viewing action as a move of the agent-player and observation as a move of the environment-player. But here the duality is that a node which in one direction was a move by A (associated with arrows to the right) is, in the other direction, a move by E (associated with arrows to the left).
Ok, I understood this on my second reading, but I don’t know what to make of it either. Why did you decide to think about agents like this, or did the idea just pop into your head and you wanted to see if it has any applications?
It’s more or less a direct rendition of the idea of UDT: actions (with state transitions) depend on the state of knowledge, so what does that say about the geometry of state transitions?
More relevant to the recent discussion: where does logical dependence come from, and how do we track it in a sufficiently detailed representation? The source of logical dependence, besides what comes from the common algorithm, is actions and observations. In forward time, all states following a given observation become dependent on that observation, and in backward time, states preceding an action. A single observation can make multiple actions depend on it, and thus make them mutually dependent.
Connection with logic: states of knowledge in the state network are programs/proofs, and actions/observations are variables parameterizing more general programs that resolve into specific states of knowledge given these actions/observations. Also related to game semantics. This is one dimension along which to compress the knowledge representation and seek further understanding.