I’ve written a reply to Bayesian Flame, one of cousin_it’s posts from last year. It’s titled Frequentist Magic vs. Bayesian Magic. I’d appreciate some review and comments before I post it here. Mainly I’m concerned about whether I’ve correctly captured the spirit of frequentism, and whether I’ve treated it fairly.
BTW, I wish there were a “public drafts” feature on LessWrong, where I could make a draft accessible to others by URL without having it show up in recent posts, so I wouldn’t have to post a draft elsewhere to get feedback before officially publishing it.
Why does the universe that we live in look like a giant computer? What about uncomputable physics?
Consider “syntactic preference” as an order on an agent’s strategies (externally observable possible behaviors, in the mathematical sense, independently of what we can actually arrange to observe), where the agent is software running on an ordinary computer. This is “ontological boxing”, a way of abstracting away any unknown physics. This syntactic order can then be given an interpretation, as in logic/model theory, for example by placing the “agent program” in the environment of all possible “world programs”, and restating the order on the agent’s possible strategies in terms of possible outcomes for the world programs (as an order on sets of outcomes across all world programs), depending on the agent.
This way, we first factor out the real world from the problem, leaving only the syntactic backbone of preference, and then reintroduce a controllable version of the world, in the form of any convenient mathematical structure, as an interpretation of syntactic preference. The question of whether the model world is “actually the real world”, and whether it reflects all possible features of the real world, is sidestepped.
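As a toy illustration only (every name here is my own, and the particular “worlds” and ordering are arbitrary stand-ins, not anything the comment above commits to): a strategy is a bare I/O map, each model world program consumes a strategy and yields an outcome, and an interpretation restates an order on strategies as an order on outcome profiles.

```python
# Toy sketch: a "strategy" is just externally observable I/O behavior,
# here a map from an observation to an action.
def strategy_a(obs):
    return obs

def strategy_b(obs):
    return 1 - obs

# Model "world programs": each takes the agent's strategy and produces
# an outcome. Nothing here claims to be "the real world"; these are
# convenient mathematical structures, per the comment above.
worlds = [
    lambda agent: agent(0),
    lambda agent: agent(1) * 2,
]

def outcomes(agent):
    # The outcome profile of a strategy across all model worlds.
    return tuple(w(agent) for w in worlds)

# An interpreted syntactic preference: an order on strategies, restated
# as an order on their outcome profiles (a sum is an arbitrary stand-in
# for whatever order on sets of outcomes one actually wants).
def prefer(a, b):
    return a if sum(outcomes(a)) >= sum(outcomes(b)) else b

print(outcomes(strategy_a), outcomes(strategy_b))
```

The point of the sketch is only the shape of the construction: the order is defined on I/O behaviors, and the “worlds” can be swapped out freely without touching the strategies.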
Thanks (and upvoted) for this explanation of your current approach. I think it’s definitely worth exploring, but I currently see at least two major problems.
The first is that my preferences seem to have a logical dependency on the ultimate nature of reality. For example, I currently think reality is just “all possible mathematical structures”, but I don’t know what my preferences are until I resolve what “all possible mathematical structures” means exactly. What would happen if you tried to use your idea to extract my preferences before I resolve that question?
The second is that I don’t see how you plan to differentiate, within “syntactic preference”, between true preferences and those caused by computational limitations and/or hardware/software errors. Internally, the agent is computing the optimal strategy (as best it can) from a preference that’s stated in terms of “the real world” and maybe also in terms of subjective anticipation. If we could somehow translate those preferences directly into preferences on mathematical structures, we would be able to bypass those computational limitations and errors without having to single them out.
The first is that my preferences seem to have a logical dependency on the ultimate nature of reality.
An important principle of FAI design to remember here is “be lazy!”. For any problem that people would want to solve, FAI design should, where possible, redirect that problem to the FAI, instead of actually solving it in order to construct the FAI.
Here, you, as a human, may be interested in “nature of reality”, but this is not a problem to be solved before the construction of FAI. Instead, the FAI should pursue this problem in the same sense you would.
Syntactic preference is meant to capture this sameness of pursuits, without an understanding of what these pursuits are about. Instead of wanting to do the same thing to the world that you would want to do, a FAI with the same syntactic preference wants to perform the same actions that you would want to perform. The difference is that syntactic preference refers to actions (I/O), not to the world. But the outcome is exactly the same, if you manage to represent your preference in terms of your I/O.
I don’t know what my preferences are until I resolve what “all possible mathematical structures” means exactly
You may still know the process of discovery that you want to follow while doing what you call getting to know your own preference. That process of discovery gives a definition of preference. We don’t need to actually compute preference in some predefined format to solve the conceptual problem of defining preference; we only need to define a process that determines preference.
The second is that I don’t see how you plan to differentiate, within “syntactic preference”, between true preferences and those caused by computational limitations and/or hardware/software errors.
This issue is actually the last conceptual milestone I’ve reached on this problem, just a few days ago. The trouble is how the agent would reason about the possibility of corruption of its own hardware. The answer is that human preference is to a large extent concerned with consequentialist reasoning about the world, so human preference can be interpreted as modeling the environment, including the agent’s hardware. This is an informal statement, referring to the real world, but the behavior supporting this statement is also determined by formal syntactic preference that doesn’t refer to the real world. Thus, just mathematically implementing human preference is enough to cause the agent to worry about how its hardware is doing (it isn’t in any sense formally defined as its own hardware, but what happens in the agent’s formal mind can be interpreted as recognizing the hardware’s instrumental utility). In particular, this solves the issues of possible morally harmful impact of the FAI’s computation (e.g. simulating tortured people and then deleting them from memory, etc.), and of upgrading the FAI beyond the initial hardware (so that it can safely discard the old hardware).
Once we implement this kind of FAI, how will we be better off than we are today? It seems like the FAI will have just built exact simulations of us inside itself (who, in order to work out their preferences, will build another FAI, and so on). I’m probably missing something important in your ideas, but it currently seems a lot like passing the recursive buck.
ETA: I’ll keep trying to figure out what piece of the puzzle I might be missing. In the meantime, feel free to take the option of writing up your ideas systematically as a post instead of continuing this discussion (which doesn’t seem to be followed by many people anyway).
FAI doesn’t do what you do; it optimizes its strategy according to preference. It’s more able than a human to form better strategies according to a given preference, and even failing that it still has to be able to avoid value drift (as a minimum requirement).
Preference is never seen completely; there is always a great deal of logical uncertainty about it. The point of creating a FAI is to fix the preference so that it stops drifting, so that the problem being solved is held fixed, even though solving it will take the rest of eternity; and to create a competitive preference-optimizing agent that ensures the preference fares OK against possible threats, including different-preference agents or a value-drifted humanity.
Preference isn’t defined by an agent’s strategy, so copying a human without some kind of self-reflection I don’t understand is pretty pointless. Since I never described a way of extracting preference from a human (and hence defining it for a FAI), I’m not sure where you see the regress in the process of defining preference.
A FAI is not built without an exact and complete definition of preference. The uncertainty about preference can only be logical, in what it means/implies. (At least, when we are talking about syntactic preference, where the rest of the world is necessarily screened off.)
Since I never described a way of extracting preference from a human (and hence defining it for a FAI), I’m not sure where you see the regress in the process of defining preference.
Reading your previous post in this thread, I felt like I was missing something and I could have asked the question Wei Dai asked (“Once we implement this kind of FAI, how will we be better off than we are today?”). You did not explicitly describe a way of extracting preference from a human, but phrases like “if you manage to represent your preference in terms of your I/O” made it seem like capturing strategy was what you had in mind.
I now understand you as talking only about what kind of object preference is (an I/O map) and about how this kind of object can contain certain preferences that we worry might be lost (like considerations of faulty hardware). You have not said anything about what kind of static analysis would take you from an agent’s program to an agent’s preference.
After reading Nesov’s latest posts on the subject, I think I better understand what he is talking about now. But I still don’t get why Nesov seems confident that this is the right approach, as opposed to a possible one that is worth looking into.
You [Nesov] have not said anything about what kind of static analysis would take you from an agent’s program to an agent’s [syntactic] preference.
Do we have at least an outline of how such an analysis would work? If not, why do we think that working out such an analysis would be any easier than, say, trying to state ourselves what our “semantic” preferences are?
But I still don’t get why Nesov seems confident that this is the right approach, as opposed to a possible one that is worth looking into.
What other approaches do you refer to? This is just the direction my own research has taken. I’m not confident it will lead anywhere, but it’s the best road I know about.
Do we have at least an outline of how such an analysis would work? If not, why do we think that working out such an analysis would be any easier than, say, trying to state ourselves what our “semantic” preferences are?
I have some ideas, though too vague to usefully share (I wrote about a related idea on the SIAI decision theory list, replying to Drescher’s bounded Newcomb variant, where a dependence on strategy is restored from a constant syntactic expression in terms of source code). For “semantic preference”, we have the ontology problem, which is a complete show-stopper. (Though as I wrote before, interpretations of syntactic preference in terms of formal “possible worlds”—now having nothing to do with the “real world”—are a useful tool, and it’s the topic of the next blog post.)
At this point, syntactic preference (1) solves the ontology problem, (2) gives focus to the investigation of what kind of mathematical structure could represent preference (strategy is a well-understood mathematical structure, and syntactic preference is something that allows computing a strategy, with better strategies resulting from more computation), and (3) gives a more technical formulation of the preference-extraction problem, so that we can think about it more clearly. I don’t know of another effort toward clarifying/developing preference theory (that reaches even this meager level of clarity).
If not, why do we think that working out such an analysis would be any easier than, say, trying to state ourselves what our “semantic” preferences are?
Returning to this point, there are two show-stopping problems: first, as I pointed out above, there is the ontology problem, which would make the product of such an effort rather useless even if humans were able to write out their preference; second, we do know that we can’t write out our preference manually. Figuring out an algorithmic trick for extracting it from human minds automatically is not out of the question, hence worth pursuing.
P.S. These are important questions, and I welcome this kind of discussion about general sanity of what I’m doing or claiming; I only saw this comment because I’m subscribed to your LW comments.
Why do you consider the ontology problem to be a complete show-stopper? It seems to me there are at least two other approaches to it that we can take:
1. We human beings seem to manage to translate our preferences from one ontology to another when necessary, so try to figure out how we do that, and program it into the FAI.
2. Work out what the true, correct ontology is, then translate our preferences into that ontology. It seems that we already have a good candidate of this in the form of “all mathematical structures”. Formalizing that notion seems really hard, but why should it be impossible?
You claim that syntactic preference solves the ontology problem, but I have even fewer ideas about how to extract the syntactic preference of arbitrary programs. You mention that you do have some vague ideas, so I guess I’ll just have to be patient and let you work them out.
second, we do know that we can’t write out our preference manually.
How do we know that? It’s not clear to me that there is any more evidence for “we can’t write out our preferences manually”, than for “we can’t build an artificial general intelligence manually”.
I only saw this comment because I’m subscribed to your LW comments.
Why do you consider the ontology problem to be a complete show-stopper? It seems to me there are at least two other approaches to it that we can take:
By “show-stopper” I simply mean that we absolutely have to solve it in some way. Syntactic preference is one way, what you suggest could conceivably be another.
You claim that syntactic preference solves the ontology problem, but I have even fewer ideas about how to extract the syntactic preference of arbitrary programs.
An advantage I see with syntactic preference is that it’s at least more or less clear what we are working with: formal programs and strategies. This opens up a whole palette of possible approaches to try on the remaining problems. With the “all mathematical structures” thing, we still don’t know what we are supposed to talk about; there is, as of now, no way forward even at that step. Syntactic preference at least allows us to take one step further, to firmer ground, even though admittedly it’s unclear what to do next.
second, we do know that we can’t write out our preference manually.
How do we know that? It’s not clear to me that there is any more evidence for “we can’t write out our preferences manually”, than for “we can’t build an artificial general intelligence manually”.
I mean the “complexity of value”/”value is fragile” thesis. It seems to me quite convincing, and from the opposite direction, I have the “preference is detailed” conjecture resulting from the nature of preference in general. For “is it possible to build AI”, we don’t have similarly convincing arguments (and really, it’s an unrelated claim that only contributes a connotation of error in judgment, without giving an analogy in the method of arriving at that judgment).
I mean the “complexity of value”/”value is fragile” thesis.
I agree with “complexity of value” in the sense that human preference, as a mathematical object, has high information content. But I don’t see a convincing argument from this premise to the conclusion that the best course of action for us to take, in the sense of maximizing our values under the constraints that we’re likely to face, involves automated extraction of preferences, instead of writing them down manually.
Consider the counter-example of someone who has the full complexity of human values, but would be willing to give up all of their other goals to fill the universe with orgasmium, if that choice were available. Such an agent could “win” by building a superintelligence with just that one value. How do we know, at this point, that our values are not like that?
Whatever the case is with how acceptable the simplified values are, automated extraction of preference seems to be the only way to actually knowably win, rather than to strike a compromise, which is what simplified preference is suggested to be. We must decide from the information we have; how would you come to know that a particular simplified preference definition is any good? I don’t see a way forward without first having a more precise moral machine than a human (but then, we won’t need to consider simplified preference).
I now understand you as talking only about what kind of object preference is (an I/O map) and about how this kind of object can contain certain preferences that we worry might be lost (like considerations of faulty hardware).
Correct. Note that “strategy” is a pretty standard term, while “I/O map” sounds ambiguous, though it emphasizes that everything except the behavior at I/O is disregarded.
You have not said anything about what kind of static analysis would take you from an agent’s strategy to an agent’s preference.
An agent is more than its strategy: strategy is only external behavior, the normal form of the algorithm implemented in the agent. The same strategy can be implemented by many different programs. I strongly suspect that it takes more than a strategy to define preference, that introspective properties are important (how the behavior is computed, as opposed to just what the resulting behavior is). It is sufficient for preference, once it is defined, to talk about strategies and disregard how they could be computed; but to define (extract) a preference, a single strategy may be insufficient, and it may be necessary to look at how the reference agent (e.g. a human) works on the inside. Besides, the agent is never given as its strategy; it is given as source code that normalizes to that strategy, and computing the strategy may be tough (and pointless).
You can do better than the frequentist approach without using the “magic” universal prior. You can just use a prior that represents initial ignorance of the frequency at which the machine produces head-biased and tail-biased coins (the uniform density dP(f) = 1·df). If you want to look for repeating patterns, you can assign probability (1/2)(1/2^n) to the theory that the machine produces each type of coin at a frequency depending on the last n coins it produced. This requires treating a probability as a strength of belief, and not as the frequency of anything, which is what (as I understand it) frequentists are not willing to do.
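A minimal sketch of the first suggestion (the function name is mine): for predicting the next coin, the uniform prior dP(f) = df works out to Laplace's rule of succession.

```python
from fractions import Fraction

def laplace_predictive(heads, total):
    """Posterior predictive P(next coin is head-biased), under the
    uniform prior dP(f) = df over the unknown frequency f."""
    # Updating the uniform prior on `heads` successes in `total` trials
    # and integrating f against the posterior gives the familiar
    # (heads + 1) / (total + 2) rule of succession.
    return Fraction(heads + 1, total + 2)

# With no data the prediction is the ignorance value 1/2; after seeing
# 7 head-biased coins out of 10, it rises to 8/12 = 2/3.
print(laplace_predictive(0, 0), laplace_predictive(7, 10))
```

Unlike a frequency estimate, this is well-defined before any trials at all, which is exactly the “strength of belief” reading.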
Note that the universal prior, if you can pull it off, is still better than what I described. The repeating-pattern-seeking prior will not notice, for example, if the machine makes head-biased coins on prime-numbered trials but tail-biased coins on composite-numbered trials. This is because it implicitly assigns probability 0 to that type of machine, and no finite amount of evidence can update away from a probability of 0.
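The blind spot is easy to exhibit concretely. In this sketch (the helper names are mine), the machine is head-biased exactly on the prime-numbered trials, and we search for two trials whose last-n histories agree but whose outcomes differ; any such pair shows that no deterministic “depends on the last n coins” model can represent this machine.

```python
def isprime(k):
    # Trial division; fine for small k.
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k**0.5) + 1))

# Outcome on trial i: head-biased (1) iff i is prime.
seq = [int(isprime(i)) for i in range(1, 200)]

def counterexample(n):
    """Find two trials with identical last-n histories but different
    outcomes; their existence defeats every 'last n coins' model."""
    seen = {}
    for i in range(n, len(seq)):
        hist = tuple(seq[i - n:i])
        if hist in seen and seq[seen[hist]] != seq[i]:
            return seen[hist], i
        seen.setdefault(hist, i)
    return None

print(counterexample(3))  # a pair of trials any last-3 model must confuse
```

Since the model class assigns probability 0 to the prime-pattern machine from the start, the pair above stays equally confusing no matter how many trials are observed.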
BTW, I wish there were a “public drafts” feature on LessWrong, where I could make a draft accessible to others by URL without having it show up in recent posts, so I wouldn’t have to post a draft elsewhere to get feedback before officially publishing it.
I second this feature request.
ETA: I did not notice earlier that Steve Rayhawk had made the same comment.
I only saw this comment because I’m subscribed to your LW comments.
I had a hunch that might be the case. :)
Seconded. See also JenniferRM on editorial-level versus object-level comments.
Agreed. I’ll be investigating what it would take to implement that.
(Edit: interesting; draft folders are apparently private sub-reddits created when a user registers and admin’ed by that user.)