Anti-reductionist comments on Crisis of Faith

Anti-reductionist 10 Oct 2008 23:43 UTC
−3 points
−1
Many in this world retain beliefs whose flaws a ten-year-old could point out

Very true. Case in point: the belief that “minimum description length” or “Solomonoff induction” can actually predict anything. Choose a language that can describe MWI more easily than Copenhagen, and they say you should believe MWI; choose a language that can describe Copenhagen more easily than MWI, and they say you should believe Copenhagen. I certainly could have told you that when I was ten...
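
To make the claim concrete, here is a toy sketch in Python. It assumes a prior proportional to 2^(-description length), and the two "languages" and their bit counts are invented for illustration; nothing here is a real encoding of either interpretation.

```python
from fractions import Fraction


def length_prior(code_lengths):
    """Normalised prior proportional to 2**(-description length in bits)."""
    weights = {h: Fraction(1, 2 ** n) for h, n in code_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical description lengths (in bits) under two made-up languages.
language_a = {"MWI": 3, "Copenhagen": 5}
language_b = {"MWI": 5, "Copenhagen": 3}

prior_a = length_prior(language_a)
prior_b = length_prior(language_b)

print(prior_a["MWI"], prior_b["MWI"])
```

With these made-up lengths the prior on MWI is 4/5 under `language_a` and 1/5 under `language_b`, so the preferred hypothesis flips with the language.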
- [deleted] 25 Aug 2009 10:30 UTC
  14 points
  0
  The argument in this post is precisely analogous to the following:
  
  Bayesian reasoning cannot actually predict anything. Choose priors that result in the posterior for MWI being greater than that for Copenhagen, and it says you should believe MWI; choose priors that result in the posterior for Copenhagen being greater than that for MWI, and it says you should believe Copenhagen.
  
  The thing is, though, choosing one’s own priors is kind of silly, and choosing one’s own priors with the purpose of forcing the posteriors to come out a certain way is definitely silly. Priors should be chosen to be simple but flexible. Likewise, choosing a language with the express purpose of being able to express a certain concept simply is silly; languages should be designed to be simple but flexible.
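
  A minimal sketch of the analogy, assuming a single binary hypothesis H and two hypothetical agents whose priors were picked to disagree. The numbers are invented; the point is only that the prior decides the answer when the evidence does not discriminate, and matters much less when it does.

  ```python
  from fractions import Fraction


  def posterior(prior_h, likelihood_h, likelihood_not_h):
      """Bayes' rule for a binary hypothesis H versus not-H."""
      joint_h = prior_h * likelihood_h
      joint_not_h = (1 - prior_h) * likelihood_not_h
      return joint_h / (joint_h + joint_not_h)


  # Two hypothetical agents with opposite priors on H.
  prior_favouring_h = Fraction(9, 10)
  prior_against_h = Fraction(1, 10)

  # Evidence that is equally likely either way leaves each prior unchanged.
  no_evidence = (
      posterior(prior_favouring_h, Fraction(1, 2), Fraction(1, 2)),
      posterior(prior_against_h, Fraction(1, 2), Fraction(1, 2)),
  )

  # Evidence 100 times likelier under H moves both agents toward H.
  strong_evidence = (
      posterior(prior_favouring_h, Fraction(1), Fraction(1, 100)),
      posterior(prior_against_h, Fraction(1), Fraction(1, 100)),
  )

  print(no_evidence, strong_evidence)
  ```

  Here the posteriors stay at 9/10 and 1/10 without discriminating evidence, and become 900/901 and 100/109 with it.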
  - cousin_it 25 Aug 2009 10:38 UTC
    11 points
    0
    It seems to me that you’re waving the problem away instead of solving it. For example, I don’t know of any general method for devising a “non-silly” prior for any given parametric inference problem. Analogously, what if your starting language accidentally contains a shorter description of Copenhagen than MWI?
    - [deleted] 27 Aug 2009 0:56 UTC
      3 points
      0
      If you’re just doing narrow AI, then look at your hypothesis that describes the world (e.g. “For any two people, they have some probability X of having a relationship we’ll call P. For any two people with relationship P, every day, they have a probability Y of causing perception A.”), then fill in every parameter (in this case, we have X and Y) with reasonable distributions (e.g. X and Y independent, each with a ¹⁄₃ chance of being 0, a ¹⁄₃ chance of being 1, and a ¹⁄₃ chance of being drawn from the uniform distribution on [0, 1]).
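
      A rough sketch of how such a prior updates, for the parameter X alone. It assumes the third component means "X drawn uniformly from [0, 1]" and that the data is one specific sequence of independent yes/no observations; the function name and the component labels are made up for illustration.

      ```python
      from fractions import Fraction
      from math import factorial


      def posterior_over_components(successes, trials):
          """Posterior weight of each prior component for a Bernoulli parameter.

          Prior: 1/3 point mass at 0, 1/3 point mass at 1, 1/3 uniform on [0, 1].
          Data: one specific sequence with `successes` hits out of `trials`.
          """
          failures = trials - successes
          third = Fraction(1, 3)
          # Probability of the observed sequence under each component.
          like_zero = Fraction(1 if successes == 0 else 0)
          like_one = Fraction(1 if failures == 0 else 0)
          # Integral of x**s * (1 - x)**f over [0, 1] equals s! f! / (s + f + 1)!
          like_uniform = Fraction(
              factorial(successes) * factorial(failures), factorial(trials + 1)
          )
          joint = {
              "always 0": third * like_zero,
              "always 1": third * like_one,
              "uniform": third * like_uniform,
          }
          total = sum(joint.values())
          return {name: weight / total for name, weight in joint.items()}


      print(posterior_over_components(3, 3))
      print(posterior_over_components(1, 2))
      ```

      Three hits in three trials leave 4/5 on "always 1" and 1/5 on the uniform component; a single mixed sequence puts all the weight on the uniform component, because both point masses are ruled out.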
      
      Yes, I said “reasonable”. Subjectivity is necessary; otherwise, everyone would have the same priors. Just don’t give any statement an unusually low probability (e.g. a probability practically equal to zero that a certain physical constant is greater than Graham’s number), nor any statement an unusually high probability (e.g. a 50% probability that Christianity is true). I think good rules are that the language your prior corresponds to should not have any atoms that can be described reasonably easily (perhaps 10 atoms or fewer) using only other atoms, and that every atom should be mathematically useful.
      
      If the starting language accidentally contains a shorter description of Copenhagen than MWI? Spiffy! Assuming there is no evidence either way, Copenhagen will be more likely than MWI. Now, correct me if I’m wrong, but MWI is essentially the idea that the set of things causing wavefunction collapse is empty, while Copenhagen states that it is not empty. Supposing we end up with a ¹⁄₃ chance of MWI being true and a ²⁄₃ chance that it’s some other simple thing, is that really a bad thing? Your agent will end up designing devices that will work only if a certain subinterpretation of the Copenhagen interpretation is true and try them out. Eventually, most of the simple, easily-testable versions of the Copenhagen interpretation will be ruled out—if they are, in fact, false—and we’ll be left with two things: unlikely versions of the Copenhagen interpretation, and versions of the Copenhagen interpretation that are practically identical to MWI.
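
      The elimination process described above can be sketched as follows. The split of the ²⁄₃ over four "collapse variants" is purely hypothetical, as are the variant names; the sketch only shows how ruling out testable variants shifts weight toward whatever survives.

      ```python
      from fractions import Fraction


      def rule_out(beliefs, falsified):
          """Drop falsified hypotheses and renormalise the remaining weights."""
          survivors = {h: p for h, p in beliefs.items() if h not in falsified}
          total = sum(survivors.values())
          return {h: p / total for h, p in survivors.items()}


      # Hypothetical prior: 1/3 on MWI, 2/3 spread evenly over four collapse variants.
      beliefs = {"MWI": Fraction(1, 3)}
      beliefs.update({f"collapse variant {i}": Fraction(1, 6) for i in range(1, 5)})

      # Suppose experiments falsify the three most easily tested variants.
      after_tests = rule_out(
          beliefs,
          {"collapse variant 1", "collapse variant 2", "collapse variant 3"},
      )

      print(after_tests)
      ```

      With these numbers MWI goes from ¹⁄₃ to ²⁄₃, and the one untested variant keeps the remaining ¹⁄₃.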
      
      (Do I get a prize for saying “e.g.” so much?)
      - Alicorn 27 Aug 2009 1:03 UTC
        17 points
        0
        
        (Do I get a prize for saying “e.g.” so much?)
        
        Yes. Here is an egg and an EEG.
- Ronny Fernandez 13 Aug 2011 9:18 UTC
  −1 points
  0
  The minimum description length formulation doesn’t allow for that at all. You are not allowed to pick whatever language you want, you have to pick the optimal code. If in the most concise code possible, state ‘a’ has a smaller code than state ‘b’, then ‘a’ must be more probable than ‘b’, since the most concise codes possible assign the smallest codes to the most probable states.
  
  So if you wanna know what state a system is in, and you have the ideal (or close to ideal) code for the states in that system, the probability of that state will be strongly inversely correlated with the length of the code for that state.
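
  A small sketch of the relationship between probability and code length, using Shannon code lengths of ceil(-log2 p) bits as a stand-in for the "ideal code". The four-state distribution is hypothetical. Note that the lengths are computed from the probabilities, so the sketch presupposes that the probabilities are already known.

  ```python
  from fractions import Fraction
  from math import ceil, log2


  def shannon_code_lengths(probabilities):
      """Code length ceil(-log2 p) in bits for each state with p > 0."""
      return {state: ceil(-log2(p)) for state, p in probabilities.items()}


  def kraft_sum(lengths):
      """Sum of 2**(-length); at most 1 for any prefix-free code."""
      return sum(Fraction(1, 2 ** n) for n in lengths.values())


  # Hypothetical distribution over four states.
  probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
  lengths = shannon_code_lengths(probabilities)

  print(lengths, kraft_sum(lengths))
  ```

  The more probable states get the shorter codes (1, 2, 3 and 3 bits here), and because the Kraft sum is exactly 1, reading 2^(-length) back off the code recovers the original probabilities.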
  - [deleted] 13 Aug 2011 12:14 UTC
    7 points
    0
    
    You are not allowed to pick whatever language you want, you have to pick the optimal code. If in the most concise code possible, state ‘a’ has a smaller code than state ‘b’, then ‘a’ must be more probable than ‘b’, since the most concise codes possible assign the smallest codes to the most probable states.
    
    I haven’t read anything like this in my admittedly limited readings on Solomonoff induction. Disclaimer: I am only a mere mathematician in a different field, and have only read a few papers surrounding Solomonoff.
    
    The claims I’ve seen revolve around “assembly language” (for some value of assembly language) being sufficiently simple that any biases inherent in the language are small (some people claim a constant multiple, on the basis that this is what happens when you introduce a symbol ‘short-circuiting’ a computation). I think a more correct version of Anti-reductionist’s argument should run, “we currently do not know how the choice of language affects SI; it is conceivable that small changes in the base language imply fantastically different priors.”
    
    I don’t know the answer to that, and I’d be very glad to know if someone has proved it. However, I think it’s rather unlikely that someone has proved it, because 1) I expect it will be disproven (on the basis that model-theoretic properties tend to be fragile), and 2) given the current difficulties in explicitly calculating SI, finding an explicit, non-trivial counter-example would probably be difficult.
    
    Note that
    
    Choose a language that can describe MWI more easily than Copenhagen, and they say you should believe MWI; choose a language that can describe Copenhagen more easily than MWI, and they say you should believe Copenhagen.
    
    is not such a counter-example, because we do not know if “sufficiently assembly-like” languages can be chosen which exhibit such a bias. I don’t think the above thought-experiment is worth pursuing, because I don’t think we even know a formal (on the level of assembly-like languages) description of either CI or MWI.
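
    For reference, the “constant multiple” claim mentioned earlier in this comment is usually stated as the invariance theorem for Kolmogorov complexity. A sketch, with notation assumed here rather than taken from the thread: U and V are two universal prefix machines (the “languages”), K_U(x) is the length of the shortest U-program that outputs x, and c_{UV} is a constant that depends on the two machines but not on x.

    ```latex
    K_U(x) \le K_V(x) + c_{UV} \quad \text{for all } x,
    \qquad \text{hence} \qquad
    2^{-K_U(x)} \ge 2^{-c_{UV}} \, 2^{-K_V(x)} .
    ```

    The additive constant in description length becomes a multiplicative factor in the prior. The theorem does not say how large c_{UV} is for any particular pair of languages, so by itself it does not settle whether two natural-looking languages can disagree badly about short hypotheses.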
    - Ronny Fernandez 13 Aug 2011 12:36 UTC
      0 points
      0
      Not Solomonoff, minimum description length. I’m coming from an information theory background; I don’t know very much about Solomonoff induction.
      - [deleted] 13 Aug 2011 12:39 UTC
        0 points
        0
        OP is talking about Solomonoff priors, no? Is there a way to do inference with minimum description length?
        Ronny Fernandez 13 Aug 2011 12:42 UTC
        0 points
        0
        What is OP?
        Vladimir_Nesov 13 Aug 2011 12:47 UTC
        0 points
        0
        EY
        [deleted] 13 Aug 2011 12:49 UTC
        0 points
        0
        I meant Anti-reductionist, the person potato originally replied to… I suppose grandparent would have been more accurate.
        Ronny Fernandez 13 Aug 2011 12:52 UTC
        0 points
        0
        He was talking about both.
        
        the belief that “minimum description length” or “Solomonoff induction” can actually predict anything
        
        [deleted] 13 Aug 2011 12:56 UTC
        1 point
        0
        So how do you predict with minimum description length?
        lessdazed 13 Aug 2011 16:32 UTC
        1 point
        0
        With respect to the validity of reductionism, out of MDL and SI, one theoretically predicts and the other does not. Obviously.
  - Oscar_Cunningham 13 Aug 2011 11:31 UTC
    2 points
    0
    Aren’t you circularly basing your code on your probabilities but then taking your priors from the code?
    - Ronny Fernandez 13 Aug 2011 12:07 UTC
      0 points
      0
      Yep, but that’s all the proof shows: the more concise your code, the stronger the inverse correlation between the probability of a state and the code length of that state.