MileyCyrus comments on [missing post]

MileyCyrus 30 Mar 2012 7:19 UTC
2 points
“So for every extra unit of disutility predicted the probability penalty due to not knowing enough about the current state of the universe becomes greater.”

Sure, but the probability shrinks slower than the disutility rises. A scenario in which 1000 times 3^^3 people are tortured has more probability than the probability that 3^^3 people are tortured, divided by 1000. Or more formally:

[P(Mugger tortures 1000*3^^3 people)] > [P(Mugger tortures 3^^3 people)]/1000

Read about Solomonoff Induction to find out why this is true.
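
One rough way to see what that appeal needs, as a sketch rather than a proof: assume (this is an illustrative assumption, not something established in this thread) that the prior on "the mugger tortures N people" scales like 2^{-K(N)}, where K is prefix Kolmogorov complexity. Describing 1000·N takes at most a description of N, a description of 1000, and a constant c that depends on the reference machine:

```latex
\begin{align*}
K(1000 \cdot N) &\le K(N) + K(1000) + c \\
\Rightarrow\quad 2^{-K(1000 \cdot N)} &\ge 2^{-(K(1000) + c)} \cdot 2^{-K(N)}
\end{align*}
```

So the penalty for multiplying the claim by 1000 is bounded by a fixed factor that does not depend on N. Whether that factor beats 1/1000 depends on whether K(1000) + c is below log2(1000) ≈ 9.97 bits, which this sketch does not settle; the gap is much clearer for multipliers that are huge but compressible, such as 3^^3 itself.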
- Dmytry 31 Mar 2012 6:57 UTC
  2 points
  Parent
  How about this: the probabilities of torturing each exact number of beings have to sum to 1 or less?
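  
  Spelled out as a sketch, writing P(N) for the probability that exactly N beings are tortured (notation introduced here for illustration):
  
  ```latex
  \sum_{N=1}^{\infty} P(N) \le 1,
  \qquad
  \sum_{N=1}^{\infty} \frac{c}{N} = \infty \quad \text{for every } c > 0
  ```
  
  So P(N) ≥ c/N cannot hold for all large N, which means \liminf_{N \to \infty} N \cdot P(N) = 0. The term N·P(N) has to dip toward zero along some sequence of N, although this alone says nothing about any particular N such as 3^^3.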
- Manfred 30 Mar 2012 12:32 UTC
  2 points
  Parent
  A word of caution—Solomonoff induction applies to things like the laws of physics, not to all hypotheses. Otherwise, if you flipped a coin 100 times, you would expect to see 100 heads much more often than average, and we don’t.
  - MileyCyrus 30 Mar 2012 15:40 UTC
    2 points
    Parent
    
    Otherwise, if you flipped a coin 100 times, you would expect to see 100 heads much more often than average, and we don’t.
    
    If you flip a coin 15 times, this result:
    
    HHHHHHHHHHHHHHH
    
    is far more probable than this:
    
    HTHTTHTHTTTHHTH
    
    That’s because some coins are rigged, and it’s much easier to rig a coin to conform to the first pattern than the second.
    - Antisuji 30 Mar 2012 16:09 UTC
      3 points
      Parent
      This is true, but doesn’t explain why we’re more surprised when we see the former than the latter.
      - [deleted] 30 Mar 2012 16:21 UTC
        1 point
        Parent
        
        we’re more surprised when we see the former than the latter
        
        I don’t think this is actually true. If MileyCyrus successfully predicted the exact sequence of coinflips HTHTTHTHTTTHHTH, wouldn’t you be more surprised than if it were HHHHHHHHHHHHHHH?
        Antisuji 30 Mar 2012 18:44 UTC
        2 points
        Parent
        Of course. When I said “we’re more surprised” I was referring to the typical person who hasn’t read this discussion thread. In the absence of the above prediction, I would be far more surprised to see HHHHHHHHHHHHHHH than HTHTTHTHTTTHHTH. Once the prediction is made, I become extremely surprised if either sequence appears, but somewhat more surprised by HTHTTHTHTTTHHTH.
        [deleted] 30 Mar 2012 18:58 UTC
        2 points
        Parent
        Oh, I see. In the case of the typical person, the answer is even easier: Lack of understanding of the conjunction rule of probability. HTHTTHTHTTTHHTH feels more representative of a random series of coin flips, so it is intuitively judged as more probable than HHHHHHHHHHHHHHH.
    - philh 30 Mar 2012 18:56 UTC
      0 points
      Parent
      First reaction: I don’t know about “far” more probable. What’s the prior that a coin is rigged? I would have said less than ¹⁄₃₂₇₆₈, but low confidence on that.
      
      According to this, you can’t rig a coin to do that, which increases my confidence.
      
      But you can rig your tossing, even by mistake: if it lands heads, and you balance it to flip with heads up again, then it’s slightly more likely to land heads. I remember hearing a figure of 51% for that, in which case H*15 has probability about 1/24347 instead of ¹⁄₃₂₇₆₈; about a third more probable. But that scenario (fifteen times) is itself unlikely… if we estimate P(next is heads | last was heads) = 0.505 (corresponding to keeping the same side up ³⁄₄ of the time; I still feel that’s an overestimate), we get about 1/28225, 16% more likely.
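      
      Both comparisons reduce to a ratio against the fair-coin value 2^{-15}, treating each of the fifteen flips as landing heads with the stated probability (a simplification, since the first flip has no previous flip to copy):
      
      ```latex
      \frac{0.51^{15}}{0.5^{15}} = 1.02^{15} \approx 1.35,
      \qquad
      \frac{0.505^{15}}{0.5^{15}} = 1.01^{15} \approx 1.16
      ```
      
      That is where "about a third more probable" and "16% more likely" come from.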
      
      If we switched to dice, I would agree that 666666666666666 is far more probable than 136112642345553.
    - Manfred 30 Mar 2012 16:24 UTC
      0 points
      Parent
      I suppose that isn’t all that unintuitive (though does this actually work if you start with a uniform prior over weights and do the math?). But does your intuitive model also predict the fact that HTHTHTHTHT is more probable than HTHHTHTHTT? :D
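      
      Doing that math as a sketch, assuming the coin's heads probability p is drawn from a uniform prior on [0, 1] and the flips are independent given p: a specific sequence with k heads in n flips has marginal probability
      
      ```latex
      P(\text{sequence}) = \int_0^1 p^{k} (1-p)^{n-k} \, dp = \frac{k!\,(n-k)!}{(n+1)!}
      ```
      
      For n = 15, all heads (k = 15) gives 1/16, while HTHTTHTHTTTHHTH (k = 7) gives 7!·8!/16! = 1/102960. Because the result depends only on k, the same prior assigns HTHTHTHTHT and HTHHTHTHTT (both k = 5, n = 10) equal probability, so under these assumptions it does not capture any preference for the alternating pattern.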
      - Dmytry 31 Mar 2012 7:01 UTC
        2 points
        Parent
        Well, it is the case that all the random sequences together have a much larger probability than HHHHHHHHHHHH, and so we should expect the sequence to be one among the random sequences.
        
        edit: interesting issue: suppose you assign some prior probability to each possible sequence. Upon seeing the actual sequence, with a 0.0001 probability that your eyes deceived you, how are you to update the probability of this particular sequence? Why would we assume sensory failure (or a biased coin) when we observe a hundred heads, but not when we observe something random-looking? It should have to do with the sensory failure being much less likely for something random-looking.
- Arran_Stirton 31 Mar 2012 14:16 UTC
  0 points
  Parent
  I’m treating the current state of the universe as a different thing entirely to the mugger’s implied hypothesis about how the universe works. Both a program simulating Maxwell’s equations would obviously win out over a program simulating Thor, but in terms of predicting the shape of a magnetic field in a certain spot, that depends on the current state of the universe (at least the parts of the universe relevant to the equation).
  
  Though if this is an invalid line of reasoning for some reason, please let me know, thanks.
  - MileyCyrus 31 Mar 2012 17:53 UTC
    0 points
    Parent
    I have no idea where you’re going with this.
    
    Both a program simulating Maxwell’s equations would obviously win out over a program simulating Thor,
    
    You use the word “both” but then refer to only one object. Did you forget to include something?
    - Arran_Stirton 4 Apr 2012 0:35 UTC
      0 points
      Parent
      Sorry, I’ll try to clarify:
      
      If you want to predict the exact state of a system five minutes into the future, you need to know the current state of the system and the laws of that system. Call the current state s and the future state s’; the laws of the system are simulated by the Turing machine L. Instead of knowing the state of the system, we only know its laws (or rather we take them as a given).
      
      Then any prediction we make about the future state of the system will restrict the range of values for s’ that will validate our prediction. The more specific we are about s’, the smaller the range of values it can take. In turn this restricts the range of possible values for s (as L(s) = s’) that will give s’.
      
      Because we have no information about the current state of the system, all possible states are equally likely, and so the probability that the system will end up in a particular range of s’ is the same as the fraction of s (out of all possible s) that will map there.
      
      This is not in relation to any hypothesis about the laws of the system, but instead the current state of the system. I hope this makes my original argument make more sense. If not I’m sorry; please highlight to me where my explanation is going wrong.
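
The counting argument in the comment above can be sketched with a toy system. Everything here is hypothetical and chosen only for illustration: a 12-state system, a made-up deterministic `law` standing in for the Turing machine L, and arbitrary target sets for s’.

```python
from fractions import Fraction


def law(s: int, n_states: int) -> int:
    """Hypothetical deterministic law L: maps current state s to future state s'."""
    return (s * s + 1) % n_states


def prediction_probability(target: set, n_states: int) -> Fraction:
    """P(s' in target) when every current state s is equally likely.

    This is the fraction of current states that the law maps into the target.
    """
    hits = sum(1 for s in range(n_states) if law(s, n_states) in target)
    return Fraction(hits, n_states)


N_STATES = 12  # assumed size of the toy state space

# A broad prediction about s' versus a more specific one contained in it.
broad = prediction_probability({1, 2, 5}, N_STATES)
narrow = prediction_probability({5}, N_STATES)

print("broad prediction: ", broad)
print("narrow prediction:", narrow)
```

With a uniform prior over s, narrowing the predicted range of s’ can only keep or shrink the set of current states that lead there, so the more specific prediction is never more probable than the broader one that contains it.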