“So for every extra unit of disutility predicted the probability penalty due to not knowing enough about the current state of the universe becomes greater.”
Sure, but the probability shrinks more slowly than the disutility rises. A scenario in which 1000 times 3^^3 people are tortured has a higher probability than the probability that 3^^3 people are tortured, divided by 1000. Or more formally:
[P(Mugger tortures 1000*3^^3 people)] > [P(Mugger tortures 3^^3 people)]/1000
Read about Solomonoff Induction to find out why this is true.
How about this: the probabilities of torturing each exact number of beings have to sum to 1 or less?
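One way to read this objection can be written out as a short derivation. The symbols p_n (the probability that exactly n beings are tortured) and c (a positive constant) are introduced here only for illustration and do not appear in the thread. If p_n fell off no faster than c/n for every n, the total would diverge, which contradicts the requirement that the probabilities sum to at most 1:

```latex
\[
\sum_{n=1}^{\infty} p_n \le 1,
\qquad\text{but}\qquad
p_n \ge \frac{c}{n}\ \text{for all } n
\;\Longrightarrow\;
\sum_{n=1}^{\infty} p_n \;\ge\; c \sum_{n=1}^{\infty} \frac{1}{n} \;=\; \infty .
\]
```

This only rules out the bound holding uniformly for all n. It does not by itself rule out the inequality holding for particular numbers.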
A word of caution—Solomonoff induction applies to things like the laws of physics, not to all hypotheses. Otherwise, if you flipped a coin 100 times, you would expect to see 100 heads much more often than average, and we don’t.
If you flip a coin 15 times, this result:
HHHHHHHHHHHHHHH
is far more probable than this:
HTHTTHTHTTTHHTH
That’s because some coins are rigged, and it’s much easier to rig a coin to conform to the first pattern than the second.
This is true, but doesn’t explain why we’re more surprised when we see the former than the latter.
I don’t think this is actually true. If MileyCyrus successfully predicted the exact sequence of coinflips HTHTTHTHTTTHHTH, wouldn’t you be more surprised than if it were HHHHHHHHHHHHHHH?
Of course. When I said “we’re more surprised” I was referring to the typical person who hasn’t read this discussion thread. In the absence of the above prediction, I would be far more surprised to see HHHHHHHHHHHHHHH than HTHTTHTHTTTHHTH. Once the prediction is made, I become extremely surprised if either sequence appears, but somewhat more surprised by HTHTTHTHTTTHHTH.
Oh, I see. In the case of the typical person, the answer is even easier: Lack of understanding of the conjunction rule of probability. HTHTTHTHTTTHHTH feels more representative of a random series of coin flips, so it is intuitively judged as more probable than HHHHHHHHHHHHHHH.
First reaction: I don’t know about “far” more probable. What’s the prior that a coin is rigged? I would have said less than 1⁄32768, but low confidence on that.
According to this, you can’t rig a coin to do that, which increases my confidence.
But you can rig your tossing, even by mistake; if it lands heads, and you balance it to flip with heads up again, then it’s slightly more likely to land heads. I remember hearing a figure of 51% for that; in which case H*15 has probability 1/24331 instead of 1⁄32768; about a third more probable. But that scenario (fifteen times) is itself unlikely… if we estimate P(next is heads | last was heads) = 0.505 (corresponding to keeping the same side up 3⁄4 of the time, I still feel that’s an overestimate), we get 1/28204, 16% more likely.
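The arithmetic in this comment can be checked with a minimal sketch. It assumes that each of the 15 flips lands heads with the stated probability (the comment may have treated the first flip differently), and the function name is made up for illustration. Under that assumption the results land close to the figures quoted above, at roughly 1 in 24,000 and 1 in 28,000.

```python
def all_heads_probability(p_heads: float, flips: int = 15) -> float:
    """Probability of `flips` heads in a row if every flip lands heads with p_heads."""
    return p_heads ** flips


# Baseline: a fair coin gives exactly 1/32768 for 15 heads.
fair = all_heads_probability(0.5)

for p in (0.51, 0.505):
    biased = all_heads_probability(p)
    # Report the odds as "1 in N" and the relative gain over the fair coin.
    print(f"p={p}: about 1 in {1 / biased:,.0f}, "
          f"{biased / fair - 1:.0%} more likely than with a fair coin")
```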
If we switched to dice, I would agree that 666666666666666 is far more probable than 136112642345553.
I suppose that isn’t all that unintuitive (though does this actually work if you start with a uniform prior over weights and do the math?). But does your intuitive model also predict the fact that HTHTHTHTHT is more probable than HTHHTHTHTT? :D
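The parenthetical question can be worked through under one possible reading of "uniform prior over weights": the coin's heads probability p is drawn uniformly from [0, 1], and flips are independent given p. This is a deliberately extreme assumption (a realistic prior would put most of its mass near 0.5), and the function name is hypothetical. Integrating p^h (1 - p)^t over [0, 1] gives h! t! / (h + t + 1)!, which the sketch below evaluates exactly.

```python
from fractions import Fraction
from math import factorial


def sequence_probability_uniform_prior(sequence: str) -> Fraction:
    """P(exact sequence) when the heads probability is uniform on [0, 1].

    The integral of p**h * (1 - p)**t over [0, 1] equals h! * t! / (h + t + 1)!.
    """
    heads = sequence.count("H")
    tails = sequence.count("T")
    return Fraction(factorial(heads) * factorial(tails),
                    factorial(heads + tails + 1))


all_heads = sequence_probability_uniform_prior("H" * 15)
mixed = sequence_probability_uniform_prior("HTHTTHTHTTTHHTH")
print(all_heads, mixed, all_heads / mixed)  # 1/16 1/102960 6435
```

Under this prior, all heads comes out several thousand times more probable than the mixed sequence. Because the model depends only on the counts of heads and tails, it assigns the same probability to HTHTHTHTHT and HTHHTHTHTT, so it cannot account for any difference between those two.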
Well, it is the case that all the random sequences together have much larger probability than HHHHHHHHHHHH, and so we should expect the sequence to be one among the random sequences.
Edit: an interesting issue: suppose you assign some prior probability to each possible sequence. Upon seeing the actual sequence, with a probability of 0.0001 that your eyes deceived you, how should you update the probability of this particular sequence? Why would we assume sensory failure (or a biased coin) when we observe a hundred heads, but not when we observe something random-looking? It should have to do with sensory failure being much less likely for something random-looking.
I’m treating the current state of the universe as a different thing entirely to the mugger’s implied hypothesis about how the universe works. Both a program simulating Maxwell’s equations would obviously win out over a program simulating Thor, but in terms of predicting the shape of a magnetic field in a certain spot, that depends on the current state of the universe (at least the parts of the universe relevant to the equation).
Though if this is an invalid line of reasoning for some reason, please let me know, thanks.
I have no idea where you’re going with this.
You use the word “both” but then refer to only one object. Did you forget to include something?
Sorry, I’ll try to clarify:
If you want to predict the exact state of a system five minutes into the future, you need to know the current state of the system and the laws of that system. Call the current state s and the future state s’; the laws of the system are simulated by the Turing machine L. Instead of knowing the state of the system, we only know its laws (or rather we take them as a given).
Then any prediction we make about the future state of the system will restrict the range of values for s’ that will validate our prediction. The more specific we are about s’, the smaller the range of values it can take. In turn this restricts the range of possible values for s (as L(s) = s’) that will give s’.
Because we have no information about the current state of the system, all possible states are equally likely, and as such the probability that the system will end up in a particular range of s’ is the same as the fraction of s (out of all possible s) that will map there.
This is not in relation to any hypothesis about the laws of the system, but instead the current state of the system. I hope this makes my original argument make more sense. If not I’m sorry; please highlight to me where my explanation is going wrong.
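The argument above can be illustrated with a toy sketch. Everything concrete in it is an assumption made for illustration: a state space of 1000 integer states, and a made-up deterministic rule standing in for the laws L. The probability of a prediction about s’ is computed as the fraction of equally likely current states s that L maps into the predicted range.

```python
from fractions import Fraction

NUM_STATES = 1000  # hypothetical size of the state space


def L(s: int) -> int:
    """Made-up stand-in for the known laws: maps current state s to future state s'."""
    return (7 * s + 3) % NUM_STATES


def prediction_probability(predicted_states: set) -> Fraction:
    """Fraction of equally likely current states s for which L(s) is in the prediction."""
    matching = sum(1 for s in range(NUM_STATES) if L(s) in predicted_states)
    return Fraction(matching, NUM_STATES)


# A vague prediction: s' lies somewhere in the lower half of the state space.
broad = prediction_probability(set(range(500)))
# A very specific prediction: s' is exactly state 42.
narrow = prediction_probability({42})
print(broad, narrow)
```

In this toy system the more specific prediction is validated by far fewer starting states, which is the sense in which a more specific claim about s’ carries a larger probability penalty when s is unknown.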