Here’s an old puzzle:
Alice: How can we formalize the idea of “surprise”?
Bob: I think surprise is seeing an event of low probability.
Alice: This morning I saw a car whose license plate said 3817, and that didn’t surprise me at all!
Bob: Huh.
For everyone still wondering about that, here’s the correct answer! The numerical measure of surprise is information gain (Kullback-Leibler divergence) from your prior to your posterior over models after updating on the data. That gives the intuitive answer to the above puzzle, as long as none of your models assigned high probability to 3817 in advance. It also works for the opposite case: if you expected an ordered string but got a random one, or one ordered in a different way.
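A minimal sketch of this measure, with made-up numbers. The two “models” of plate generation and their likelihoods are hypothetical, chosen only to illustrate the point: 3817 is explained about equally well by every model, so the posterior barely moves and the information gain is near zero, whereas an observation one model specially predicts produces a real update.

```python
import math

def kl(p, q):
    """KL(p || q) in bits, over the same finite support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def surprise(prior, likelihoods):
    """Information gain from prior to posterior over models after one observation.

    prior[i] = P(model_i); likelihoods[i] = P(data | model_i)."""
    evidence = sum(p * l for p, l in zip(prior, likelihoods))
    posterior = [p * l / evidence for p, l in zip(prior, likelihoods)]
    return kl(posterior, prior)

# Hypothetical models of how 4-digit plates are generated:
#   M0: uniformly random digits            -> P(any plate) = 1/10000
#   M1: "someone is feeding me patterns"   -> P(1111) = 0.1, P(3817) tiny
prior = [0.999, 0.001]

print(surprise(prior, [1e-4, 1e-8]))  # seeing 3817: near zero bits, no surprise
print(surprise(prior, [1e-4, 0.1]))   # seeing 1111: several bits, real surprise
```

Note that the raw probability of the data is about the same in both calls; what differs is how much the observation discriminates between models.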
This is actually well known, I just wanted to put it on LW.
Just to make sure I understand prior and posterior over models, is the following about right?
Alice starts with a prior of 0.999 that non-vanity plates are generated basically randomly (according to some rule of “N letters followed by M digits” or whatever, and with rules e.g. preventing swear words).
Alice sees “3817” (having seen many other 4-digit plates previously).
Alice’s posterior probability over models is still about 0.999 on the same model.
Yeah.
Wait. If you’re talking about surprise because you said “update your model based on how surprised you are”, you can’t then turn around and define surprise as “how much you should update your model”. “Update your model based on how much you should update your model” isn’t very helpful.
The intuitive sense of what surprise is corresponds well to the rules for updating your probability distribution over models, which we can therefore take as a formal definition of surprise.
Hmm, I thought about it some more and maybe it’s not that simple. If we formalize surprise like that, it’s easy to come up with situations where you expect to be very “surprised” no matter what data you see. That doesn’t seem right. Does anyone have better ideas?
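To make the worry concrete, here is a toy construction (all numbers invented for illustration): the expected information gain, averaged over the prior predictive, is the mutual information between model and data, and it can be close to its maximum no matter what you observe. With two equally likely models making opposite near-certain predictions, every possible outcome produces nearly a full bit of “surprise”.

```python
import math

def kl(p, q):
    """KL(p || q) in bits, over the same finite support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_surprise(prior, likelihood_rows):
    """E[KL(posterior || prior)] under the prior predictive distribution.

    likelihood_rows[i][x] = P(data = x | model_i)."""
    total = 0.0
    for x in range(len(likelihood_rows[0])):
        evidence = sum(prior[i] * likelihood_rows[i][x] for i in range(len(prior)))
        if evidence == 0:
            continue
        posterior = [prior[i] * likelihood_rows[i][x] / evidence
                     for i in range(len(prior))]
        total += evidence * kl(posterior, prior)
    return total

# Two models making opposite near-certain predictions about a binary outcome:
# whichever outcome occurs, one model is confirmed and the other nearly ruled out.
prior = [0.5, 0.5]
rows = [[0.99, 0.01],   # model A: outcome 0 almost surely
        [0.01, 0.99]]   # model B: outcome 1 almost surely
print(expected_surprise(prior, rows))  # close to 1 bit, whatever you see
```

So under this definition you can be certain in advance that you will be “surprised”, which is the tension the comment points at.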
How is a Frequentist surprised?
I don’t have the background to answer that. Can you?
Presumably, frequentist folks talk about how “surprised” an element of a statistical model is relative to observed data (maximum likelihood as minimizing surprise in the KL sense). That’s about all I can think of.
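The parenthetical claim can be checked numerically: for i.i.d. data, maximizing the likelihood over a model family is equivalent to minimizing KL(empirical distribution || model). A toy Bernoulli check with a made-up sample, using a grid search purely for illustration:

```python
import math

# Hypothetical sample: 7 ones and 3 zeros.
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
emp = [data.count(0) / len(data), data.count(1) / len(data)]  # empirical distribution

def kl(p, q):
    """KL(p || q) in nats, over the same finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_likelihood(theta):
    """Log-likelihood of the sample under Bernoulli(theta)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

grid = [i / 100 for i in range(1, 100)]
best_by_ml = max(grid, key=log_likelihood)
best_by_kl = min(grid, key=lambda t: kl(emp, [1 - t, t]))
print(best_by_ml, best_by_kl)  # both land on the empirical frequency, 0.7
```

Both criteria pick out the empirical frequency, which is the sense in which maximum likelihood “minimizes surprise” against the observed data.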