True. I was thinking of “surprised” as “assigned a small probability to” rather than “did not assign 100% probability to”. If we use the latter interpretation, then there is a more interesting contradiction. So much for my resolution.
Information theory has useful concepts for this situation (as usual). It uses the term “surprisal” to quantify how surprising an outcome is. The surprisal of an event is the log of the inverse of the probability, log(1/p), that you had assigned to it before you learned that it happened. [1]
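As a quick sketch of the definition (using base-2 logs, so surprisal comes out in bits):

```python
import math

def surprisal(p: float) -> float:
    """Surprisal (in bits) of an event you had assigned probability p."""
    return math.log2(1 / p)

# A fair coin flip carries 1 bit of surprisal; a 1-in-8 event carries 3 bits.
print(surprisal(0.5))    # 1.0
print(surprisal(1 / 8))  # 3.0
```

The base of the log is just a choice of units; natural log gives “nats” instead of bits.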
What surprisal value should the judge’s statement be interpreted as meaning? A first approach would be to say that the judge means the prisoner will find the result more surprising than if he had simply assigned equal probability to each of the seven days. Thus, the judge is saying that “the surprisal, or information gain, from learning your execution date will be greater than log(7).”
So, uh, how on earth are you supposed to move your probability distribution over execution days upon being given that kind of evidence? If you (wisely) start from a uniform probability distribution, you already have, in expectation, the maximum surprisal value. (Entropy is equal to the “expected” [i.e., probability-weighted] surprisal, and a uniform distribution is maximum entropy.)
No change in probability distribution can increase the expected surprisal—unless, of course, you deliberately skew your PD so that it decreases the weight on when you “really” expect to be executed. But then that brings up the messy issue of what you really believe vs. what you believe you believe.
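The point about the uniform distribution can be checked directly. Below is a minimal sketch: expected surprisal (entropy) over the seven days is exactly log(7) for the uniform distribution, and any skewed distribution (the particular weights here are just an illustration) comes out strictly lower:

```python
import math

def entropy(dist):
    """Expected surprisal (entropy, in bits) of a probability distribution."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

uniform = [1 / 7] * 7                          # no idea which of the 7 days
skewed = [0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]   # leaning toward one day

print(entropy(uniform))   # log2(7), about 2.807 bits
print(entropy(skewed))    # strictly less than log2(7)
```

So no reshuffling of the weights raises the expected surprisal above what the uniform prior already achieves, which is exactly the bind the judge’s statement puts the prisoner in.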
[1] Consequently, it is equal to how much information you get upon observing the event—observing improbable events tells you more than observing probable ones. Intuitively, do you learn more when a suspect says they’re guilty, or when they claim innocence?