Need an example? Sure! I have two dice, and they can each land on any number, 1-6. I’m assuming they are fair, so each has probability of 1⁄6, and the logarithm (base 2) of 1⁄6 is about −2.585. There are 6 states, so the total is 6* (1/6) * 2.585 = 2.585. (With two dice, I have 36 possible combinations, each with probability 1⁄36, log(1/36) is −5.17, so the entropy is 5.17. You may have noticed that I doubled the number of dice involved, and the entropy doubled – because there is exactly twice as much that can happen, but the average entropy is unchanged.) If I only have 2 possible states, such as a fair coin, each has probability of 1⁄2, and log(1/2)=-1, so for two states, (-0.5*-1)+(-0.5*-1)=1. An unfair coin, with a ¼ probability of tails, and a ¾ probability of heads, has an entropy of 0.81. Of course, this isn’t the lowest possible entropy – a trick coin with heads on both sides only has 1 state, with entropy 0. So unfair coins have lower entropy – because we know more about what will happen.
I’ve had to calculate information entropy for a data compression course, so I felt like I already knew the concepts you were trying to explain here, but I was not able to follow your explanation at all.
the logarithm (base 2) of 1⁄6 is about −2.585. There are 6 states, so the total is 6* (1/6) * 2.585 = 2.585.
The total what? Total entropy for the two dice that you have? For just one of those two dice? log(1/6) is a negative number, so why do I not see any negative numbers used in your equation? There are 6 states, so I guess that sort of explains why you’re multiplying some figure by 6, but why are you dividing by 6?
If I only have 2 possible states, such as a fair coin, each has probability of 1⁄2, and log(1/2)=-1, so for two states, (-0.5*-1)+(-0.5*-1)=1.
Why do you suddenly switch from the notation 1⁄2 to the notation 0.5? Is that significant (are they referring to different concepts that coincidentally happen to have equal values)? If they actually refer to the same value, why do we have the positive value 1⁄2 but the negative value −0.5?
Suggestion:
Do the fair coin first, then the fair die, then the trick coin.
Point out that a fair coin has 2 outcomes when flipped, each with equal probability, so it has entropy [-1/2 log2(1/2)] + [-1/2 log2(1/2)] = (1/2) + (1/2) = 1.
Point out that a traditional fair die has 6 outcomes when rolled, each of equal probability, and so it has entropy ∑n=1 to 6 of −1/6 log2(1/6) =~ 6 * −1/6 * −2.585 = 2.585.
Point out that a trick coin that always comes up heads has 1 outcome when flipped, so it has entropy −1 log2(1/1) = 0.
Point out that a trick coin that comes up heads 75% of the time has entropy [-3/4 log2(3/4)]+[-1/4 log2(1/4)] =~ 0.311 + 0.5 = 0.811.
Consistently use the same notation for each example (I sort of got lazy and used ∑ for the dice to avoid writing out a value 6 times). In contrast, do not use 6 * (1/6) * 2.585 = 2.585 for one example (where all the factors are positive) and then (-0.5*-1)+(-0.5*-1)=1 for another example (where we rely on pairs of negative factors to become positive).
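To make the four worked examples easy to check, they can all be run through one small script (a sketch of my own; the `entropy` helper name is mine, not from the original post). Writing each term as p * log2(1/p) keeps every factor positive, which sidesteps the sign confusion I complained about above:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p) over all outcomes.
    Zero-probability outcomes contribute nothing and are skipped."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1/2, 1/2]))   # fair coin: 1 bit
print(entropy([1/6] * 6))    # fair die: log2(6), about 2.585 bits
print(entropy([1.0]))        # two-headed trick coin: 0 bits
print(entropy([3/4, 1/4]))   # biased coin, 75% heads: about 0.811 bits
```

The same function handles every example, so the notation never changes between cases.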