Well, it really is defined that way. Before doing math, it’s important to understand that entropy is a way of quantifying our ignorance about something, so it makes sense that you’re most ignorant when (for discrete options) you can’t pick out one option as more probable than another.
Okay, on to using the definition of entropy as the sum over event-space of -P log(P) for each event. E.g. if you only had one possible event, with probability 1, your entropy would be -1 log(1) = 0. Suppose you had two events with different probabilities. If you changed the probability assignment so their probabilities get closer together, the entropy goes up. This is because the function -P log(P) is concave downwards between 0 and 1: its value at the average of two points is always higher than the average of its values at those two points (the same holds for any weighted average, which is represented by a straight line connecting the two points; the curve lies above that line). So moving two probabilities toward their common average keeps their sum fixed while raising the total entropy, and if you want to maximize entropy, you move all the probabilities together as far as they can go, i.e. until they’re all equal.
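If you'd like to check this numerically, here is a minimal sketch in Python. The function name `entropy` and the sample probabilities are just illustrative choices, and it uses the natural log (any other base only rescales the result). It treats 0 log(0) as 0, which is the usual convention.

```python
import math

def entropy(probs):
    """Sum of -p*log(p) over all events, taking 0*log(0) as 0 (natural log)."""
    return sum(-p * math.log(p) for p in probs if p > 0)

# Two events: move the probabilities closer together and watch entropy rise.
for p in (1.0, 0.9, 0.7, 0.5):
    print(f"P = ({p:.1f}, {1 - p:.1f})  entropy = {entropy([p, 1 - p]):.4f}")
```

The printed entropy should start at 0 for the certain case and climb as the two probabilities approach each other, topping out at log(2) when they're equal.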