The entropy maxim for binary questions

The entropy of a message ($H$) is roughly proportional to the uniformity of its probability distribution ($p(x)$ for each possible value $x$). If the message has just two possible values, $H$ is greater exactly insofar as the split between their probabilities is close to 50/50.

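| $p(x_1)$ | $p(x_2)$ | Entropy $H$ (bits) |
|----------|----------|--------------------|
| 0.5      | 0.5      | 1                  |
| 0.7      | 0.3      | 0.88               |
| 0.8      | 0.2      | 0.72               |
| 0.9      | 0.1      | 0.47               |
| 0.95     | 0.05     | 0.29               |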
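The figures in the table can be recomputed from the standard binary entropy formula, H = -(p·log2(p) + (1 - p)·log2(1 - p)). Here is a minimal sketch; the function name `binary_entropy` is just an illustrative choice, not something from the original post.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a message with two outcomes of probability p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Print one row per split: both probabilities, then the entropy in bits.
for p in (0.5, 0.7, 0.8, 0.9, 0.95):
    print(f"{p:.2f}  {1 - p:.2f}  {binary_entropy(p):.2f}")
```

Rounded to two decimals, the entropies agree with the table's values.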

The entropy of a message is, intuitively, its expected information content. Thus you can learn more efficiently by seeking messages drawn from higher-entropy distributions.

Assuming that people ask questions to get information, and that questions are strictly yes-or-no, or otherwise have two main answers (as in “which direction — east or west — leads to our destination?”), the best questions are those which separate options of roughly equal prior probability to the questioner.
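As a rough illustration of the maxim, the sketch below scores a few candidate yes-or-no questions by the entropy of their answer under the questioner's prior, and picks the highest-scoring one. The prior, the question wordings, and the names `prior`, `questions`, and `expected_bits` are all made up for this example. It also assumes that each hypothesis fully determines the answer, so that the entropy of the answer equals the expected information gained.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no answer whose 'yes' probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical prior over where the destination lies (illustrative numbers only).
prior = {"north": 0.1, "east": 0.1, "south": 0.3, "west": 0.5}

# Each hypothetical question is represented by the set of hypotheses
# for which the answer would be "yes".
questions = {
    "Is it north?": {"north"},
    "Is it north or east?": {"north", "east"},
    "Is it west?": {"west"},
}

def expected_bits(yes_set: set) -> float:
    """Entropy of the answer, given the prior probability of a 'yes'."""
    p_yes = sum(prior[h] for h in yes_set)
    return binary_entropy(p_yes)

# The maxim recommends the question whose answer is closest to an even split.
best = max(questions, key=lambda q: expected_bits(questions[q]))
print(best)
```

Under this made-up prior, "Is it west?" splits the probability evenly and so scores a full bit, while the other two questions score less.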

The prior probability does not necessarily match the intuitive (but kinda meaningless) “objective probability”. E.g. with no specific information for the scenario, the “which direction” question is a 50/50 split, but if you’re near the east coast of an island containing an otherwise-unknown destination, your priors should be biased in favour of “west”.

There are at least two uses of this maxim. First, you can use it yourself to guide your choice of questions. Second, you can assume others follow it and, when they seem to violate it by asking weird binary questions, infer that at least one of the maxim’s assumptions is false:

  • the questioner doesn’t seek information (as in rhetorical questions)

  • much of the questioner’s response-probability goes to answers other than the main binary (as in the seemingly-binary “is the destination in place A, or place B?”, when they really expect that they might get the longer answer “actually, C”)

  • the questioner doesn’t know enough information theory

  • the questioner’s priors are very different from what you expect (as in “are you homo or hetero?”, when they have information that favours the less common option)
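The second failure mode above can be made concrete with a small calculation. In the sketch below, the probabilities for "A", "B", and "actually, C" are invented for illustration. The question looks lopsided if you consider only A versus B, yet it is a high-entropy question once the third answer is counted.

```python
import math

def entropy(probs) -> float:
    """Shannon entropy, in bits, of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical questioner's probabilities for the answers to "A or B?".
p_a, p_b, p_c = 0.45, 0.05, 0.50  # p_c is the unstated "actually, C"

# What a listener sees if they assume only A and B are possible:
# the same odds, renormalised to a 0.9 / 0.1 split.
apparent = entropy([p_a / (p_a + p_b), p_b / (p_a + p_b)])

# What the questioner actually expects to learn, with C included.
actual = entropy([p_a, p_b, p_c])

print(f"apparent: {apparent:.2f} bits, actual: {actual:.2f} bits")
```

With these numbers the apparent binary question is worth under half a bit, while the full three-way question is worth more than one bit, so the question is less weird than it first seems.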

Alas, I don’t (yet) know the relative frequency of those listed confusion modes.