There is no jump, because “I don’t know” is the maximum entropy distribution. The maximum entropy distribution is the distribution over outcomes with the greatest information-theoretic entropy among all distributions consistent with the observed constraints of the system. This works because entropy is just the expected value of the information gained from measuring the system. You want the maximum entropy distribution because anything else is literally pulling information out of thin air: if you pick a lower entropy distribution when you could construct a higher entropy one consistent with the data, then you are expecting to gain less information from a measurement, as if you already knew something you don’t.
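The claim that a 50/50 assignment maximizes entropy on a yes/no question can be checked numerically. A minimal sketch (pure Python, function name is mine):

```python
import math

def bernoulli_entropy(p):
    """Shannon entropy (in bits) of a yes/no distribution with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Sweep candidate probabilities: entropy peaks at p = 0.5 (exactly 1 bit).
# Any other assignment "already knows" something about the answer.
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=bernoulli_entropy)
print(best, bernoulli_entropy(best))  # 0.5, 1.0
```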
The maximum entropy hypothesis on any yes/no question is a 50/50 chance. At those odds, cryonics is a great bet!
However, they probably have information which adjusts their probability down. A genuine “I don’t know” would be equivalent to a coin flip, whereas any probability of cryonics working below 50% is based on information which makes you think it’s unlikely. So they do have beliefs about it.
Whatever people mean by “I don’t know”, the way they think about it bears no resemblance to the way you discuss the maximum entropy distribution here, I’m afraid. If that were what they meant, they would mean something sensible, and I don’t think they do.
Next time someone uses “I don’t know” to try and justify not making a decision, I’ll try to see if I can explain the maximum entropy distribution, and convince them that that’s how it should be approached.
I anticipate that the main difficulty will be in convincing people that they have to assign a probability, and that even if they don’t they’re implicitly choosing one based on their actions.
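One way to make the “implicitly choosing a probability” point concrete is through betting odds: accepting a bet only makes sense if your probability of winning is at least the break-even point. A sketch of that conversion (the function and example odds are mine):

```python
def implied_probability(stake, payout):
    """Break-even win probability for risking `stake` to win `payout`.

    Accepting the bet implies your probability of winning is at least
    this value, since expected value is p*payout - (1-p)*stake.
    """
    return stake / (stake + payout)

# Someone who refuses even odds but accepts 3:1 has revealed, by their
# actions alone, a belief somewhere between 25% and 50%.
print(implied_probability(1, 1))  # 0.5
print(implied_probability(1, 3))  # 0.25
```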
There was a comment writer on LW who assumed that a probabilistic argument that referred to the word “bet” applied only to gambling wagers. He had no reply when someone pointed out that the probabilistic argument under consideration worked even when every decision by every agent is considered a bet.
Rhetorical tactics like using the word “bet” in a very inclusive sense strike me as more useful for the OP’s purpose than explaining the MAXENT prior.
I don’t think entropy quite works that way. For notational convenience, let Q(p) denote the entropy of p. Q(p) > Q(q) does not mean that q is strictly more informative than p. In other words, there is no total ordering on distributions such that for any p, q with Q(p) > Q(q), I can get from p to q with Q(p) − Q(q) bits of information. The closest statement you can make would be in terms of KL divergence, but it is important to note that both KL(p||q) and KL(q||p) are positive, so KL provides a distance, not an ordering.
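The asymmetry is easy to exhibit with two small distributions (the particular p and q are mine, chosen for illustration):

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in bits between discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
q = [0.8, 0.1, 0.1]

# Both directions are positive but unequal, and neither matches the
# entropy gap Q(p) - Q(q): entropy differences are not "bits of updating".
print(kl(p, q), kl(q, p), entropy(p) - entropy(q))
```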
Also note that entropy does not in fact decrease with more information. It decreases in expectation, and even then only relative to the subjective belief distribution. But this isn’t even a particularly special property. Jensen’s inequality together with conservation of expected evidence implies that, instead of Q(p) = E[-log(p(x))], we could have taken any concave function Q over the space of probability distributions, which would include functions of the form Q(p) = E[f(p(x))] as long as 2f′(z) + zf″(z) ≤ 0 for all z.
[Proof of the statement about Jensen: Let p2 be the distribution we get from p after updating. Then E[Q(p2) | p] ≤ Q(E[p2 | p]) = Q(p), where ≤ is Jensen applied to the concave Q, and E[p2 | p] = p by conservation of expected evidence.]
EDIT: For the interested reader, this is also strongly related to Doob’s martingale convergence theorem, as your beliefs are a martingale and any concave function of them is a supermartingale.
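Both halves of the claim — that entropy can rise on a particular observation while falling in expectation — can be checked on a toy coin example (the hypotheses and the 0.9 prior are my choices for illustration):

```python
import math

def h(p):
    """Binary entropy in bits of a belief P(hypothesis) = p."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

# Two hypotheses about a coin: "biased" (P(heads) = 0.8) vs "fair" (0.5),
# with a confident prior P(biased) = 0.9.
prior = 0.9
p_heads = prior*0.8 + (1 - prior)*0.5           # predictive P(heads)
post_heads = prior*0.8 / p_heads                # P(biased | heads)
post_tails = prior*0.2 / (1 - p_heads)          # P(biased | tails)

# Seeing tails pulls the belief toward 50/50, so entropy goes UP...
print(h(post_tails) > h(prior))                 # True
# ...but averaged over the predictive distribution, entropy goes down.
expected = p_heads*h(post_heads) + (1 - p_heads)*h(post_tails)
print(expected < h(prior))                      # True
```

Conservation of expected evidence holds here too: the posterior, averaged over outcomes, equals the prior.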
I don’t think they really mean maximum entropy, though. There seems to be “I don’t know, it’s 50/50” and then “I don’t know, but it’s obviously skewed this way, and I have strong confidence that there are unknown-unknowns that will skew it further when they’re discovered”
About two...
See my comment above which shows that the arguments surrounding maximum entropy are rather confused.
In that case, you should be able to use how strongly you anticipate the skewing to create a probability estimate.
I am not aware of any mathematical conversion between “I’m pretty sure you’re wrong” and a specific probability estimate.