One problem with mean squared error as a measure of communication accuracy is that it gives no incentive to accurately communicate the variance around the stated outcome. Suppose, for instance, that you are building some precision electronics for which it is very important that you work with pure gold. In that case, you don’t just want the seller to choose the description that minimizes the squared error (perhaps “1 kg pure gold”), but rather a description that minimizes e.g. negative log probability (perhaps “0.999–1.001 kg pure gold, 0.001 kg misc contaminants”).
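A small numerical sketch of this point (the shipment numbers and standard deviations are hypothetical, chosen to echo the gold example): two descriptions with the same point estimate are indistinguishable under squared error, but negative log probability rewards the one whose stated uncertainty matches the actual spread.

```python
import math

def neg_log_prob(x, mu, sigma):
    """Negative log density of a normal N(mu, sigma^2) at x."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (x - mu)**2 / (2 * sigma**2)

actual = 0.999          # kg of gold actually delivered
honest = (1.0, 0.001)   # mean 1 kg, stated sd 1 g (matches the true spread)
vague  = (1.0, 0.1)     # mean 1 kg, stated sd 100 g (overstates uncertainty)

# Squared error sees only the means -- identical for both descriptions.
se_honest = (actual - honest[0])**2
se_vague  = (actual - vague[0])**2
assert se_honest == se_vague

# Negative log probability also scores the stated spread,
# so the well-calibrated description gets the lower (better) score.
nll_honest = neg_log_prob(actual, *honest)
nll_vague  = neg_log_prob(actual, *vague)
assert nll_honest < nll_vague
```

The same asymmetry punishes overconfidence too: shrinking the stated sd below the true spread blows up the `(x - mu)**2 / (2 * sigma**2)` term whenever the outcome misses, so neither inflating nor deflating uncertainty pays.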
Negative log probability is equivalent to squared error when working with univariate normal distributions with a fixed standard deviation/variance structure. (Proof: just take the logarithm of the normal distribution pdf and reduce the expression.) However, when working with distributions with varying variances, or with non-normal distributions, negative log probability arguably gives better incentives for communicating some aspects of the information.
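The proof sketch in the parenthetical can be written out. For a univariate normal density with mean \(\mu\) and fixed standard deviation \(\sigma\):

```latex
-\log p(x \mid \mu, \sigma)
  = -\log\!\left(\frac{1}{\sigma\sqrt{2\pi}}\,
      e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right)
  = \frac{(x-\mu)^2}{2\sigma^2} + \log\!\left(\sigma\sqrt{2\pi}\right)
```

With \(\sigma\) held fixed, the second term is a constant and the first is the squared error up to a constant factor, so minimizing negative log probability over \(\mu\) is the same as minimizing squared error. Once \(\sigma\) is allowed to vary, the two terms trade off against each other, which is exactly what creates the incentive to report uncertainty honestly.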
(There are probably even better approaches still, e.g. ones that take into account the likely effects of the information. While your approach gives no incentive to accurately communicate uncertainty, it is better at incentivizing a categorization system that doesn’t lump things together when they behave differently (assuming the metric chosen at the start is sensible). Taking the effects of the information into account could perhaps give the best of both worlds, handling both uncertainty and lumping.)
Most practical purposes that one might have for things probably satisfy some sort of convexity property: if 0.999 kg gold is OK, and 1.001 kg gold is OK, then 1.000 kg gold is also OK.
Insofar as this is true, restricting oneself to some notion of convex probability distribution might be useful.
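One way to read “convex probability distribution” (this reading, log-concavity, is my assumption, as are the numbers below) is that the density at any point between two plausible outcomes is at least as high as their geometric mean, i.e. `p(mid)**2 >= p(x) * p(y)`. That guarantees the density never says “0.999 or 1.001, but not 1.000”:

```python
import math

def normal_pdf(z, mu=1.0, sigma=0.001):
    """Density of N(mu, sigma^2) at z; the normal is log-concave."""
    return math.exp(-(z - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

x, y = 0.999, 1.001
mid = (x + y) / 2

# Log-concave case: the midpoint is at least as plausible as the endpoints.
assert normal_pdf(mid)**2 >= normal_pdf(x) * normal_pdf(y)

# A two-bump mixture violates this: it assigns high density to 0.999 kg
# and 1.001 kg but nearly none to 1.000 kg.
def mixture_pdf(z):
    return (0.5 * normal_pdf(z, mu=0.999, sigma=0.0001)
            + 0.5 * normal_pdf(z, mu=1.001, sigma=0.0001))

assert mixture_pdf(mid)**2 < mixture_pdf(x) * mixture_pdf(y)
```

So if the practical purposes one cares about all have interval-shaped “OK” sets, restricting stated descriptions to log-concave (or at least unimodal) distributions rules out exactly the two-bump pathologies.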