Most of the weird stuff involving priors comes up when you want posteriors over a continuous hypothesis space. There you get into trouble because reparametrizing the space changes the form of the prior, so a uniform “natural” prior is really a particular choice of parametrization. Using a discrete hypothesis space avoids big parts of the problem.
Basically, this shows that every term in a standard Bayesian inference, including the prior ratio, can be re-cast as a likelihood term in a setting where you start off unsure about what words mean, and have a flat prior over which set of words is true.
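The post's actual construction (uncertainty over word meanings) is not reproduced here. What follows is only a generic sketch of the bookkeeping behind the claim, assuming a discrete hypothesis space. Starting from a flat 1:1 prior and multiplying in the original prior weights as if they were the likelihood of one extra, fictitious observation gives exactly the same posterior as the standard calculation. The helper names `normalize` and `posterior` and the hypotheses `H1`/`H2` with their numbers are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def normalize(weights):
    """Rescale a dict of nonnegative weights so that they sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def posterior(prior, likelihood):
    """Standard Bayes: posterior(h) is proportional to prior(h) * likelihood(h)."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})


# Hypothetical numbers for two hypotheses (exact fractions avoid rounding noise).
prior = {"H1": Fraction(3, 4), "H2": Fraction(1, 4)}
data_likelihood = {"H1": Fraction(1, 10), "H2": Fraction(1, 2)}

# Standard inference: non-flat prior times the likelihood of the data.
standard = posterior(prior, data_likelihood)

# Recast: start from a flat prior and treat the old prior weights as the
# likelihood of an extra, fictitious observation, then update on the data.
flat = {h: Fraction(1, len(prior)) for h in prior}
after_fictitious_observation = posterior(flat, prior)
recast = posterior(after_fictitious_observation, data_likelihood)

print(standard)            # {'H1': Fraction(3, 8), 'H2': Fraction(5, 8)}
print(standard == recast)  # True
```

Because every step only multiplies and renormalizes, the order in which the "prior" and "data" factors enter does not matter; that is all the equality checks.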
Why wouldn’t this construction work over a continuous space?
If the possible meanings of your words are a continuous one-dimensional variable x, a flat prior over x will generally not remain flat if you change variables to y = f(x) for an arbitrary bijection f, so the construction would be sneaking in a specific choice of f.
Say the words are utterances about the probability of a coin landing heads: why should the flat prior be over the probability p instead of over the log-odds log(p/(1-p))?
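A small sketch of the change of variables behind this objection, assuming the standard logit map y = log(p/(1-p)). If p is uniform on (0, 1), then y has density sigmoid(y)(1 - sigmoid(y)), which peaks at y = 0 and is far from flat. Conversely, a flat (improper) prior on y corresponds to a density on p proportional to 1/(p(1-p)). The function names and the two example intervals are illustrative choices, not anything from the thread.

```python
import math


def sigmoid(y):
    """Map log-odds y back to a probability p."""
    return 1.0 / (1.0 + math.exp(-y))


def logit(p):
    """Map a probability p in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def density_of_logodds_under_flat_p(y):
    """If p ~ Uniform(0, 1) and y = logit(p), then by change of variables
    f_Y(y) = f_P(sigmoid(y)) * |dp/dy| = 1 * sigmoid(y) * (1 - sigmoid(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)


def unnormalized_density_of_p_under_flat_logodds(p):
    """If y has a flat (improper) density and p = sigmoid(y), then
    f_P(p) is proportional to |dy/dp| = 1 / (p * (1 - p))."""
    return 1.0 / (p * (1.0 - p))


# A flat prior on p is not flat on the log-odds scale.
for y in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"y = {y:+.1f}  density = {density_of_logodds_under_flat_p(y):.4f}")

# The same prior mass in p covers very different widths in log-odds, so
# "flat in p" and "flat in log-odds" disagree about which interval is likelier.
for lo, hi in ((0.4, 0.6), (0.9, 0.99)):
    mass_under_flat_p = hi - lo
    width_in_logodds = logit(hi) - logit(lo)
    print(f"p in [{lo}, {hi}]: mass under flat p = {mass_under_flat_p:.2f}, "
          f"log-odds width = {width_in_logodds:.3f}")
```

Under a flat prior on p, the interval [0.4, 0.6] gets more than twice the mass of [0.9, 0.99]; under a flat prior on log-odds, whose mass is proportional to width, the ranking reverses.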
In my post, I didn’t require the distribution over meanings of words to be uniform. It could be any distribution you wanted—it just resulted in the prior ratio of “which utterance is true” being 1:1.
Using a discrete hypothesis space avoids big parts of the problem.
Only if there is a “natural” discretisation of the hypothesis space. It’s fine for coin tosses and die rolls, but if the problem itself is continuous, different discretisations will give the same problems that different continuous parameterisations do.
In general, when infinities naturally arise but cause problems, decreeing that everything must be finite does not solve those problems, and introduces problems of its own.
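A minimal sketch of the discretisation point, under assumptions of my own choosing: put a uniform prior on 10 grid points, once spaced evenly in p and once spaced evenly in log-odds (the log-odds grid additionally needs an arbitrary cutoff, here ±6), then ask for the prior probability that p < 0.1. The grid size, cutoff, threshold, and function names are all hypothetical illustration choices.

```python
import math


def sigmoid(y):
    """Map log-odds y back to a probability p."""
    return 1.0 / (1.0 + math.exp(-y))


def grid_uniform_in_p(n):
    """n cell midpoints spaced evenly in probability p on (0, 1)."""
    return [(i + 0.5) / n for i in range(n)]


def grid_uniform_in_logodds(n, half_width):
    """n cell midpoints spaced evenly in log-odds on (-half_width, half_width),
    mapped back to probabilities. The cutoff half_width is itself a free choice."""
    step = 2.0 * half_width / n
    return [sigmoid(-half_width + (i + 0.5) * step) for i in range(n)]


def prior_mass_below(points, threshold):
    """Prior probability that p < threshold when each grid point gets weight 1/n."""
    return sum(1 for p in points if p < threshold) / len(points)


n = 10
mass_p = prior_mass_below(grid_uniform_in_p(n), 0.1)
mass_logodds = prior_mass_below(grid_uniform_in_logodds(n, 6.0), 0.1)
print(f"P(p < 0.1), uniform over a grid in p:        {mass_p:.2f}")        # 0.10
print(f"P(p < 0.1), uniform over a grid in log-odds: {mass_logodds:.2f}")  # 0.30
```

Both priors are "uniform over a finite set", yet they disagree by a factor of three; choosing the grid is the same decision as choosing the parametrization.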