William,
By considering models in the first place, one is already using Occam’s razor. With no preference for simplicity in the priors at all, one would start with uniform priors for all possible data sequences, not finite-parameter models of data sequences. If you formalize models as being programs for Turing machines which have a separate tape for inputting the program, and your prior is a uniform distribution over possible inputs on that tape, you exactly recover the 2^-k Occam’s razor law, where k is the number of program bits that the Turing machine reads during its execution (i.e. the Kolmogorov complexity).
Interestingly, the degree to which one can outperform this distribution is provably bounded. Suppose you are considering some distribution of priors and you want to see how much it outperforms the 2^-k distribution. It can do so if its prior for the true model is higher than 2^-k. So you compute the ratio of this prior to 2^-k for every model. If I remember correctly, there is a theorem which says that this set of ratios will have a finite supremum for any computable prior distribution (which is not obvious since the set of models is infinite). Of course, in some respects, this is an unfair comparison since the 2^-k distribution is itself uncomputable. Hopefully I am not misstating the theorem. I think it is stated and proved in the book by Li and Vitanyi.
To see why ignoring Occam’s razor is unthinkable (and to be amused), consider the following joke.
An astronaut visits a planet with intelligent aliens. These aliens believe in the reverse induction principle. That is, whatever has happened in the past is unlikely to be what will happen in the future. That the sun has risen every previous day is to them not evidence that it will rise tomorrow, but the contrary. Unsurprisingly, this causes all sorts of problems for the aliens, and they are starving and miserable. The astronaut asks them, “Why are you clinging to this belief, given that it obviously causes you so much suffering?” The aliens respond, “Well...it’s never worked well for us in the past!”
It seems to me that the cancellation is an artifact of the particular example, and that it would be easy to come up with an example in which the cancellation does not occur. For example, maybe you have previous experience with the mugger. He has mugged you before about minor things and sometimes you have paid him and sometimes not. In all cases he has been true to his word. This would seem to tip the probabilities at least slightly in favor of him being truthful about his current much larger threat.