Imagine that P (the WCB maximizer) does the same on all models except one, and does better on that one model.
I don’t think that’s possible. Bayes(M, P) for a fixed P is a continuous function of M, if you define the distance between two models as the total weight of sentences where they disagree. Or am I missing something?
The problem is that a limit of flat probability distributions is not necessarily flat.
Wait, how can that be? Do you have an example?
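One standard example of this phenomenon, sketched here on the assumption that "flat" means uniform-like: the uniform distribution on [0, 1/n] is flat for every n, yet as n grows the sequence converges weakly to a point mass at 0, which is maximally concentrated rather than flat. A quick numerical check:

```python
import numpy as np

# Sketch (assumes "flat" ≈ uniform): uniform distributions on [0, 1/n]
# are each flat, but their weak limit is a point mass at 0.
rng = np.random.default_rng(0)

for n in [1, 10, 100, 1000]:
    samples = rng.uniform(0.0, 1.0 / n, size=10_000)
    # The probability mass inside the small window [0, 0.01] tends to 1
    # as n grows: the limit concentrates everything at a single point.
    mass_near_zero = np.mean(samples <= 0.01)
    print(f"n={n:5d}  P(X <= 0.01) ~ {mass_near_zero:.3f}")
```

So flatness is not preserved in the limit, even though every distribution along the way is flat.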