Yeah, you’re right. I assumed that Bayes(M, P) is continuous in both arguments, but we don’t know what that means or how to prove it.
It’s not clear how to define continuity in P for a given M, because pointwise convergence doesn’t cut it and we don’t have any other ideas. But continuity in M for a given P seems to be easier, because we can define the distance between two M’s as the total weight of sentences where they disagree. Under this definition, I think I’ve found an example where Bayes(M, P) is not continuous in M.
1) Let’s take a language consisting of sentences phi_k = “x=k” for each natural k, and all their combinations using logical connectives (and/or/not). This language has models M_k = “phi_k is true, all others are false” for each k, and also M_inf = “all phi_k are false”.
2) To define the weights, fix c<1 and set mu(phi_k) = c*2^-k for each k, and distribute the remaining 1-c of weight among the remaining sentences arbitrarily.
3) For every k, any statement where M_k and M_inf disagree must include phi_k as a component, because all other phi_n have the same truth value in M_k and M_inf. The total weight of sentences that include phi_k goes to zero as k goes to infinity, so the model M_inf is the limit of models M_k under the above metric.
4) Now let’s define a probability assignment P so that P(phi_k) = (2^(-2^k))/(1+2^(-2^k)) for each k, and P(anything else) = 1⁄2. You can check that Bayes(M_k, P) has the same value for each k, but Bayes(M_inf, P) has a different, higher value.
I actually thought of exactly this construction at our last MIRIx on Saturday. It made me sad. However here is a question I couldn’t answer. Can you do the same thing with a coherent P? That I do not know.
If Bayes is continuous in M for coherent P then we would be able to prove the first conjecture.
Something slightly easier, maybe we can show that if Bayes is not continuous at the point M for coherent P, then P must assign nonzero probability to M. I am also not sure if this would prove the conjecture, but haven’t really thought about this angle yet.
Another interesting thing to think about: What happens if we start with the P you described and then do the algorithm in the second conjecture.
I haven’t told you everything about this algorithm yet. One frustrating thing is that if you want to change a probability from p to q in one step, you can write down how much your bayes score increases. Since bayes score is bounded above, you can only do this finitely many times. However, if you update a probability from p to q in a series of very small steps, you could in theory only increase your bayes score by an arbitrarily small amount. :(
I just recently started trying to prove the first conjecture without the second conjecture, so I have more hope that there is something easy there I just missed.
I think you know most of what I know about the first conjecture. Abram gave me a sketch of a proof that if P is the WCB maximizer then WCB(P) is Exp(Bayes(M,P)) (when M is chosen according to P), but we haven’t really checked it. If you (or someone else who would like to try to find this proof) would like to know more about the second conjecture, you can pm me your email, I will send you my personal notes, and we can set up a skype chat?
The second conjecture implies the first, and If that is true, I would really like to know, because I think it makes the result much nicer.
Yeah, you’re right. I assumed that Bayes(M, P) is continuous in both arguments, but we don’t know what that means or how to prove it.
It’s not clear how to define continuity in P for a given M, because pointwise convergence doesn’t cut it and we don’t have any other ideas. But continuity in M for a given P seems to be easier, because we can define the distance between two M’s as the total weight of sentences where they disagree. Under this definition, I think I’ve found an example where Bayes(M, P) is not continuous in M.
1) Let’s take a language consisting of sentences phi_k = “x=k” for each natural k, and all their combinations using logical connectives (and/or/not). This language has models M_k = “phi_k is true, all others are false” for each k, and also M_inf = “all phi_k are false”.
2) To define the weights, fix c<1 and set mu(phi_k) = c*2^-k for each k, and distribute the remaining 1-c of weight among the remaining sentences arbitrarily.
3) For every k, any statement where M_k and M_inf disagree must include phi_k as a component, because all other phi_n have the same truth value in M_k and M_inf. The total weight of sentences that include phi_k goes to zero as k goes to infinity, so the model M_inf is the limit of models M_k under the above metric.
4) Now let’s define a probability assignment P so that P(phi_k) = (2^(-2^k))/(1+2^(-2^k)) for each k, and P(anything else) = 1⁄2. You can check that Bayes(M_k, P) has the same value for each k, but Bayes(M_inf, P) has a different, higher value.
I actually thought of exactly this construction at our last MIRIx on Saturday. It made me sad. However here is a question I couldn’t answer. Can you do the same thing with a coherent P? That I do not know.
If Bayes is continuous in M for coherent P then we would be able to prove the first conjecture.
Something slightly easier, maybe we can show that if Bayes is not continuous at the point M for coherent P, then P must assign nonzero probability to M. I am also not sure if this would prove the conjecture, but haven’t really thought about this angle yet.
Another interesting thing to think about: What happens if we start with the P you described and then do the algorithm in the second conjecture.
I haven’t told you everything about this algorithm yet. One frustrating thing is that if you want to change a probability from p to q in one step, you can write down how much your bayes score increases. Since bayes score is bounded above, you can only do this finitely many times. However, if you update a probability from p to q in a series of very small steps, you could in theory only increase your bayes score by an arbitrarily small amount. :(
I just recently started trying to prove the first conjecture without the second conjecture, so I have more hope that there is something easy there I just missed.
I think you know most of what I know about the first conjecture. Abram gave me a sketch of a proof that if P is the WCB maximizer then WCB(P) is Exp(Bayes(M,P)) (when M is chosen according to P), but we haven’t really checked it. If you (or someone else who would like to try to find this proof) would like to know more about the second conjecture, you can pm me your email, I will send you my personal notes, and we can set up a skype chat?
The second conjecture implies the first, and If that is true, I would really like to know, because I think it makes the result much nicer.