The obvious solution:
Y is information. It is not information about the world, but it is information—information about math.
I don’t think that works.
Imagine taking the proofs of all provable propositions that can be expressed in less than N characters (for some very large number N), writing them as conjunctions of trivial statements, then randomly arranging all of those trivial statements in one extremely long sequence.
Let Z be the conjunction of all these statements (randomly ordered as above).
Then Z is logically stronger than Y. But our subjective Bayesian cannot use Z to prove X, whereas he can use Y to prove X. (Finding Moby Dick in the Library of Babel is no easier than writing Moby Dick oneself, and we’ve already assumed that our subject is unable to prove X under his own steam.)
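To make the construction concrete, here is a toy sketch in Python (the proofs and their steps are placeholders I made up; a real Z would of course be astronomically larger):

```python
import random

# Toy model of the construction of Z: each proof is a list of trivial
# steps (strings). Pool every step from every proof, then shuffle.
proofs = {
    "X": ["lemma 1 holds", "lemma 2 holds", "lemmas 1 and 2 imply X"],
    "W": ["lemma 3 holds", "lemma 3 implies W"],
}

steps = [step for proof in proofs.values() for step in proof]
random.shuffle(steps)

# Z is the conjunction of the shuffled steps. Every step of the proof of
# X is still present, so Z is logically at least as strong as that proof,
# but the ordering that made the proof followable has been destroyed.
Z = " AND ".join(steps)
print(Z)
```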
The Bayesian doesn’t know Z is stronger than Y. He can’t even read all of Z. Or if you compress it, he can’t decompress it.
P(Y|Z) < 1.
If you say that then you’re conceding the point, because Y is nothing other than the conjunction of a carefully chosen subset of the trivial statements comprising Z, re-ordered so as to give a proof that can easily be followed.
Figuring out how to reorder them requires mathematical knowledge, a special kind of knowledge that can be generated not just through contact with the external world but also by spending computer cycles on it.
Yes. It’s therefore important to quantify how many computer cycles and other resources are involved in computing a prior. There is a souped-up version of taw’s argument along those lines: either P = NP, or else every prior that is computable in polynomial time will fall for the conjunction fallacy.
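Here is a hedged sketch of how such an argument could go (my reconstruction, not necessarily taw’s exact version). If B entails A, then A ∧ B is logically equivalent to B, so a coherent prior must assign them equal probability; assigning anything larger is precisely the conjunction fallacy:

\[
B \vDash A \;\Longrightarrow\; (A \wedge B) \equiv B \;\Longrightarrow\; P(A \wedge B) = P(B).
\]

Deciding propositional entailment is coNP-complete, so a prior computed in polynomial time from the syntax of its inputs cannot satisfy every such constraint unless P = coNP, which holds iff P = NP.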
Imagine he has read and memorized all of Z.
If you want to make it a bit less unrealistic, imagine there are only, say, 1000 difficult proofs randomly chopped and spliced rather than a gazillion—but still too many for the subject to make head or tail of. Perhaps imagine them adding up to a book about the size of the Bible, which a person can memorize in its entirety given sufficient determination.
Oh I see. Chopped and spliced. That makes more sense. I missed that in your previous comment.
The Bayesian still does not know that Z implies Y, because he has not found Y in Z, so there still isn’t a problem.
In what sense is Y information?
It’s not even guaranteed to be true (though you can verify it yourself much more easily than you can verify X directly).
Compare this with the classical conjunction-fallacy result. In experiments, people routinely believed that:
P(next year the Soviet Union will invade Poland, and the United States will break off diplomatic relations with the Soviet Union) > P(next year the United States will break off diplomatic relations with the Soviet Union).
Here X = “the United States will break off diplomatic relations with the Soviet Union” and Y = “the Soviet Union will invade Poland.”
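For reference, the law of probability these answers violate: since X ∧ Y entails X, any coherent probability function must satisfy

\[
P(X \wedge Y) \le P(X),
\]

so a conjunction can never be more probable than either of its conjuncts.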
Wouldn’t your reasoning pretty much endorse what people were doing, with Y (one possible scenario leading to X) being the new information?
Hmmm, I now think the existence of Y is actually a distraction. The underlying process is:
produce estimate for P(X) ⇒ find proof of X ⇒ P(X) increases
If estimates are allowed to change in this manner, then of course they are also allowed to change when someone else shows you a proof of X (since P(X) = P(X) is likewise a law of probability). If they are not allowed to change in this manner, then subjective Bayesianism applied to mathematical statements collapses anyway.
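A toy illustration of that process in Python (the proposition and the prior are hypothetical, chosen only so the script runs):

```python
def nth_prime(n):
    """Return the n-th prime (1-indexed) by simple trial division."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

# X = "the 1000th prime ends in 7". Before computing anything, a bounded
# reasoner might note that primes above 5 end in 1, 3, 7, or 9 roughly
# equally often, and estimate P(X) ~ 0.25.
print("estimate before computing: P(X) = 0.25")

# "Finding the proof": actually carry out the computation.
p = nth_prime(1000)
print(f"1000th prime = {p}")
print(f"P(X) after computing = {1.0 if p % 10 == 7 else 0.0}")
```

No new fact about the external world is observed between the two estimates; only computation happens, yet the estimate moves.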
From a purely human psychological perspective: When someone tells me a proof of a theorem, it feels like I’ve learned something. When I figure one out myself, it feels like I figured something out, as if I’d learned information through interacting with the natural world.
Are you meaning to tell me that no one learns anything in math class? Or they learn something, but the thing they learn isn’t information?
Caveat: formalizing these concepts sometimes requires us to deviate from human experience. I don’t think the concept of information has been formalized, by Bayesians or frequentists, in a way that deals with the problem of acting with limited computing time, a.k.a. the problem of logical uncertainty.
I think we almost agree already ;-)
Would you agree that the “P(X)” you’re describing is more like “some person’s answer when asked question X” than “the probability of X”?
The main difference between the two is that if “X” and “Y” are the same logical outcome, then their probabilities are necessarily equal, whereas an actual person can reply differently depending on how the question was formulated.
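In symbols (my notation, writing A(“X”) for the person’s answer to the question as formulated):

\[
X \equiv Y \;\Longrightarrow\; P(X) = P(Y), \qquad \text{but possibly } A(\text{``}X\text{''}) \neq A(\text{``}Y\text{''}).
\]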
If you’re interested in this subject, I recommend reading about epistemic modal logic; not necessarily for its solutions, but the people working on it are clearly aware of this problem and can describe it better than I can.