I’m not convinced that example 1 works. “A and B and C” and “A or B or C” do have roughly the same complexity. It is then after we look at the content that we then update our estimates based on the logical form of the claims to give P(A and B and C) < P(A or B or C).
A prior—especially one as fundamental as the universal prior—must be coherent from the start. Inspecting your hypotheses more closely without looking at the universe should never cause “updating”.
To be fair, would you agree that this is a big weakness of probability theory? It can be really difficult to sort out some of those boolean combinations.
I’m pretty sure it doesn’t break probability theory per se, only limits its applicability. For example, Wei Dai wants UDT to use a “mathematical intuition module” that assigns degrees of belief to math statements, while I’m skeptical of the idea.
I argued here that it means that an AI can’t compute its own prior. That argument is about all mathematical statements. If we just restrict to boolean combinations on a finite sample space then I think a similar argument shows that an AI can’t compute its own prior in polynomial time, unless P = NP.
Edit: Wait, that last might be boneheaded. For given inputs we can compute Boolean expressions pretty efficiently, I think we just can’t efficiently determine whether they’re tautologies.
That depends on what statements are “expressible” by the AI. If it can use quantification (“this boolean formula is true for some/all inputs”), computing the prior becomes NP-hard. An even trickier case: imagine you program the AI with ZFC and ask it for its prior about the continuum hypothesis? On the other hand, you may use clever programming to avoid evaluating prior values that are difficult or impossible to evaluate, like Monte Carlo AIXI does.
Your argument looks correct: being able to compute the value of the prior for any sentence you can represent is a very strong condition. In the boolean satisfiability setting this corresponds to P=NP, and in stronger settings it corresponds to incompleteness (e.g. you believe in ZFC, what’s your prior for CH?) On the other hand, you may cleverly maneuver so that all prior values you end up calculating in practice will be tractable, like in Monte Carlo AIXI.
OP was talking about priors, which is the part before we look at the content. OP is saying, I think, that the prior should be the fraction of all possible worlds in which the proposition is true.
Whether you accept this example depends on how you interpret the phrase “possible worlds”. If “all possible worlds” is all possible values of a binary string, and “possible worlds in which P is true” is all binary strings in which position P = 1, then A or B or C gets a higher prior than A and B and C.
However, if you compute your “prior” not when you are a Cartesian blank slate with no evolutionary history, but when you already know something about the world, then “all possible worlds” could mean “all possible values of A, B, and C that are consistent with previous information”. In that case, I would expect that “A and B and C” and “A or B or C” would have more similar priors, provided that the set {A, B, C} was chosen deliberately. A person reasoning about A, B, and C probably chose them due to some perceived commonality, meaning there is some underlying rule about things in this set, and the hypothesis being tested is really “is this an ‘OR’ rule or an ‘AND’ rule?”
Consider that the complexity of “A and B” is a lot less than “(A and B) or (A and C) or (A and D)”, but P(A and B) < P((A and B) or (A and C) or (A and D))
I’m not convinced that example 1 works. “A and B and C” and “A or B or C” do have roughly the same complexity. It is then after we look at the content that we then update our estimates based on the logical form of the claims to give P(A and B and C) < P(A or B or C).
A prior—especially one as fundamental as the universal prior—must be coherent from the start. Inspecting your hypotheses more closely without looking at the universe should never cause “updating”.
To be fair, would you agree that this is a big weakness of probability theory? It can be really difficult to sort out some of those boolean combinations.
Edit: Why the downvote?
I’m pretty sure it doesn’t break probability theory per se, only limits its applicability. For example, Wei Dai wants UDT to use a “mathematical intuition module” that assigns degrees of belief to math statements, while I’m skeptical of the idea.
I argued here that it means that an AI can’t compute its own prior. That argument is about all mathematical statements. If we just restrict to boolean combinations on a finite sample space then I think a similar argument shows that an AI can’t compute its own prior in polynomial time, unless P = NP.
Edit: Wait, that last might be boneheaded. For given inputs we can compute Boolean expressions pretty efficiently, I think we just can’t efficiently determine whether they’re tautologies.
That depends on what statements are “expressible” by the AI. If it can use quantification (“this boolean formula is true for some/all inputs”), computing the prior becomes NP-hard. An even trickier case: imagine you program the AI with ZFC and ask it for its prior about the continuum hypothesis? On the other hand, you may use clever programming to avoid evaluating prior values that are difficult or impossible to evaluate, like Monte Carlo AIXI does.
Far out.
Sorry, my reply was off track, so I deleted it once again.
(...)
Your argument looks correct: being able to compute the value of the prior for any sentence you can represent is a very strong condition. In the boolean satisfiability setting this corresponds to P=NP, and in stronger settings it corresponds to incompleteness (e.g. you believe in ZFC, what’s your prior for CH?) On the other hand, you may cleverly maneuver so that all prior values you end up calculating in practice will be tractable, like in Monte Carlo AIXI.
OP was talking about priors, which is the part before we look at the content. OP is saying, I think, that the prior should be the fraction of all possible worlds in which the proposition is true.
Whether you accept this example depends on how you interpret the phrase “possible worlds”. If “all possible worlds” is all possible values of a binary string, and “possible worlds in which P is true” is all binary strings in which position P = 1, then A or B or C gets a higher prior than A and B and C.
However, if you compute your “prior” not when you are a Cartesian blank slate with no evolutionary history, but when you already know something about the world, then “all possible worlds” could mean “all possible values of A, B, and C that are consistent with previous information”. In that case, I would expect that “A and B and C” and “A or B or C” would have more similar priors, provided that the set {A, B, C} was chosen deliberately. A person reasoning about A, B, and C probably chose them due to some perceived commonality, meaning there is some underlying rule about things in this set, and the hypothesis being tested is really “is this an ‘OR’ rule or an ‘AND’ rule?”
Consider that the complexity of “A and B” is a lot less than “(A and B) or (A and C) or (A and D)”, but P(A and B) < P((A and B) or (A and C) or (A and D))