The main question in my mind is whether your idea of stopping logical analysis short just before the logic system would prove what the system’s action would be, leads to desirable properties.
Well, it leads to the right answer in simple examples. For example if A is an ambient control algorithm for the utility defined by “if A()=0 then U=u0, if A()=1 then U()=u1”, for ui numbers with short binary expansion the value of the choice “0″ will be u0 and the value of the choice “1” will be u1. This is because there is a simple proof of the statement “if A() = i then U=ui” whereas the actual value of A() cannot be deduced easily (with a short proof). If the ui have long/infinite binary expansion, the values will be approximations of the ui with precision depending on the depth of analysis for which A() becomes decidable.
It also does not address naturalistic self-trust.
Yes. The problem of self-trust resurfaces in my formulation as the problem of choosing F. Roughly speaking, G might not be able to use the power of F to full extent in its reasoning since F cannot prove its own soundness.
As you observe, it doesn’t seem to make any unprovable things converge to probability 1
I didn’t say it won’t make any unprovable things converge to probability 1. My guess is that if a statement of the form “P(n) for all n” is provable for any given n, then its probability should converge to 1 even if it’s unprovable. But I don’t have a proof.
I didn’t say it won’t make any unprovable things converge to probability 1. My guess is that if a statement of the form “P(n) for all n” is provable for any given n, then its probability should converge to 1 even if it’s unprovable. But I don’t have a proof.
If a convergent probability distribution has this property, then it will also have the property of converging to probability 0 for some true sentences. To be more precise: using the language of the arithmetic hierarchy, if true Pi_1 statements converge to 1, there will be some true Pi_2 statements which converge to 0. (This was proven by Will Sawin at the MIRI workshop I attended.)
Overall, if Benja’s distribution does converge, I would be surprised if it had this property—it seems too similar to mine, which does not have this property. I could be wrong, though. My true intuition is that Benja’s distribution does not converge.
Hmm. P(X) in Benja’s distribution is (if I’m not mistaken) the probability that a statement will be true after we’ve randomly assigned true or false to all the sentences in our finite consideration set, given that our random assignment is consistent by our finite check. (Let’s assume X is always in the set of sentences we are considering.)
Let’s assume that we’re expanding both our set of sentences and our consistency check, but let’s also assume (for sake of informal argument) that we’re expanding our consistency check fast enough that we don’t have to worry about inconsistent stuff (so, we’re expanding it incomputably faster than we’re expanding our set of sentences).
P(X) at any given stage is the probability that X is true in a 50-50 random assignment (to the sentences in our current set), given that the assignment is consistent. In other words, it’s the following ratio: the probability that an assignment including X is consistent, over the probability that any assignment is consistent.
The limit of both parts of this ratio is zero; the set of consistent assignments is a set of measure zero, since we are bound to eventually add some inconsistency. (We will definitely add in a contradiction at some point if we keep adding sentences with coin flips to determine their truth values.)
(Benja states that we’re dealing with a measurable set here, at least.)
My intuition is that this won’t converge, since we may infinitely often add a statement that significantly changes the ratio. I don’t see that we can put a bound on how much the ratio will change at any point (whereas with my prior, there is a bound, though an incomputable one, because the measure of any remaining inconsistencies drops monotonically).
I don’t see the need for 2 parameters. The way I formulated it, there is only 1 parameter: the depth of analysis D. I always consider all sentences. This makes sense because all but a finite set of sentences cannot figure in a contradiction of length ⇐ D, so all but a finite set of sentences get probability 1⁄2 for any given D.
Regarding convergence of probabilities of undecidable statements as D → infinity, well, I don’t know how to prove it, but I also don’t know how to disprove it. I can try to assign a probability to it… ;) Is the result by Sawin you mentioned published somewhere?
Is it important to the analysis whether the probabilities converge as D tends to infinity? Do you rely on this at any point?
If you need to make sure the probabilities converge, then you could consider something like the following.
First, split the sentences in your system F into “positive” sentences (the ones which have no leading “not” symbol, ~, or else which have an even number of leading “not” symbols) and “negative” sentences (the ones with an odd number of leading “not” symbols). Sort the positive sentences by length, and then sort them lexicographically within all sentences of the same length. This will give a list s1, s2 etc.
We will now build a growing subset S of sentences, and ensure that in the limit, S is consistent and complete.
At stage 0, S is empty. Call this S0.
At stage n: Toss a fair coin, and if it lands heads then add sn to S. If it lands tails then add ~sn to S.
Next, systematically search through all proofs of length up to the length of sn to see if there are any inconsistencies in S. If a proof of inconsistency is found, then list the subset of positive and negative sentences which create the inconsistency e.g. {sa, ~sb, ~sc, …, sz}; let k be the largest index in this list (the largest value of a, b, …, z). If sk was in S, then remove it and add ~sk instead, or if ~sk was in S then replace it by sk. Restart the search for inconsistencies. When the search completes without finding any further inconsistencies we have the set Sn.
We now take the obvious limit Sw of the sets Sn as n tends to infinity: s will belong to Sw if and only if it belongs to all but a finite number of Sn. That limit Sw is guaranteed to exist, because for each m, the process will eventually find a consistent choice of sentences and negations from within {s1, … ,sm} and then stick with it (any contradictions found later will cause the replacement of sentences sk or ~sk with k > m). Further, Sw is guaranteed to be consistent (because any contradictions would have been discovered and removed in some Sn). Finally Sw is guaranteed to be complete, since it contains either sm or ~sm for each m.
We can now define probabilities PD(s) = P(s is a member of SD) and take P(s) as the limit of PD(s) as D tends to infinity. The limit is always defined since it is just the probability that s is a member of Sw.
So, I could instead think of iteratively eliminating inconsistent sets of sentences from the measure M, and looking at the limit of M(X)/M(all). I will assume for convenience that we eliminate one inconsistency at a time (so, remove mass from one set of sentences at a time). Let’s call this M{d}, so M{d+1} has one more inconsistent set removed from the measure (which is to say, set to measure 0).
Each set eliminated may take some mass away from the numerator, and will definitely take some mass away from the denominator. (Yet the numerator will always remain less than or equal to the denominator.)
If we remove a set of sentences which does not include X and has no overlap with any of the other sets of sentences removed so far, then the mass of both the numerator and the denominator get multiplied by 1 - .5^n, where n is the number of sentences in the set. To make this more general, for M{0}(Y) we can think of ~Y as having already been removed from its mass to begin; therefore, we can say that for any M{d}(Y), if d+1 is based on an inconsistent set S not involving anything removed from M{d}(Y) previously, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S|).
If S does contain a sentence which has been involved in inconsistency previously, we remove a strictly smaller fraction. If a subset of S has already been removed, then nothing gets removed at all. But if a set S being removed from M{d}(Y) contains only Y and some other sentences which have never appeared before, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S-1|).
Let’s name the ratio M(X)/M(all) at depth d as R(d). Considering R(d)/R(d+1), we know its limit must be in the range [0,1], because M(X) cannot be more than M(all) (so R(d) cannot diverge upward). If the limit of R(d) is greater than zero, the limit of R(d)/R(d+1) must be 1.
Assume X and ~X are both consistent.
In some ordering of logical depth, we have that: infinitely often, we eliminate a set of the form {Z, Z->~X, X} where Z and Z->~X have both never been seen before in an elimination set. Thus, M{d+1}(X) = .75 M{d}(X).
M{d}(all) = M{d}(X) + M{d}(~X). Nothing will be removed from M{d}(~X). Since X and ~X are consistent, both have some measure. Therefore, M{d+1}(all) = .75 M{d}(X) + M{d}(~X) = M{d}(all) ( .75 R(d) + M{d}(~X)/M{d}(all)) = M{d}(all) ( .75 R(d) + 1 - R(d)) = M{d}(all) * (1 - .25 R(d)).
I didn’t follow but the conclusion doesn’t sound right to me. Your argument looks like it should apply to any language and proof system. So if I introduce “X” as a logical symbol that isn’t constrained by any axioms or inference rules, why would its probability go to either 0 or 1? It seems to converge to 1⁄2 since for any consistent truth assignment with X=T we have a symmetric consistent truth assignment with X=F.
Yea, I suspect that’s right: so if we introduce X, it must go to either zero or 1, but either option violates the symmetry between X and ~X. Therefore, it must not converge. But this needs to be formalized more before I’m sure.
To convey my argument without as much of the math:
Suppose that P(X) is 1⁄2 at some stage. Then there will be inconsistent sets yet to remove which will take it at least C away from 1⁄2, where C is a constant that does not go down as the process continues.
The intuition behind this is that removing an inconsistent sentence which has not appeared in any of the inconsistencies removed so far, reduces mass by 1⁄2. Thus, the mass is changing significantly, all the time. Now to make this into a proof we need to show that P(X) changes significantly no matter how far into the process we go; IE, we need to show that a significantly different amount of mass can be removed from the ‘top’ and the ‘bottom’ (in the fraction M(X) / M(all) at finite depth).
The inconsistency {Y, Y->X, ~X} is supposed to achieve this: it only removes mass from the bottom, but there are infinitely many sets like this (we can make Y arbitrarily complex), and each of them reduces the bottom portion by the same fraction without touching the top. Specifically, the bottom becomes 7/8ths of its size (if I’ve done it right), so P(x) becomes roughly .57.
The fraction can re-adjust by decreasing the top in some other way, but this doesn’t allow convergence. No matter how many times the fraction reaches .5 again, there’s a new Y which can be used to force it to .57.
Well, it leads to the right answer in simple examples. For example if A is an ambient control algorithm for the utility defined by “if A()=0 then U=u0, if A()=1 then U()=u1”, for ui numbers with short binary expansion the value of the choice “0″ will be u0 and the value of the choice “1” will be u1. This is because there is a simple proof of the statement “if A() = i then U=ui” whereas the actual value of A() cannot be deduced easily (with a short proof). If the ui have long/infinite binary expansion, the values will be approximations of the ui with precision depending on the depth of analysis for which A() becomes decidable.
Yes. The problem of self-trust resurfaces in my formulation as the problem of choosing F. Roughly speaking, G might not be able to use the power of F to full extent in its reasoning since F cannot prove its own soundness.
I didn’t say it won’t make any unprovable things converge to probability 1. My guess is that if a statement of the form “P(n) for all n” is provable for any given n, then its probability should converge to 1 even if it’s unprovable. But I don’t have a proof.
If a convergent probability distribution has this property, then it will also have the property of converging to probability 0 for some true sentences. To be more precise: using the language of the arithmetic hierarchy, if true Pi_1 statements converge to 1, there will be some true Pi_2 statements which converge to 0. (This was proven by Will Sawin at the MIRI workshop I attended.)
Overall, if Benja’s distribution does converge, I would be surprised if it had this property—it seems too similar to mine, which does not have this property. I could be wrong, though. My true intuition is that Benja’s distribution does not converge.
Hmm. P(X) in Benja’s distribution is (if I’m not mistaken) the probability that a statement will be true after we’ve randomly assigned true or false to all the sentences in our finite consideration set, given that our random assignment is consistent by our finite check. (Let’s assume X is always in the set of sentences we are considering.)
Let’s assume that we’re expanding both our set of sentences and our consistency check, but let’s also assume (for sake of informal argument) that we’re expanding our consistency check fast enough that we don’t have to worry about inconsistent stuff (so, we’re expanding it incomputably faster than we’re expanding our set of sentences).
P(X) at any given stage is the probability that X is true in a 50-50 random assignment (to the sentences in our current set), given that the assignment is consistent. In other words, it’s the following ratio: the probability that an assignment including X is consistent, over the probability that any assignment is consistent.
The limit of both parts of this ratio is zero; the set of consistent assignments is a set of measure zero, since we are bound to eventually add some inconsistency. (We will definitely add in a contradiction at some point if we keep adding sentences with coin flips to determine their truth values.)
(Benja states that we’re dealing with a measurable set here, at least.)
My intuition is that this won’t converge, since we may infinitely often add a statement that significantly changes the ratio. I don’t see that we can put a bound on how much the ratio will change at any point (whereas with my prior, there is a bound, though an incomputable one, because the measure of any remaining inconsistencies drops monotonically).
I don’t see the need for 2 parameters. The way I formulated it, there is only 1 parameter: the depth of analysis D. I always consider all sentences. This makes sense because all but a finite set of sentences cannot figure in a contradiction of length ⇐ D, so all but a finite set of sentences get probability 1⁄2 for any given D.
Regarding convergence of probabilities of undecidable statements as D → infinity, well, I don’t know how to prove it, but I also don’t know how to disprove it. I can try to assign a probability to it… ;) Is the result by Sawin you mentioned published somewhere?
Is it important to the analysis whether the probabilities converge as D tends to infinity? Do you rely on this at any point?
If you need to make sure the probabilities converge, then you could consider something like the following.
First, split the sentences in your system F into “positive” sentences (the ones which have no leading “not” symbol, ~, or else which have an even number of leading “not” symbols) and “negative” sentences (the ones with an odd number of leading “not” symbols). Sort the positive sentences by length, and then sort them lexicographically within all sentences of the same length. This will give a list s1, s2 etc.
We will now build a growing subset S of sentences, and ensure that in the limit, S is consistent and complete.
At stage 0, S is empty. Call this S0.
At stage n: Toss a fair coin, and if it lands heads then add sn to S. If it lands tails then add ~sn to S. Next, systematically search through all proofs of length up to the length of sn to see if there are any inconsistencies in S. If a proof of inconsistency is found, then list the subset of positive and negative sentences which create the inconsistency e.g. {sa, ~sb, ~sc, …, sz}; let k be the largest index in this list (the largest value of a, b, …, z). If sk was in S, then remove it and add ~sk instead, or if ~sk was in S then replace it by sk. Restart the search for inconsistencies. When the search completes without finding any further inconsistencies we have the set Sn.
We now take the obvious limit Sw of the sets Sn as n tends to infinity: s will belong to Sw if and only if it belongs to all but a finite number of Sn. That limit Sw is guaranteed to exist, because for each m, the process will eventually find a consistent choice of sentences and negations from within {s1, … ,sm} and then stick with it (any contradictions found later will cause the replacement of sentences sk or ~sk with k > m). Further, Sw is guaranteed to be consistent (because any contradictions would have been discovered and removed in some Sn). Finally Sw is guaranteed to be complete, since it contains either sm or ~sm for each m.
We can now define probabilities PD(s) = P(s is a member of SD) and take P(s) as the limit of PD(s) as D tends to infinity. The limit is always defined since it is just the probability that s is a member of Sw.
Convergence is not an issue as long as the utility function is computable.
Soon I’m going to write more about logical uncertainty in the context of the Loebian obstacle.
Ok, I see.
So, I could instead think of iteratively eliminating inconsistent sets of sentences from the measure M, and looking at the limit of M(X)/M(all). I will assume for convenience that we eliminate one inconsistency at a time (so, remove mass from one set of sentences at a time). Let’s call this M{d}, so M{d+1} has one more inconsistent set removed from the measure (which is to say, set to measure 0).
Each set eliminated may take some mass away from the numerator, and will definitely take some mass away from the denominator. (Yet the numerator will always remain less than or equal to the denominator.)
If we remove a set of sentences which does not include X and has no overlap with any of the other sets of sentences removed so far, then the mass of both the numerator and the denominator get multiplied by 1 - .5^n, where n is the number of sentences in the set. To make this more general, for M{0}(Y) we can think of ~Y as having already been removed from its mass to begin; therefore, we can say that for any M{d}(Y), if d+1 is based on an inconsistent set S not involving anything removed from M{d}(Y) previously, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S|).
If S does contain a sentence which has been involved in inconsistency previously, we remove a strictly smaller fraction. If a subset of S has already been removed, then nothing gets removed at all. But if a set S being removed from M{d}(Y) contains only Y and some other sentences which have never appeared before, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S-1|).
Let’s name the ratio M(X)/M(all) at depth d as R(d). Considering R(d)/R(d+1), we know its limit must be in the range [0,1], because M(X) cannot be more than M(all) (so R(d) cannot diverge upward). If the limit of R(d) is greater than zero, the limit of R(d)/R(d+1) must be 1.
Assume X and ~X are both consistent.
In some ordering of logical depth, we have that: infinitely often, we eliminate a set of the form {Z, Z->~X, X} where Z and Z->~X have both never been seen before in an elimination set. Thus, M{d+1}(X) = .75 M{d}(X).
M{d}(all) = M{d}(X) + M{d}(~X). Nothing will be removed from M{d}(~X). Since X and ~X are consistent, both have some measure. Therefore, M{d+1}(all) = .75 M{d}(X) + M{d}(~X) = M{d}(all) ( .75 R(d) + M{d}(~X)/M{d}(all)) = M{d}(all) ( .75 R(d) + 1 - R(d)) = M{d}(all) * (1 - .25 R(d)).
Thus, infinitely often, R(d+1) = {.75 M{d}(X)} / {M{d}(all) * (1 - .25 R(d))} = .75 R(d) / (1 - .25 R(d))
Let c be R(d+1) - R(d).
So, infinitely often, we have
R(d) + c = .75 R(d) / (1 - .25 R(d))
c = .75 R(d) / (1 - .25 R(d)) - R(d)
If c goes to zero, R(d) must go to:
0 = .75 R(d) / (1 - .25 R(d)) - R(d)
R(d) = .75 R(d) / (1 - .25 R(d))
1 = .75 / (1 - .25 R(d))
1 - .25 R(d) = .75
.25 = .25 R(d)
R(d) = 1
So we see that if the probability converges, it must go to either 0 or 1! Unless I’ve made some mistake.
I didn’t follow but the conclusion doesn’t sound right to me. Your argument looks like it should apply to any language and proof system. So if I introduce “X” as a logical symbol that isn’t constrained by any axioms or inference rules, why would its probability go to either 0 or 1? It seems to converge to 1⁄2 since for any consistent truth assignment with X=T we have a symmetric consistent truth assignment with X=F.
Yea, I suspect that’s right: so if we introduce X, it must go to either zero or 1, but either option violates the symmetry between X and ~X. Therefore, it must not converge. But this needs to be formalized more before I’m sure.
I think it will converge to 1⁄2 because the symmetry applies at each level of depth, not just at the limit (at least approximately)
To convey my argument without as much of the math:
Suppose that P(X) is 1⁄2 at some stage. Then there will be inconsistent sets yet to remove which will take it at least C away from 1⁄2, where C is a constant that does not go down as the process continues.
The intuition behind this is that removing an inconsistent sentence which has not appeared in any of the inconsistencies removed so far, reduces mass by 1⁄2. Thus, the mass is changing significantly, all the time. Now to make this into a proof we need to show that P(X) changes significantly no matter how far into the process we go; IE, we need to show that a significantly different amount of mass can be removed from the ‘top’ and the ‘bottom’ (in the fraction M(X) / M(all) at finite depth).
The inconsistency {Y, Y->X, ~X} is supposed to achieve this: it only removes mass from the bottom, but there are infinitely many sets like this (we can make Y arbitrarily complex), and each of them reduces the bottom portion by the same fraction without touching the top. Specifically, the bottom becomes 7/8ths of its size (if I’ve done it right), so P(x) becomes roughly .57.
The fraction can re-adjust by decreasing the top in some other way, but this doesn’t allow convergence. No matter how many times the fraction reaches .5 again, there’s a new Y which can be used to force it to .57.