Two comments:

1. You seem to be suggesting that the standard Bayesian framework handles logical uncertainty as a special case. (Here we are not exactly “uncertain” about sentences, but we have to update on their truth from some prior that did not account for them, which amounts to the same thing.) If this were true, the research on handling logical uncertainty through new criteria and constructions would be superfluous. I haven’t actually seen a proposal like this laid out in detail, but I believe such proposals have been made and found wanting, so I’ll be skeptical at least until I’m shown the details of one.
(In particular, this would need to involve some notion of conditional probabilities like P(A | A ⇒ B), and perhaps priors like P(A ⇒ B), which do not appear in any standard treatment of Bayesian probability I’ve seen.)
2. Even if this sort of thing does work in principle, it doesn’t seem to help in the practical case at hand. We’re now told to update on “noticing” A ⇒ B by using objects like P(A | A ⇒ B), but these too have to be guessed using heuristics (we don’t have a map of them either), so it inherits the same problem it was introduced to solve.
I’m a bit confused by your mention of logical uncertainty. Isn’t plain old probability sufficient for this problem? If A and B are statements about the world, and you have a prior over possible worlds (combinations of truth values for A and B), then probabilities like P(A ⇒ B) or P(A | A ⇒ B) seem well-defined to me. For example, P(A ⇒ B) = P(A and B) + P(not A).
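For concreteness, here is how those quantities fall out of a joint prior over truth values. This is a toy sketch; the particular weights are made up for illustration:

```python
# A toy joint prior over the four (A, B) truth-value worlds; the
# particular weights are arbitrary, chosen only for illustration.
prior = {
    (True, True): 0.4,
    (True, False): 0.1,
    (False, True): 0.3,
    (False, False): 0.2,
}

def prob(event):
    """Probability of an event, given as a predicate on (A, B)."""
    return sum(p for (a, b), p in prior.items() if event(a, b))

# The material conditional A => B is (not A) or B.
p_implies = prob(lambda a, b: (not a) or b)
p_identity = prob(lambda a, b: a and b) + prob(lambda a, b: not a)
assert abs(p_implies - p_identity) < 1e-12  # P(A => B) = P(A and B) + P(not A)

# P(A | A => B): restrict to worlds where the conditional holds, renormalize.
# Note A and (A => B) is equivalent to A and B.
p_a_given_implies = prob(lambda a, b: a and b) / p_implies
print(round(p_implies, 4), round(p_a_given_implies, 4))  # 0.9 0.4444
```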
Let’s try to walk through the US and California example. At the start, you feel that A = “California will be a US state in 2100” and B = “US will exist in 2100” both have probability 98% and are independent, because you haven’t thought much about the connection between them. Then you notice that A ⇒ B, so you remove the option “A and not B” from your prior and renormalize, leading to 97.96% for A and 99.96% for B. The probabilities are nudged apart, just like you wanted!
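That update is a few lines of Python, reproducing the numbers above:

```python
# The update described above: independent 98% priors, then condition on A => B.
p_a, p_b = 0.98, 0.98

# Joint prior over the four worlds, assuming independence.
joint = {
    (True, True): p_a * p_b,
    (True, False): p_a * (1 - p_b),
    (False, True): (1 - p_a) * p_b,
    (False, False): (1 - p_a) * (1 - p_b),
}

# Condition on A => B: zero out the "A and not B" world and renormalize.
joint[(True, False)] = 0.0
total = sum(joint.values())
posterior = {w: p / total for w, p in joint.items()}

post_a = posterior[(True, True)] + posterior[(True, False)]
post_b = posterior[(True, True)] + posterior[(False, True)]
print(round(post_a, 4), round(post_b, 4))  # 0.9796 0.9996
```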
Of course you could say these numbers still look fake. The update for B was much stronger than the update for A, what’s up with that? But that’s because our prior was very ignorant to begin with. As we get more data, we’ll converge on the truth. Bayes comes out of this exercise looking pretty good, if you ask me.
Ah, yeah, you’re right that it’s possible to do this. I’m used to thinking in the Kolmogorov picture, and keep forgetting that in the Jaynesian propositional logic picture you can treat material conditionals as contingent facts. In fact, I went through the process of realizing this in a similar argument about the same post a while ago, and then forgot about it in the meantime!
That said, I am not sure what this procedure has to recommend it, besides that it is possible and (technically) Bayesian. The starting prior, with independence, does not really reflect our state of knowledge at any time, even at the time before we have “noticed” the implication(s). For, if we actually write down that prior, we have an entry in every cell of the truth table, and if we inspect each of those cells and think “do I really believe this?”, we cannot answer the question without asking whether we know facts such as A ⇒ B—at which point we notice the implication!
It seems more accurate to say that, before we consider the connection of A to B, those cells are “not even filled in.” The independence prior is not somehow logically agnostic; it assigns a specific probability to the conditional, just as our posterior does, except that in the prior that probability is, wrongly, not one.
Okay, one might say, but can’t this still be a good enough place to start, allowing us to converge eventually? I’m actually unsure about this, because (see below) the logical updates tend to push the probabilities of the “ends” of a logical chain further towards 0 and 1; at any finite time the distribution obeys Cromwell’s Rule, but whether it converges to the truth might depend on the way in which we take the limit over logical and empirical updates (supposing we do arbitrarily many of each type as time goes on).
I got curious about this and wrote some code to do these updates with arbitrary numbers of variables and arbitrary conditionals. What I found is that as we consider longer chains A ⇒ B ⇒ C ⇒ …, the propositions at one end get pushed to 1 or 0, and we don’t need very long chains for this to get extreme. With all starting probabilities set to 0.7 and three variables 0 ⇒ 1 ⇒ 2, the probability of variable 2 is 0.95; with five variables the probability of the last one is 0.99 (see the plot below). With ten variables, the last one reaches 0.99988. We can easily come up with long chains in the California example or similar, and following this procedure would lead us to absurdly extreme confidence in such examples.
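A minimal version of that computation, by brute-force enumeration over truth assignments (my actual code handles arbitrary conditionals; this sketch only handles a single chain, conditioned on all at once):

```python
from itertools import product

def chain_posterior(n, p):
    """Marginal probabilities after conditioning an independent prior
    (each variable true with probability p) on the implication chain
    X0 => X1 => ... => X(n-1), by brute-force enumeration of worlds."""
    weights = {}
    for world in product([0, 1], repeat=n):
        # Discard worlds violating any implication in the chain.
        if any(a and not b for a, b in zip(world, world[1:])):
            continue
        w = 1.0
        for x in world:
            w *= p if x else 1 - p
        weights[world] = w
    total = sum(weights.values())
    return [sum(w for world, w in weights.items() if world[i]) / total
            for i in range(n)]

print(round(chain_posterior(3, 0.7)[-1], 2))   # 0.95
print(round(chain_posterior(5, 0.7)[-1], 2))   # 0.99
print(round(chain_posterior(10, 0.7)[-1], 5))  # 0.99988
```

(With all starting probabilities at 0.5, the surviving worlds are equally weighted, so the last variable in an n-variable chain ends up at n/(n+1): already ~0.91 for ten variables.)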
I’ve also given a second plot below, where all the starting probabilities are 0.5. This shows that the growing confidence does not rely on an initial hunch one way or the other; simply updating on the logical relationships from initial neutrality (plus independences) pushes us to high confidence about the ends of the chain.
Yeah, if the evidence you see (including logical evidence) is filtered by your adversary, but you treat it as coming from an impartial process, you can be made to believe extreme stuff. That problem doesn’t seem to be specific to Bayes, or at least I can’t imagine any other method that would be immune to it.
Here’s a simple model: the adversary flips a coin ten times and reveals some of the results to you, which happen to be all heads. You believe that the choice of which results to reveal is independent of the results themselves, but in fact the adversary only reveals heads. So your beliefs about the coin’s bias are predictably pushed toward heads.
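In code, assuming a uniform Beta(1, 1) prior on the bias and a coin that is genuinely fair:

```python
import random

def naive_posterior_mean(n_flips, filtered, seed=0):
    """Posterior mean of the coin's bias under a uniform Beta(1, 1)
    prior, for an observer who treats the revealed flips as an
    unbiased sample of all flips."""
    rng = random.Random(seed)
    heads = tails = 0
    for _ in range(n_flips):
        flip_is_heads = rng.random() < 0.5  # the coin is actually fair
        if flip_is_heads:
            heads += 1      # heads are always revealed
        elif not filtered:
            tails += 1      # tails are revealed only by an honest process
    # Beta(1 + heads, 1 + tails) posterior mean:
    return (1 + heads) / (2 + heads + tails)

# Under adversarial filtering the observer never sees a tail, so the
# posterior mean sits at or above 1/2 no matter how the flips come out.
assert all(naive_posterior_mean(10, filtered=True, seed=s) >= 0.5
           for s in range(100))
```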
The usual Bayesian answer is that you should have nonzero probability that evidence is revealed adversarially. Then over time that probability will dominate. Similarly in our problem, you should have nonzero probability that someone is coming up with intermediate statements between A and Z and showing you only those, instead of other statements that would appear elsewhere in the graph and temper your beliefs a bit. That makes the model complicated enough that I can’t work it out on a napkin anymore, but I’m pretty sure it’s the only way.
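A toy version of that mixture, simplified so that the coin is known fair and the only uncertainty is over the revelation process (the 1% prior on filtering is made up):

```python
# Two hypotheses about the revelation process for a known-fair coin:
# H_fair reveals flips independently of their outcome; H_adv reveals
# only heads. We then observe a run of revealed flips, all heads.
prior_adv = 0.01          # small prior probability of adversarial filtering
n_revealed_heads = 10     # every revealed flip happens to be heads

like_fair = 0.5 ** n_revealed_heads  # all-heads probability under fair revelation
like_adv = 1.0                       # the adversary reveals heads by construction

posterior_adv = (prior_adv * like_adv) / (
    prior_adv * like_adv + (1 - prior_adv) * like_fair)
print(round(posterior_adv, 3))  # 0.912: the filtering hypothesis dominates
```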
To quote Abram Demski in “All Mathematicians are Trollable”:
The main concern is not so much whether GLS-coherent mathematicians are trollable as whether they are trolling themselves. Vulnerability to an external agent is somewhat concerning, but the existence of misleading proof-orderings brings up the question: are there principles we need to follow when deciding what proofs to look at next, to avoid misleading ourselves?
My concern is not with the dangers of an actual adversary; it’s with the wild oscillations and extreme confidences that can arise even when logical facts arrive in a “fair” way, so long as it is still possible to get unlucky and experience a “clump” of successive observations that push P(A) way up or down.
We should expect such clumps sometimes unless the observation order is somehow specially chosen to discourage them, say via the kind of “principles” Demski wonders about.
One can also prevent observation order from mattering by doing what the Eisenstat prior does: adopt an observation model that does not treat logical observations as coming from some fixed underlying reality (so that learning “B or ~A” rules out some ways A could have been true), but as consistency-constrained samples from a fixed distribution. This works as far as it goes, but is hard to reconcile with common intuitions about how e.g. P=NP is unlikely because so many “ways it could have been true” have failed (Scott Aaronson has a post about this somewhere, arguing against Lubos Motl who seems to think like the Eisenstat prior), and more generally with any kind of mathematical intuition — or with the simple fact that the implications of axioms are fixed in advance and not determined dynamically as we observe them. Moreover, I don’t know of any way to (approximately) apply this model in real-world decisions, although maybe someone will come up with one.
This is all to say that I don’t think there is (yet) any standard Bayesian answer to the problem of self-trollability. It’s a serious problem and one at the very edge of current understanding, with only some partial stabs at solutions available.