The context is *all* applications of probability theory. Look, when I tell you that A or not A is a rule of classical propositional logic, we don’t argue about the context or what assumptions we are relying on. That’s just a universal rule of classical logic. Ditto with conditioning on all the information you have. That’s just one of the rules of epistemic probability theory that *always* applies. The only time you are allowed to NOT condition on some piece of known information is if you would get the same answer whether or not you conditioned on it. When we leave known information Y out and say it is “irrelevant”, what that means is that Pr(A | Y and X) = Pr(A | X), where X is the rest of the information we’re using. If I can show that these probabilities are NOT the same, then I have proven that Y is, in fact, relevant.
You are simply assuming that what I’ve calculated is irrelevant. But the only way to know absolutely for sure whether it is irrelevant is to actually do the calculation! That is, if you have information X and Y, and you think Y is irrelevant to proposition A, the only way you can justify leaving out Y is if Pr(A | X and Y) = Pr(A | X). We often make informal arguments as to why this is so, but an actual calculation showing that, in fact, Pr(A | X and Y) != Pr(A | X) always trumps an informal argument that they should be equal.
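The irrelevance check described above is mechanical once a model is written down. Here is a minimal sketch, using a toy joint distribution of my own invention (the numbers are purely illustrative): we compute Pr(A | X) and Pr(A | X and Y) by enumeration, and if they differ, Y has been proven relevant.

```python
from fractions import Fraction

# Hypothetical joint distribution over (A, X, Y), each a Boolean.
# The probabilities below are made up for illustration; they sum to 1.
joint = {
    (True,  True,  True):  Fraction(1, 8),
    (True,  True,  False): Fraction(1, 8),
    (False, True,  True):  Fraction(3, 8),
    (False, True,  False): Fraction(1, 8),
    (True,  False, True):  Fraction(1, 16),
    (True,  False, False): Fraction(1, 16),
    (False, False, True):  Fraction(1, 16),
    (False, False, False): Fraction(1, 16),
}

def pr(pred):
    """Probability of the event picked out by pred."""
    return sum(p for outcome, p in joint.items() if pred(outcome))

def cond(pred, given):
    """Conditional probability Pr(pred | given)."""
    return pr(lambda o: pred(o) and given(o)) / pr(given)

# Pr(A | X) versus Pr(A | X and Y): if these differ, Y is relevant.
p_a_given_x  = cond(lambda o: o[0], lambda o: o[1])
p_a_given_xy = cond(lambda o: o[0], lambda o: o[1] and o[2])
print(p_a_given_x, p_a_given_xy)  # 1/3 versus 1/4: Y is relevant here
```

In this toy model the two conditional probabilities come out unequal, so leaving Y out of the conditioning would be an error, no matter how plausible an informal "Y is irrelevant" argument might sound.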
Your “probability of guessing the correct card” presupposes some decision rule for choosing a particular card to guess. Given a particular decision rule, we could compute this probability, but it is something entirely different from “the probability that the card is a king”. If I assume that’s just bad wording, and that you’re actually talking about the frequency of correct guesses when some condition occurs, well, now you’re doing frequentist probabilities, and we were talking about *epistemic* probabilities.
But randomly awakening Beauty on only one day is a different scenario than waking her both days. A priori you can’t just replace one with the other.
Yes, in exactly the same sense that *any* mathematical / logical model needs some justification of why it corresponds to the system or phenomenon under consideration. As I’ve mentioned before, though, if you are able to express your background knowledge in propositional form, then your probabilities are uniquely determined by that collection of propositional formulas. So this reduces to the usual modeling question in any application of logic—does this set of propositional formulas appropriately express the relevant information I actually have available?
This is the first thing I’ve read from Scott Garrabrant, so “otherwise reputable” doesn’t apply here. And I have frequently seen things written on LessWrong that display pretty significant misunderstandings of the philosophical basis of Bayesian probability, so that gives me a high prior to expect more of them.
I’m not trying to be mean here, but this post is completely wrong at all levels. No, Bayesian probability is not just for things that are space-like. None of the theorems from which it is derived even refer to time.
So, you know the things in your past, so there is no need for probability there.
This simply is not true. There would be no need of detectives or historical researchers if it were true.
If you partially observe a fact, then I want to say you can decompose that fact into the part that you observed and the part that you didn’t, and say that the part you observed is in your past, while the part you didn’t observe is space-like separated from you.
You can say it, but it’s not even approximately true. If someone flips a coin in front of me but covers it up just before it hits the table, I observe that a coin flip has occurred, but not whether it was heads or tails—and that second event is definitely within my past light-cone.
You may have cached that you should use Bayesian probability to deal with things you are uncertain about.
No, I cached nothing. I first spent a considerable amount of time understanding Cox’s Theorem in detail, which derives probability theory as the uniquely determined extension of classical propositional logic to a logic that handles uncertainty. There is some controversy about some of its assumptions, so I later proved and published my own theorem that arrives at the same conclusion (and more) using purely logical assumptions/requirements, all of the form, “our extended logic should retain this existing property of classical propositional logic.”
The problem is that the standard justifications of Bayesian probability are in a framework where the facts that you are uncertain about are not in any way affected by whether or not you believe them!
1) It’s not clear this is really true. It seems to me that any situation that is affected by an agent’s beliefs can be handled within Bayesian probability theory by modeling the agent.
2) So what?
Therefore, our reasons for liking Bayesian probability do not apply to our uncertainty about the things that are in our future!
This is a complete non sequitur. Even if I grant your premise, most things in my future are unaffected by my beliefs. The date on which the Sun will expand and engulf the Earth is in no way affected by any of my beliefs. Whether you will get lucky with that woman at the bar next Friday is in no way affected by any of my beliefs. And so on.
path analysis requires scientific thinking, as does every exercise in causal inference. Statistics, as frequently practiced, discourages it, and encourages “canned” procedures instead.
Despite Pearl’s early work on Bayesian networks, he doesn’t seem to be very familiar with Bayesian statistics—the above comment really only applies to frequentist statistics. Model construction and criticism (“scientific thinking”) is an important part of Bayesian statistics. Causal thinking is common in Bayesian statistics, because causal intuition provides the most effective guide for Bayesian model building.
I’ve worked implementing Bayesian models of consumer behavior for marketing research, and these are grounded in microeconomic theory, models of consumer decision making processes, common patterns of deviation from strictly rational choice, etc.
I don’t believe that the term “probability” is completely unambiguous once we start including weird scenarios that fall outside the scope which standard probability was intended to address.
The intended scope is anything that you can reason about using classical propositional logic. And if you can’t reason about it using classical propositional logic, then there is still no ambiguity, because there are no probabilities.
You know, it has not actually been demonstrated that human consciousness can be mimicked by a Turing-equivalent computer.
The evidence is extremely strong that human minds are processes that occur in human brains. All known physical laws are Turing computable, and we have no hint of any sort of physical law that is not Turing computable. Since brains are physical systems, the previous two observations imply that it is highly likely that they can be simulated on a Turing-equivalent computer (given enough time and memory).
But regardless of that, the Sleeping Beauty problem is a question of epistemology, and the answer necessarily revolves around the information available to Beauty. None of this requires an actual human mind to be meaningful, and the required computations can be carried out by a simple machine. The only real question here is, what information does Beauty have available? Once we agree on that, the answer is determined.
In these kinds of scenarios we need to define our reference class and then we calculate the probability for someone in this class.
No, that is not what probability theory tells us to do. Reference classes are a rough technique to try to come up with prior distributions. They are not part of probability theory per se, and they are problematic because often there is disagreement as to which is the correct reference class.
When Sleeping Beauty wakes up and observes a sequence, they are learning that this sequence occurs on a random day
Right here is your error. You are sneaking in an indexical here—Beauty doesn’t know whether “today” is Monday or Tuesday. As I discussed in detail in Part 2, indexicals are not part of classical logic. Either they are ambiguous, which means you don’t have a proposition at all, or the ambiguity can be resolved, which means you can restate your proposition in a form that doesn’t involve indexicals.
What you are proposing is equivalent to adding an extra binary variable d to the model, and replacing the observation R(y, Monday) or R(y, Tuesday) with R(y, d). That in turn is the same as randomly choosing ONE day on which to wake Beauty (in the Tails case) instead of waking her both times.
This kind of oversight is why I really insist on seeing an explicit model and an explicit statement (as a proposition expressible in the language of the original model) of what new information Beauty receives upon awakening.
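To make the non-equivalence concrete, here is a toy computation of my own (not necessarily the exact model from the earlier parts): suppose Beauty observes an independent random n-bit sequence on each awakening. Compare the model where Tails means she is awakened on both days against the model where Tails means she is awakened on one randomly chosen day. The posterior probability of Heads, given that her remembered sequence occurs, differs between the two.

```python
from fractions import Fraction

# Toy model (illustrative assumptions, not the author's exact setup):
# on each awakening Beauty observes a fresh random n-bit sequence s.
n = 3
p_seq = Fraction(1, 2**n)        # chance one awakening shows sequence s

# Model A: Tails => awakened on BOTH days; s can occur on either day.
p_tails_A = 1 - (1 - p_seq)**2
# Model B: Tails => awakened on ONE randomly chosen day; a single draw.
p_tails_B = p_seq

# Heads => a single awakening, hence a single draw, in both models.
p_heads = p_seq

# Posterior Pr(Heads | s occurs during the experiment), fair coin.
post_A = (p_heads / 2) / (p_heads / 2 + p_tails_A / 2)
post_B = (p_heads / 2) / (p_heads / 2 + p_tails_B / 2)
print(post_A, post_B)  # 8/23 versus 1/2
```

Model B always gives a posterior of exactly 1/2, while Model A gives something strictly less than 1/2, so the replacement of "both days" by "one random day" is not an innocent reformulation.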
All this maths is correct, but why do we care about these odds? It is indeed true that if you had pre-committed at the start to guess if and only if you experienced the sequence 111
We care about these odds because the laws of probability tell us to use them. I have no idea what you mean by “precommitted at the start to guess if and only if...” I can’t make any sense of this or the following paragraph. What are you “guessing”? Regardless, this is a question of epistemology—what are the probabilities, given the information you have—and those probabilities have specific values regardless of whether you care about calculating them.
Neal wants us to condition on all information, including the apparently random experiences that Sleeping Beauty will undergo before they answer the interview question. This information seems irrelevant, but Neal argues that if it were irrelevant, it wouldn’t affect the calculation. If, contrary to expectations, it actually does, then Neal would suggest that we were wrong about its irrelevance.
This isn’t just Neal’s position. Jaynes argues the same in Probability Theory: The Logic of Science. I have never once encountered an academic book or paper that argued otherwise. The technical term for conditioning on less than all the information is “cherry-picking the evidence” :-).
Unfortunately, Ksvanhorn’s post jumps straight into the maths and doesn’t provide any explanation of what is going on.
Ouch. I thought I was explaining what was going on.
But the development of probability theory and the way that it is applied in practice were guided by implicit assumptions about observers.
I don’t think that’s true, but even if it is an accurate description of the history, that’s irrelevant—we have justifications for probability theory that make no assumptions whatsoever about observers.
You seemed to argue in your first post that selection effects were not routinely handled within standard probability theory.
No, I argued that this isn’t a case of selection effects.
Certainly agreed as to logic (which does not include probability theory).
Why are you ignoring what I wrote about proofs that probability theory is either a or the uniquely determined extension of classical propositional logic to handle degrees of certainty? That places probability theory squarely in the logical camp. It is a logic.
in which we made certain implicit assumptions about observers
No, we made no such implicit assumptions. There are no assumptions, implicit or otherwise, about observers at all. If you think otherwise, show me where they occur in Cox’s Theorem or in my theorem.
I’m going to wait to address that until you clarify what you mean by applying standard probability theory, since you offered a fairly narrow view of what this means in your original post, and seemed to contradict it in your point 3 in the comment.
I have no idea what you’re talking about here.
My position is that “the information available” should not be interpreted as simply the existence of at least one agent making the same observations you are, while declining to make any inferences at all about the number of such agents (beyond that it is at least 1).
Um, there’s only one agent here, but if by “agent” you mean the pair (person, day), then the above is just wrong—it’s very clearly part of the model that if the coin comes up Heads, there is exactly one day on which the remembered observations could be made, and if the coin comes up Tails, there are exactly two days on which the remembered observations could be made. I even worked out the probabilities that the observations occurred on just Monday, just Tuesday, or both Monday and Tuesday.
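The day-by-day breakdown mentioned above can be sketched with the same kind of toy model (my own illustrative reconstruction, not a quote of the original derivation): if each awakening shows an independent random n-bit sequence, then conditional on Tails and on the remembered sequence occurring at least once, we can split that event by the day(s) on which it appeared.

```python
from fractions import Fraction

# Illustrative toy model: each awakening shows an independent random
# n-bit sequence; p is the chance a single awakening shows sequence s.
n = 3
p = Fraction(1, 2**n)

# Conditional on Tails (two awakenings), split "s occurs" by day:
p_mon_only = p * (1 - p)     # s on Monday only
p_tue_only = (1 - p) * p     # s on Tuesday only
p_both     = p * p           # s on both days
p_occurs   = p_mon_only + p_tue_only + p_both

print(p_mon_only / p_occurs,  # 7/15
      p_tue_only / p_occurs,  # 7/15
      p_both / p_occurs)      # 1/15
```

The point is that "exactly one day if Heads, exactly two possible days if Tails" is built into the model, and the probabilities for just-Monday, just-Tuesday, and both are all well-defined and computable.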
Listen, if you want to argue against my analysis, you need to
1. Propose a different model of what Beauty knows on Sunday, and/or
2. Propose a different proposition that expresses the additional information Beauty has on Monday/Tuesday and that accounts for her altered probabilities. This proposition should be possible to sensibly state and talk about on Sunday, Monday, Tuesday, or Wednesday, by either Beauty or one of the experimenters, and mean the same thing in all these cases.
When Sleeping Beauty wakes up and observes a sequence, they are learning that this sequence occurs on a random day out of those days when they are awake.
That would be a valid description if she were awakened only on one day, with that day chosen through some unpredictable process. That is not the case here, though.
What you’re doing here is sneaking in an indexical—“today” is Monday if Heads, and “today” is either Monday or Tuesday if Tails. See Part 2 for a discussion of this issue. To the extent that indexicals are ambiguous, they cannot be used in classical propositions. The only way to show that they are unambiguous is to show that there is an equivalent way of expressing that same thing that doesn’t use any indexical, and only uses well-defined entities—in which case you might as well use the equivalent expression that has no indexical.
Yes, that is shown in Part 2.
From the OP: “honor requires recognition from others.” That’s not a component of the notion of honor I grew up with. Nor is the requirement of avenging insults.
This is a very, very different concept of honor than the one I grew up with. I was taught that honor means doing what is right (ethical, moral), regardless of personal cost. It meant being unfailingly honest, always keeping your word, doing your duty, etc. How others perceived you was irrelevant. One example of this notion of honor is the case of Sir Thomas More, who was executed by Henry VIII because his conscience would not allow him to cooperate with Henry’s establishment of the Church of England. Another is the Dreyfus Affair and Colonel Georges Picquart, who suffered grave personal consequences for insisting on giving an honest report and refusing to go along with the framing of Alfred Dreyfus for espionage. (There’s a wonderful movie about this, called Prisoner of Honor.)
...the standard formalization of probability… was not designed with anthropic reasoning in mind. It is usually taken for granted that the number of copies of you that will be around in the future to observe the results of experiments is fixed at exactly 1, and that there is thus no need to explicitly include observation selection effects in the formalism.
1. Logic, including probability theory, is not observer-dependent. Just as the conclusions one can obtain with classical propositional logic depend only on the information (propositional axioms) available, and not on any characteristic or circumstance of the reasoner, epistemic probabilities also depend only on the information available. Logic—including probability theory—was designed to be fully general. If you want to argue that probability theory is not, in its standard formulation, suitable for anthropic reasoning, you need to point out the specific points in its rationale that are incompatible with anthropic effects. As I have shown (preprint), all you have to assume to get probability theory from classical propositional logic is that certain properties of propositional logic are retained in the extended logic.
2. No, neither classical logic nor probability theory as the extension of classical propositional logic assumes anything about observers, or their numbers, or experiments, or what may happen in the future.
3. Selection effects are routinely handled within the framework of standard probability theory. You don’t need to go beyond standard probability theory for this.