Foundations of Probability

Beginning of: Logical Uncertainty sequence

Suppose that we are designing a robot. In order for this robot to reason about the outside world, it will need to use probabilities.

Our robot can then use its knowledge to acquire cookies, which we have programmed it to value. For example, we might wager a cookie with the robot on the motion of a certain stock price.

In the coming sequence, I’d like to add a new capability to our robot. It has to do with how the robot handles very hard math problems. If we ask “what’s the last digit of the 3^^^3′th prime number?”, our robot should at some point give up, before the sun explodes and the point becomes moot.

If there are math problems our robot can’t solve, what should it do if we offer it a bet about the last digit of the 3^^^3′th prime? It’s going to have to approximate—robots need to make lots of approximations, even for simple tasks like finding the strategy that maximizes cookies.

Intuitively, it seems like if we can’t find the real answer, the last digit is equally likely to be 1, 3, 7 or 9; our robot should take bets as if it assigned those digits equal probability. But to assign some probability to the wrong answer is logically equivalent to assigning probability to 0=1. When we learn more, it will become clear that this is a problem—we aren’t ready to upgrade our robot yet.
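As a rough sanity check on that intuition, here is a minimal sketch that tallies the last digits of the primes we can actually compute. The function names and the cutoff of 100,000 are my own choices for illustration; nothing here says anything rigorous about the 3^^^3'th prime.

```python
from collections import Counter

def primes_below(n):
    """Return all primes below n using a simple sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[0:2] = [False, False]  # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def last_digit_frequencies(n):
    """Relative frequency of each last digit among primes p with 5 < p < n."""
    # Skip 2 and 5, the only primes that end in 2 or 5.
    digits = [p % 10 for p in primes_below(n) if p > 5]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts[d] / total for d in sorted(counts)}

# The cutoff is an arbitrary illustrative choice.
print(last_digit_frequencies(100_000))
```

Only the digits 1, 3, 7 and 9 show up, each with a frequency close to one quarter, which is at least consistent with betting as if they were equally likely.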

Let’s begin with a review of the foundations of probability.

What I call foundations of probability are arguments for why our robot should ever want to use probabilities. I will cover four of them, ranging from the worldly (“make bets in the following way or you lose money”) to the ethereal (“here’s a really elegant set of axioms”). To use the word “probability” to describe the subject of such disparate arguments can seem odd, but keep in mind the naive definition of probability as that number that’s 1/6 for a fair die rolling 6 and 30% for clear weather tomorrow.

Dutch Books

The most concrete of the foundations is the family of Dutch book arguments. A Dutch book is a collection of bets that is certain to lose you money. If you violate the rules of probability, you’ll agree to these certain-loss bets (or pass up a certain-win bet).

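For example, if you think that each side of the coin has a 55% chance of showing up, then you value a $0.98 payout on heads at about $0.54 and a $0.98 payout on tails at about $0.54, roughly $1.08 in total. So you’ll pay $1 for a bet that pays out $0.98 if the coin lands heads and $0.98 if the coin lands tails, even though it returns exactly $0.98 either way and leaves you $0.02 poorer. If taking bets where you’re guaranteed to lose is bad, then you’re not allowed to have probabilities for mutually exclusive things that sum to more than 1.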
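The arithmetic behind the coin example can be sketched in a few lines. The function name `dutch_book_coin` and its parameters are hypothetical, introduced only for illustration; the numbers are the ones from the example above.

```python
def dutch_book_coin(p_heads, p_tails, payout=0.98, price=1.00):
    """Evaluate the combined heads/tails bet from the agent's point of view.

    Returns (subjective_expected_profit, actual_profit).
    """
    # The agent prices each side of the bet using its own stated beliefs.
    subjective_value = p_heads * payout + p_tails * payout
    subjective_expected_profit = subjective_value - price
    # In reality exactly one side comes up, so the bet always pays `payout`.
    actual_profit = payout - price
    return subjective_expected_profit, actual_profit

expected, actual = dutch_book_coin(0.55, 0.55)
print(f"agent expects {expected:+.3f}, actually gets {actual:+.3f}")
```

With beliefs of 55% for each side, the expected profit is positive while the actual profit is negative no matter how the coin lands. With coherent beliefs such as 0.5 and 0.5, the expected profit is already negative, so the agent declines the bet.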

Similar arguments hold for other properties of probability. If your probabilities for exhaustive events add up to less than 1, you’ll pass up free money, which is bad. If you disobey the sum rule or the product rule, you’ll agree to a guaranteed loss, which is bad, and so on. Thus, say the Dutch book arguments, our probabilities have to behave the way they do because we don’t want to take guaranteed losses or pass up free money.

There are many assumptions underlying this whole scenario. Our agent in these arguments already tries to decide using probability-like numbers; all we show is that the numbers have to follow the same rules as probabilities. Why can’t our agent follow a totally different method of decision making, like picking randomly or picking alphabetically?

One can show that e.g. picking randomly will sometimes throw away money. But there is a deeper principle here: an agent that wants to avoid throwing away money or passing up free money has to act as if it had numbers that followed probability-rules, and that’s a good enough reason for our agent to have probabilities.

Still, some people dislike Dutch book arguments because they focus on an extreme scenario where a malicious bookie is trying to exploit our agent. To avoid this, we’ll need a more abstract foundation.

You can learn more about Dutch book arguments here and here.

Savage’s Foundation

Leonard Savage formulated a basis for decision-making that is sort of a grown-up version of Dutch book arguments. From seven desiderata, none of which mention probability, he derived that an agent that wants to act consistently will act as if it had probabilistic beliefs.

What are the desiderata about, if not probability? They define an agent that has preferences and is able to take actions. Actions are defined as things that lead to outcomes, and the same action can lead to different outcomes depending on external possibilities in event-space. The desiderata require that the agent’s actions be consistent in commonsensical ways. These requirements are sufficient to show that assigning probabilities to the external events is the best way to do things.

Savage’s theorem provides one set of conditions for when we should use probabilities. But it doesn’t help us choose which probabilities to assign—anything consistent works. The idea that probabilities are degrees of belief, and that they are derived from some starting information, is left to our next foundation.

You can learn more about Savage’s foundation here.

Cox’s Theorem

Cox’s theorem is a break from justifying probabilities with gambling. Rather than starting from an agent that wants to achieve good outcomes, and showing that having probabilities is a good idea, Richard Cox started with desired properties of a “degree of plausibility,” and showed that probabilities are what a good belief-number should be.

One special facet of Cox’s desiderata is that they refer to plausibility of an event, given your information—what will eventually become P(event | information).

There are six or so desiderata, but I think there are three interesting ones:

When you’re completely certain, your plausibilities should satisfy the rules of classical logic.

Every rational plausibility has at least one event with that plausibility.

P(A and B|X) can be found as a function of P(A|X) and P(B|A and X).
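The third desideratum only asks that some such function exist. Once plausibilities are scaled as ordinary probabilities, the function is the familiar product rule, P(A and B|X) = P(A|X) P(B|A and X). Here is a minimal sketch that checks the rule by counting outcomes. The fair six-sided die and the two events are assumptions I picked for illustration, and the helper `prob` is hypothetical.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # background information X: one roll of a fair die (assumed)

def prob(event, given=lambda x: True):
    """P(event | given), found by counting equally likely outcomes."""
    conditioned = [x for x in OUTCOMES if given(x)]
    favorable = [x for x in conditioned if event(x)]
    return Fraction(len(favorable), len(conditioned))

def is_even(x):        # event A
    return x % 2 == 0

def above_three(x):    # event B
    return x > 3

# P(A and B | X), computed directly.
joint = prob(lambda x: is_even(x) and above_three(x))
# P(A | X) * P(B | A and X), computed from the two pieces.
product = prob(is_even) * prob(above_three, given=is_even)

print(joint, product, joint == product)
```

Both routes give the same number, which is what the desideratum requires of a well-behaved belief-number.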

These desiderata are a motley assortment. The desideratum that there’s an infinite variety of events is the strangest, but it is satisfied if our universe contains a continuous random process or if we can flip a coin as many times as we want. If the desiderata obtain, Cox’s theorem shows that we can give pretty much any belief a probability. The perspective of Cox’s theorem is useful because it lets us keep talking straightforwardly about probabilities even if betting or decision-making has become nontrivial.

You can learn more about Cox’s theorem in the first two chapters of Jaynes here (in fact, the next few posts are parallel to the first two chapters of Jaynes), and also here. Jaynes includes an additional desideratum in this foundation, which we will cover in the next post.

Kolmogorov Axioms

At the far extreme of abstraction, we have the Kolmogorov axioms for probability. Here they are:

P(E) is a non-negative real number, where E is any event that belongs to event-space F.

P(some event occurs)=1.

Any countable sequence of disjoint events (E1, E2, ...) satisfies P(E1 or E2 or ...) = P(E1) + P(E2) + ...
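For a finite event-space the three axioms can be checked by brute force. The sketch below is only an illustration under that assumption: with finitely many outcomes, countable additivity reduces to additivity over pairs of disjoint events. The names `all_events` and `satisfies_kolmogorov` are hypothetical, and the candidate P is passed in as a plain dictionary from events to numbers.

```python
from fractions import Fraction
from itertools import chain, combinations

def all_events(outcomes):
    """Every subset of the outcomes: the event-space F in the finite case."""
    items = list(outcomes)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def satisfies_kolmogorov(P, outcomes):
    """Check non-negativity, normalization, and additivity of the table P."""
    events = all_events(outcomes)
    non_negative = all(P[e] >= 0 for e in events)
    normalized = P[frozenset(outcomes)] == 1
    additive = all(
        P[a | b] == P[a] + P[b]
        for a in events for b in events
        if not (a & b)  # only disjoint pairs are constrained
    )
    return non_negative and normalized and additive

H, T = frozenset("H"), frozenset("T")
fair = {frozenset(): Fraction(0), H: Fraction(1, 2), T: Fraction(1, 2), H | T: Fraction(1)}
# The 55%-per-side beliefs from the Dutch book example.
overconfident = {frozenset(): Fraction(0), H: Fraction(55, 100), T: Fraction(55, 100), H | T: Fraction(1)}

print(satisfies_kolmogorov(fair, "HT"))
print(satisfies_kolmogorov(overconfident, "HT"))
```

The fair coin passes, while the 55%-per-side assignment fails additivity. Any other split that sums to 1 would pass just as well, because the axioms constrain how the numbers relate to each other and not what they are.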

Though it was not their intended purpose, these can be seen as a Cox-style list of desiderata for degrees of plausibility. Their main virtue is that they’re simple and handy to mathematicians who like set theory.

You can learn more about Kolmogorov’s axioms here.

Look back at our robot trying to bet on the 3^^^3′th prime number. Our robot has preferences, so it can be Dutch booked. Its reward depends on the math problem and we want it to act consistently, so Savage’s theorem applies. Cox’s theorem applies if we allow our robot to make combined bets on math and dice. It even seems like the Kolmogorov axioms should hold. Resting upon these foundations, our robot should assign numbers to mathematical statements, and they should behave like probabilities.

But we can’t get specific about that, because we have a problem—we don’t know how to actually find the numbers yet. Our foundations tell us that the probabilities of the two sides of a coin will add to 1, but they don’t care whether P(heads) is 0.5 or 0.99999. If Dutch book arguments can’t tell us that a coin lands heads half the time, what can? Tune in next time to find out.

First post in the sequence Logical Uncertainty

Next post: Putting in the Numbers