# Probability and Politics

Follow-up to: Politics as Charity

Can we think well about courses of action with low probabilities of high payoffs?

Giving What We Can (GWWC), whose members pledge to donate a portion of their income to most efficiently help the global poor, says that evaluating spending on political advocacy is very hard:

Such changes could have enormous effects, but the cost-effectiveness of supporting them is very difficult to quantify as one needs to determine both the value of the effects and the degree to which your donation increases the probability of the change occurring. Each of these is very difficult to estimate and since the first is potentially very large and the second very small [1], it is very challenging to work out which scale will dominate.

This sequence attempts to actually work out a first approximation of an answer to this question, piece by piece. Last time, I discussed the evidence, especially from randomized experiments, that money spent on campaigning can elicit marginal votes quite cheaply. Today, I’ll present the state-of-the-art in estimating the chance that those votes will directly swing an election outcome.

Disclaimer

Politics is a mind-killer: tribal feelings readily degrade the analytical skill and impartiality of otherwise very sophisticated thinkers, and so discussion of politics (even in a descriptive empirical way, or in meta-level fashion) signals an increased probability of poor analysis. I am not a political partisan and am raising the subject primarily for its illustrative value in thinking about small probabilities of large payoffs.

Two routes from vote to policy: electing and affecting

In thinking about the effects of an additional vote on policy, we can distinguish between two ways to affect public policy: electing politicians disposed to implement certain policies, or affecting [2] the policies of existing and future officeholders who base their decisions on electoral statistics (including that marginal vote and its effects). Models of the probability of a marginal vote swaying an election are most obviously relevant to the electing approach, but the affecting route will also depend on such models, as they are used by politicians.

The surprising virtues of naive Fermi calculation

In my previous post I linked to Eric Schwitzgebel’s discussion of politics as charity, in which he guesstimated that the probability of a U.S. Presidential election being tied was 1/n, where n is the number of voters. So with an estimate of 100 million U.S. voters in presidential elections he gave a 1/100,000,000 probability of a marginal vote swaying the election. This is a suspiciously available number. It seems to be derived from a simple model in which we imagine drawing randomly from all the possible divisions of the electorate between two candidates, when only one division would make the marginal vote decisive. But of course we know that voting won’t involve a uniform distribution.

One objection comes from modeling each vote as a flip of a biased coin. If the coin is exactly fair, then the chance of a tie scales as 1/sqrt(n). But if the coin is even slightly removed from exact fairness, then the chance of a tie rapidly falls to negligible levels. This was actually one of the first models in the literature, and was recapitulated by LessWrongers in comments last time.

However, if we instead think of the bias of the coin itself as sampled from a uniform distribution, then we get the same result as Schwitzgebel. In the electoral context, we can think of the coin’s bias as reflecting factors with correlated effects on many voters, e.g. the state of the economy, with good economic results favoring incumbents and their parties.

Of course, it’s clear that electoral outcomes are not uniformly sampled: we see few 90%-10% outcomes in national American elections. Electoral competition and Median Voter Theorem effects, along with the stability of partisan identifications, will tend to keep candidates roughly balanced and limit the quantity of true swing voters. Within that range, unpredictable large “wild card” influences like the economy will shift the result from year to year, forcing us to spread our probability mass fairly evenly over a large region. Depending on our estimates of that range, we would need to multiply Schwitzgebel’s estimate by a fudge factor c to get a probability of a tie of c/n for a random election, with 1<c<100 if we bound c from above based on the idea that elections are very unlikely to be fought within a band as narrow as 1% of the electorate.
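The three coin models above can be compared numerically. The following is a minimal sketch, assuming an even number of voters and an illustrative electorate of one million; the function names are hypothetical and are not taken from the papers discussed.

```python
import math


def tie_prob_fixed_bias(n, p):
    """P(exactly n/2 heads in n flips) for a coin with known bias p (n even)."""
    half = n // 2
    # log of the binomial coefficient C(n, n/2), via log-gamma to avoid overflow
    log_choose = math.lgamma(n + 1) - 2 * math.lgamma(half + 1)
    return math.exp(log_choose + half * (math.log(p) + math.log(1 - p)))


def tie_prob_uniform_bias(n):
    """Same event when the bias p is itself drawn uniformly from [0, 1].

    Averaging the binomial over a uniform p makes every head count
    0..n equally likely, so the tie probability is exactly 1 / (n + 1).
    """
    return 1.0 / (n + 1)


n = 1_000_000  # illustrative electorate size, an assumption
print("fair coin:        ", tie_prob_fixed_bias(n, 0.50))
print("bias of 0.51:     ", tie_prob_fixed_bias(n, 0.51))
print("uniform over bias:", tie_prob_uniform_bias(n))
```

Under these assumptions the fair coin gives a tie probability close to sqrt(2/(πn)), a one-point bias drives it to a vanishingly small number, and the uniform prior over the bias recovers roughly 1/n.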

Fermi, meet data

How well does this hold up against empirical data? In two papers from 1998 and 2009, Andrew Gelman and coauthors attempt to estimate the probability a voter going into past U.S. Presidential elections should have assigned to casting a decisive vote. They use standard models that take inputs like party self-identification, economic growth, and incumbent approval ratings to predict electoral outcomes. These models have proven quite reliable in predicting candidate vote share and no more accurate methods are known. So we can take their output as a first approximation of the individual voter’s rational estimates [3].

Their first paper considers:

… the 1952-1988 elections. For six of the elections, the probability is fairly independent of state size (slightly higher for the smallest states) and is near 1 in 10 million. For the other three elections (1964, 1972, and 1984, corresponding to the landslide victories of Johnson, Nixon, and Reagan [incumbents with favorable economic conditions]), the probability is much smaller, on the order of 1 in hundreds of millions for all of the states.

The result for 1992 was near 1 in 10 million. In 2008, which had economic and other conditions strongly favoring Obama, they found the following:

probabilities a week before the 2008 presidential election, using state-by-state election forecasts based on the latest polls. The states where a single vote was most likely to matter are New Mexico, Virginia, New Hampshire, and Colorado, where your vote had an approximate 1 in 10 million chance of determining the national election outcome. On average, a[n actual] voter in America had a 1 in 60 million chance of being decisive in the presidential election.

All told, these place the average value of c a little under the middle of the range given by the Fermi calculation above, and are very far from Pascal’s Mugging territory.
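To see where the quoted figures fall relative to the Fermi range, one can back out the implied fudge factor as c = p × n. This is a rough sketch that assumes Schwitzgebel’s round figure of 100 million voters for n; the helper name is hypothetical, and reading the “middle” of the range on a log scale is my own assumption.

```python
import math


def implied_fudge_factor(p_decisive, n_voters):
    """Solve p = c / n for c."""
    return p_decisive * n_voters


N_VOTERS = 100_000_000  # round figure from the Fermi estimate, an assumption

quoted = {
    "close state or typical year (1 in 10 million)": 1 / 10_000_000,
    "2008 national average (1 in 60 million)": 1 / 60_000_000,
}

for label, p in quoted.items():
    print(f"{label}: c = {implied_fudge_factor(p, N_VOTERS):.2f}")

# Geometric midpoint of the range 1 < c < 100 (log-scale reading, an assumption)
print("log-scale midpoint of the range:", math.sqrt(1 * 100))
```

With these inputs the 1-in-10-million figure corresponds to c = 10 and the 2008 average to c of about 1.7.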

Voting vs campaign contributions

What are the implications for a causal decision theorist who wants to dedicate a modest effort to efficient do-gooding? The exact value of voting depends on many other factors, e.g. the value of policies, but we can at least compare ways to deliver votes.

Which has more bang per buck: voting in your jurisdiction or taking the hour or so to earn money and make campaign contributions? Last time I estimated a cost of \$50 to \$500 per vote from contributions, more in more competitive races (diminishing returns). So unless you have a high opportunity cost, you’d do better to vote yourself than contribute to a campaign in your own jurisdiction. The standard heuristic that everyone should vote seems to have been defended.

But let’s avoid motivated stopping. The above data indicate frequent differences of 1-2 orders of magnitude across jurisdictions. So someone in an uncompetitive New York district would often do better to donate less than \$50 (to a competitive race) than to vote. (On the other hand, if you live in a competitive district [4], replacing your vote with donations might cost a sizable portion of your charitable budget.)
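The comparison between voting in a safe jurisdiction and donating to a competitive one reduces to a single division. Below is a minimal sketch, assuming the \$50 to \$500 cost-per-vote range and the 1-2 orders of magnitude difference in decisiveness mentioned above; the function name and the specific ratios are illustrative assumptions.

```python
def break_even_donation(cost_per_vote, decisiveness_ratio):
    """Donation to a competitive race with the same expected impact as one
    vote cast in a safe jurisdiction.

    decisiveness_ratio is how many times more likely a vote is to be
    decisive in the competitive race than in the safe one.
    """
    return cost_per_vote / decisiveness_ratio


for cost in (50, 500):        # dollars per marginal vote
    for ratio in (10, 100):   # 1 and 2 orders of magnitude
        amount = break_even_donation(cost, ratio)
        print(f"cost/vote ${cost}, ratio {ratio}x: break-even ${amount:.2f}")
```

Across these four combinations the break-even donation ranges from \$0.50 to \$50, which is the sense in which a donation of less than \$50 can outweigh a vote in an uncompetitive district.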

When we take into account differences between election cycles, usually another 1-2 orders of magnitude, the value of voting in a “safe” jurisdiction in an election which is not close winds up negligible (if your reaction to this fact is not independent of others’). For those spending on political advocacy, this provides a route for increased cost-effectiveness: by switching from an even distribution of spending to focus on the (forecast) closest third of elections, you can nearly double your expected effectiveness. Even more extreme “wait-in-reserve” strategies could pay off, but are limited by the imperfection of forecasting methods.

Ties, recounts, and lawyers

Does the possibility of recounts disrupt the above analysis?
It turns out that it doesn’t. In countries with reasonably clean elections, a candidate with a large enough margin of victory is almost certain to be declared the winner. Say that a “large enough” margin is 5,000 votes, and that a candidate is 99% likely to be declared the winner given that margin. Then Candace the Candidate must go from a 1% probability of victory to a 99% probability of victory as we consider vote totals from a 5,000 vote shortfall to a 5,000 vote lead. So, on average within that range, each marginal vote must increase her probability of victory by 0.0098%. Since there are 10,000 possibilities to hit within the range, so long as they have roughly similar prospective probabilities the expected value of the marginal vote will be almost the same as the single “deciding vote” model.
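The recount argument can be checked with a toy curve. The sketch below assumes a logistic shape for the probability of being declared the winner as a function of the vote margin, calibrated so that a 5,000-vote lead gives 99%; the logistic form and all names are illustrative assumptions, and only the 1% and 99% endpoints come from the text.

```python
import math

HALF_WIDTH = 5_000                     # "large enough" margin from the text
SCALE = HALF_WIDTH / math.log(99)      # makes win_prob(+5000) equal 0.99


def win_prob(margin):
    """Hypothetical P(declared winner | margin in votes), logistic in shape."""
    return 1.0 / (1.0 + math.exp(-margin / SCALE))


# Gain in victory probability from one extra vote at each margin in the band
deltas = [win_prob(m + 1) - win_prob(m) for m in range(-HALF_WIDTH, HALF_WIDTH)]

total_gain = sum(deltas)               # telescopes to 0.99 - 0.01
average_gain = total_gain / len(deltas)

print(f"total gain across the band: {total_gain:.4f}")
print(f"average gain per vote:      {average_gain:.6%}")

# If each exact margin in the band has the same prior probability q, the
# expected value of a marginal vote is q * total_gain, versus q * 1 in the
# single "deciding vote" model.
```

Whatever the shape of the curve, the per-vote gains must sum to 0.98 across the band, so with a roughly flat prior over margins the expected effect of a marginal vote is 98% of what the single-tie model gives.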

Summary

It is possible to make sensible estimates of the probability of at least some events that have never happened before, like tied presidential elections, and use them in attempting efficient philanthropy.

[1] At least for two-boxers. More on one-boxing decision theorists at a later date.

[2] There are a number of arguments that voters’ role in affecting policies is more important, e.g. in this Less Wrong post by Eliezer. More on this later.

[3] Although for very low values, the possibility that our models are fundamentally mistaken looms progressively larger. See Ord et al.

[4] Including other relevant sorts of competitiveness, e.g. California is typically a safe state in Presidential elections, but there are usually competitive ballot initiatives.

also, cross-posting from OB.

Is it potentially a good charity in a region where rule of law has essentially broken down to fund/promote the dominant/police/stationary bandit side in a tug of war against the non-dominant/mob/roving bandit side? Personally, to me US politics looks like a fight between stationary bandits and roving bandits, and permanent near-total defeat of the roving bandits seems like a prerequisite to the reestablishment of real economic growth.

I don’t see myself as partisan, as I’d be happy to support a party of the right OR the left so long as they could offer credible hope for the total destruction of the Republican party as they currently exist. Ironically, this makes me think that altruists should support Palin, as she seems to be the person with the best chance of doing that, and also seems utterly incapable of actually holding power herself, though as a charity, I still prefer SIAI over Palin’s primary campaign by many orders of magnitude.

• Donations to change policy within the partisan subspace, however, only achieve good when they happen to be on the right side of partisan disagreements. Averaged over the disagreeing parties, such donations cannot on average achieve good unless there is a correlation between donations, or donation effectiveness, and which sides are right. Even if you think you are right at the moment on your particular partisan policy opinions, you can’t think it good on average to encourage partisan donations, unless you think donations tend overall to go to the good or more donation-effective sides.

For every person that sincerely believes that the Flurb Party will change things for the better and donates \$100 to it, there’s someone who believes that the Bleeg Party will change things for the better and donates \$100 to it … so they cancel each other out. Allocating a bigger part of the economy to printing fliers and posters doesn’t seem like the best way to make the world a better place.

• Is it good to prevent bad? If so, should I donate to Flurb simply because I hope to cancel out someone donating to Bleeg?

• Only if Bleeg is truly so much worse than Flurb that the small tip in the chances of the election is smaller than the good your donation would make for a more worthwhile cause.

Also, advocating for partisan political donations in general in a context where the only effect of those donations is to tip the chances one way or the other is irrational (as opposed to advocating for donations to one side in particular, which could be reasonable if you’re certain enough that side is truly much better than the alternative).

As someone said in the comments to Robin’s post, the same goes for encouraging people in general to get out and vote.

• Yes, you should. But another point is that you probably overestimate the chances that Flurb is good and Bleeg is bad. The magnitude as well.

• “Strategic reallocation of political effort” and the additional factor of “strategic reallocation of voting that takes into account other people’s strategic reallocation of voting effort” seems both very complicated to calculate and likely to actually matter to what happens in real elections. I would expect quibbles with your conclusions in this area.

You have one sentence that handles the issue, but I’m not entirely sure how you handled it because your sentence involves two parentheticals, two double negatives, and an ambiguity-inducing self-reference to “this fact”. Here is the sentence:

When we take into account differences between election cycles, usually another 1-2 orders of magnitude, the value of voting in a “safe” jurisdiction in an election which is not close winds up negligible (if your reaction to this fact is not independent of others’).

Here is an attempted rewrite that I think restates the same thing with less ambiguity:

From one election cycle to another, fluctuating global factors account for 1-2 orders of magnitude difference in first order tie estimates. In these situations the value of voting in a “safe” jurisdiction is negligible unless many other people in your “safe” jurisdiction reason identically so that the safe status functions as a secondary global factor that causes the probability of a tie to increase in districts where the “naive” probability of a tie is low.

Assuming this re-writing captures the same basic idea, I think the issue of self-awareness induced ties can be analyzed in terms of the number of people who think of voting as “siding with a winning or losing side” versus “a costly duty to act in a publicly beneficial way”. Voters who think of voting as a costly duty seem potentially subject to self-awareness induced ties. Voters who side with predicted winners seem likely to push dynamics away from these sorts of ties.

This suggests small scale experiments and real world polling where voters are measured to see whether they vote according to one, both, or neither of these dynamics and the numbers who do so are used to refine election predictions.

• The historical data already take into account the rough current distribution of such voters, and the efforts of national political organizations that try to put money into competitive races. If arguments like mine become more widespread in the future, they will change matters.

This post explicitly limits itself to causal decision theory to help avoid these issues, but I’ll discuss them in a future post on decision theory complications. The second parenthetical was an acknowledgment that there is more to say on it.

Experiments and studies like the ones you suggest do seem like they would be helpful in navigating those complications.

• I don’t like the coin model because it ignores replacement.

Assume there’s ten other people in a room. Six like red and four like blue. Four of them will go to the polls, and you’re trying to decide if you should, too. What’s the probability your vote will be the deciding factor?

It’s tempting to use the binomial distribution. p=0.6, n=4. Your vote matters if x=2.

So it’ll be tied without you about 35% of the time.

But this is incorrect. If the first person who votes casts a red ballot, then the probability the next vote is red falls to 5/9, and the probability the next vote is blue increases to 4/9. The correct model is the hypergeometric model because it doesn’t assume replacement.

It computes a higher 43%.

As n increases from 10 to 300000000, I imagine the effect is more dramatic.
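The two figures for the ten-person room can be reproduced directly. This is a short sketch using only the numbers from the example (six red, four blue, four voters); the variable names are illustrative.

```python
from math import comb

RED, BLUE, VOTERS = 6, 4, 4
p_red = RED / (RED + BLUE)

# With replacement: binomial probability of exactly 2 red votes out of 4
binomial_tie = comb(VOTERS, 2) * p_red**2 * (1 - p_red) ** 2

# Without replacement: hypergeometric probability of drawing 2 red and 2 blue
hypergeom_tie = comb(RED, 2) * comb(BLUE, 2) / comb(RED + BLUE, VOTERS)

print(f"binomial:       {binomial_tie:.4f}")   # 0.3456
print(f"hypergeometric: {hypergeom_tie:.4f}")  # 0.4286
```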

• Either way, with large electorates, the sampling error will be swamped (by orders of magnitude) by correlated changes across voters. For instance, the swings in voting behavior from economic conditions regularly move results by a number of percentage points.

• Move relative to what? Last year’s results?

I was imagining getting the probabilities a single voter would vote for candidate X from Gallup.

• I meant that local stochastic things affecting individual voters are not important in the year-to-year variation in election outcomes, compared to systematic effects like the economy.

If you had an exact fraction of voters who would break for which candidate (which polling isn’t accurate enough to give), you still would face uncertainty about turnout.

• The standard error of polling is usually pretty small.

• Cool example. I’m still confused, though; why model our uncertainty about the electoral outcome as stemming from which folks will go to the polls (while assuming for simplicity that each person has fixed preferences), rather than as stemming from our uncertainty as to how a fixed set of voters will vote (while assuming for simplicity that the set of voters is fixed)?

ETA: Sorry, I edited this after it was replied to, without noticing the reply.

• I assume the randomness comes from sampling error, not from uncertainty about who people will vote for. My parents will always vote for Republicans, but they don’t always participate.

• Let me refocus on my point. I want to estimate the probability my vote will matter.

With population n, participation rate v, and pre-election polling showing r support for the policy, the probability your vote will matter is equal to:

$$\frac{\binom{nr}{nv/2}\,\binom{n(1-r)}{nv/2}}{\binom{n}{nv}}$$
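The expression above is a hypergeometric probability and can be evaluated for large populations with log-gamma functions. The following is a minimal sketch that takes integer counts rather than rates (supporters stands for nr and turnout for nv); the function names and the one-million-person example are assumptions for illustration.

```python
import math


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def tie_prob_without_replacement(population, supporters, turnout):
    """P(the turnout splits exactly evenly) when voters are drawn without
    replacement from a population with a fixed number of supporters."""
    half = turnout // 2
    opponents = population - supporters
    # An exact tie is impossible with odd turnout or too few voters on a side
    if turnout % 2 or half > supporters or half > opponents:
        return 0.0
    log_p = (log_comb(supporters, half)
             + log_comb(opponents, half)
             - log_comb(population, turnout))
    return math.exp(log_p)


# The ten-person room: 6 supporters, 4 voters
print(tie_prob_without_replacement(10, 6, 4))

# A hypothetical population of one million with half turning out
print(tie_prob_without_replacement(1_000_000, 500_000, 500_000))
print(tie_prob_without_replacement(1_000_000, 510_000, 500_000))
```

In this sketch, moving support from 50% to 51% in the larger population reduces the tie probability by many orders of magnitude, which mirrors the sensitivity of the fixed-bias coin model.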

• The post compares taking roughly one hour to vote against using the hour to earn money and donating it to campaigns, on the basis of one vote versus an expected number of votes. But this ignores secondary effects of voting, such as communicating honestly or dishonestly with other voters, that may be more important than the vote itself.

• Note that the analysis holds for a single rational voter.

If many people decide using similar considerations, then donations go up, electoral turnout falls, and extremists (who can’t be swayed by advertising or campaigning) and non-rationalists (who do not apply the OP’s analysis) will be over-represented. This is a distorting influence.

If donations go up then candidates suffer who attract relatively little funding (of the normal type, not the type of donations which rationalists use to replace voting). This is a second distorting influence.

A drop in electoral turnout can be seen as decreasing the winner’s perceived legitimacy. This might be an unintended consequence.

• Yes. Carl’s post notes that he’ll assume CDT for this post, for simplicity, and will consider decision theories later.

But even if we go ahead and allow non-CDT complications: we’re considering elections here, and for elections, we have solid past data indicating how most people act. In such situations, even if one doesn’t assume CDT, reasoning on the present margin seems to be the correct thing to do. You know how many people, roughly, behave one way vs the other. It’s correct to ask about the benefits of moving the voters from [usual number] to [usual number + 1], or the campaign donations from [usual number] to [usual number + yours], and not to consider the rather different average changes that would be brought about in moving the current totals to a far-away and unlikely total.

For example, if I’m considering whether to be vegetarian or to donate to in vitro meat, I should ask about the benefits of one person doing so; the argument “but if everyone donated to in vitro meat, their ability to use money would be overwhelmed, and this would be less useful than everyone being vegetarian” is irrelevant.

The situation would be different if I was e.g. considering the action-shift in response to a national bestseller that advocated that action, or if I was otherwise being moved by considerations that might affect enough people to significantly change the margin, and, thus, the marginal impact.

• Yes. Carl’s post notes that he’ll assume CDT for this post, for simplicity, and will consider decision theories later.

Since we know that CDT is totally wrong in such situations, even if TDT/UDT doesn’t help with quantitative analysis, “for simplicity” doesn’t quite side-step the flaw.

• Since we know that CDT is totally wrong in such situations, [assuming CDT] “for simplicity” doesn’t quite side-step the flaw.

We also know that frictionless planes are totally wrong in most situations. That doesn’t mean that assuming a frictionless plane “for simplicity” is not a reasonable first step when attempting a difficult analysis. As Polya teaches: when considering a problem that is too difficult, start with a similar problem related to your target.

Most people despair of calculating optimal philanthropy payoffs at all because the situation is so complicated. The result is a huge inefficiency of most philanthropic efforts. If we’re going to make headway, it will have to be by considering and expositing simple pieces and building up piece by piece, as Carl begins to do here.

• We also know that frictionless planes are totally wrong in most situations. That doesn’t mean that assuming a frictionless plane “for simplicity” is not a reasonable first step when attempting a difficult analysis.

If the problem is “continuous”, you’ll get a sufficiently correct solution for sufficiently low-friction problems. In this sense the assumption of lack of friction is not “totally wrong” in the sense I used the term in my comment for CDT/TDT voting analysis differences.

As Polya teaches: when considering a problem that is too difficult, start with a similar problem related to your target.

I agree with this observation: you learn about the structure of methods of solving the target problem by studying similar methods of solving simpler problems, even if solutions (answers) are unrelated (not similar).

However, I don’t see how CDT analysis with its deciding votes is at all similar to TDT analysis that involves no such concept, and so how this observation is relevant.

Most people despair of calculating optimal philanthropy payoffs at all because the situation is so complicated. The result is a huge inefficiency of most philanthropic efforts. If we’re going to make headway, it will have to be by considering and expositing simple pieces and building up piece by piece, as Carl begins to do here.

It’s often a reasonable strategy, but not if the “pieces” have nothing to do with the desired whole.

• Could you say more about how the TDT voting analysis would go, and what its pieces would be?

It seems to me that in the limit as the number of voters with “your algorithm” goes to zero, the TDT solution is the same as the CDT solution.

• That’s the more interesting topic, and it came up when I visited the NYC LW crew last week.

My take is that, if TDT really is superior to other decision theories, then a society of majority-TDTers should not lose out to “mindless drone decision theorists” (MDDTers) simply by all individually refusing to vote, while the MDDTers vote for stupid policies in unison.

The TDTers would, rather, recognize the correlation between their decisions, and reason that their own decision, in the relevant sense, sets the output of the other TDTers, so they have to count the benefit of voting as being more than just “my favored policies +1 vote”. I conclude that a TDTer would decide to vote, reasoning something like “If I deem it optimal to vote, so do decision makers similar to me.”

The others there disagreed that TDTers would vote in such an instance, claiming that other methods of influencing the outcome exceed the effectiveness of voting in all situations.

• The others there disagreed that TDTers would vote in such an instance, claiming that other methods of influencing the outcome exceed the effectiveness of voting in all situations.

This seems to suggest that a society of TDTers would quickly abandon democracy. What form of government would they move to?

• Elaborate on your reasoning there.

• Were you not talking about a society of TDTers that didn’t think it was worth voting? Or were you allowing for a sufficient number of irrational nuts in the system for the democratic process to be useful or necessary even though the majority (and all the rational people) do not use it?

• Well, the particular scenario I had in mind was a democratic one (where the MDDTers believe in democracy), and the eligible TDTers could win every election if they (nearly) all voted, and where the MDDTers vote in unison for stupid policies. And the question is whether the TDT algorithm outputs “vote”; their decision not to vote is not an assumption (though perhaps they agree that, at least per CDT rules, voting is pointless).

If you’re asking what the proposed non-voting TDT-compliant alternative is, and if it would involve keeping a democratic system, then I’ll say what I should have earlier: I don’t know—that’s something I was trying to find out from those who disagreed with me. One of them said that any amount of effort spent voting would be better spent propagandizing, so there is no margin where the TDTer deems voting optimal.

I was skeptical: once you accept that TDTers “naturally” make correlated decisions (in this type of problem), your vote “controls” something much more effective (the decision of a majority of voters). Then, even under generous assumptions about alternate uses of your voting effort, and aggregating this across all TDTers, and recognizing the mind-shields that various levels of drones put up, it’s not clear why propagandizing is better.

To the extent that the drones are maximally mindless, your propaganda does nothing to change their minds, either on the object level (this election) or meta level (which political system is best). To the extent that the drones are “reasonable”, a certain fraction of their votes will go toward the TDT-favored policies anyway, further reducing the threshold TDTers have to meet to get good policies.

• I was skeptical: once you accept that TDTers “naturally” make correlated decisions (in this type of problem), your vote “controls” something much more effective (the decision of a majority of voters). Then, even under generous assumptions about alternate uses of your voting effort, and aggregating this across all TDTers, and recognizing the mind-shields that various levels of drones put up, it’s not clear why propagandizing is better.

That is approximately my thinking too.

To the extent that the drones are maximally mindless, your propaganda does nothing to change their minds, either on the object level (this election) or meta level (which political system is best). To the extent that the drones are “reasonable”, a certain fraction of their votes will go toward the TDT-favored policies anyway, further reducing the threshold TDTers have to meet to get good policies.

I suppose this depends just how open minded the TDTers are when it comes to considering alternative ways to enforce their influence over policy in the case of pointless propaganda ;)

• This analysis of consequences of your decisions doesn’t just say that other people who perform similar analysis are influenced by your decision. People who make their decisions differently can be (seen as) influenced as well.

• Could you say more about how the TDT voting analysis would go, and what its pieces would be?

I don’t know. I know that CDT commits irrecoverable error, but not how to understand the problem. (I can guess that my decision probably makes a difference of 0.01 to 20% in a two-choice vote of the typical kind, but this is not based on explicit analysis, hence wide interval.)

That I don’t know how to solve the problem doesn’t license me to privilege a “solution” that is known to be incorrect (even though it’s rigorous and popular).

It seems to me that in the limit as the number of voters with “your algorithm” goes to zero, the TDT solution is the same as the CDT solution.

Yes, but it’s an unreasonable assumption in the case of voting, and I don’t see how to generalize in the direction of acausal-under-logical-uncertainty control from a solution performed under this assumption. From what I currently understand, the question is, what can you predict about all voters (how would you estimate the outcome), if you assume that you actually make a certain voting decision (estimate this for all possible decisions). Such an assumption can even weakly inform you about probable decisions of other voters that are rather loosely related to you, with the estimated probability of voting by person X being controlled by your decision less if you are less similar to X, but with (your understanding of decisions of) all people controlled to some extent.