[not ongoing] Thoughts on Proportional voting methods

About this post

This post was a work-in-progress: I (Jameson) updated it over the course of weeks/​months, and then reposted it when it got substantial updates. I have now decided to leave it as it is for historical interest, as the next step would be a ground-up rewrite. I’m not sure when I’ll have the time to finish that ground-up rewrite, but I’d say there’s a >80% chance something like that will happen within the next year; that is, before September 2021.

Status

I think I’m going to stop making large edits to the article as it stands. I’ll add a longer “post-mortem” at the end. In sum: I think that the process of writing this article has helped me sharpen my thinking and make several breakthroughs, but that making an article that best conveys these insights to others requires rewriting from scratch. I still think that it would be worthwhile to eventually write the unfinished parts of my initial outline, but that attaching them to this thinking-it-through prologue would not do them justice.

Epistemic status

This is largely a work of political philosophy with a mathematical bent. That is to say, it is working through the implications of assumptions; more specifically, it is an attempt to find a way to estimate the implications of certain normative assumptions in certain real-world situations. As such, it rests on a larger context of thought about those assumptions — whether they are realistic, whether they are desirable — which I cannot possibly fully explore here.

But leaving aside those connections to wider ideas and taking the assumptions I make here as axioms, how confident am I of the train of thought that is contained within this essay? Reasonably so. At various times I gesture towards or sketch an idea of a mathematical proof for a claim, but I do not and have not worked out those proofs fully, so I’m aware that they may not “compile”. Still, as somebody who has a PhD and so has the experience of following an idea more of the way to the ground, I would be pretty surprised (<10% subjective chance) if there were some mistake here so large that it would invalidate the main ideas (rather than just requiring minor adjustments). I would of course be a lot less surprised if someone found a better or simpler way to reach the same goals I’m pursuing here.

Last 3 updates

V0.9.1: Another terminology tweak. New terms: Average Voter Effectiveness, Average Voter Choice, Average Voter Effective Choice. Also, post-mortem will happen in a separate document. (This same version might later be changed to 1.0)

V0.9: I decided to make a few final edits (mostly, adding a summary and a post-mortem section, which is currently unfinished) then freeze this as is.

V0.7.3: Still tweaking terminology. Now, Vote Power Fairness, Average Voter Choice, Average Voter Effectiveness. Finished (at least first draft) of closed list/​Israel analysis.

V 0.7.2: A terminology change. New terms: Retroactive Power, Effective Voting Equality, Effective Choice, Average Voter Effectiveness. (The term “effective” is a nod to Catherine Helen Spence). The math is the same except for some ultimately-inconsequential changes in when you subtract from 1. Also, started to add a closed list example from Israel; not done yet.

V 0.7.1: added a digression on dimensionality, in italics, to the “Measuring ‘Representation quality’, separate from power” section. Finished converting the existing examples from RF to VW.

Audience

I have been thinking about proportional voting methods, and have some half-baked thoughts. “Half-baked” means that I think the basic ingredients of some worthwhile new ideas are in there, but they’re not ready for general consumption yet. By the time you read these words, the thoughts below may well be more than half-baked.

I think it’s worthwhile to begin writing these thoughts out with a real-world audience in mind. I’m choosing to do that here. The readership of this site is not the ideal audience, but it’s close enough.

Here are some assumptions/​standpoints I’m going to take:

  • My readers are as well-informed as I could reasonably imagine them to be. That is, they’ve read and understood the three posts of my prior sequence here on voting theory; and they are familiar with some basic concepts of statistics such as variance, expectation, distribution, sampling, consistency, estimators and estimands, and Bayes’ theorem.

  • “Good democracy” is a normative goal. A big part of this is trying to define that more rigorously, but I won’t be focusing too much on defending it philosophically.

  • I may at times, especially later on, veer into the territory of politics and “theory of change”; that is, how it might be possible to actually put the ideas here into practice. At such times, I won’t hide the fact that I myself hold generally leftist/​progressive views. Still, whenever I’m not explicitly discussing means, I believe that the democratic ends I focus on are defensible from many points of view. Obviously, democracy isn’t compatible with something like dogmatic anarchism or dogmatic monarchism, but I think that even a soft-edged anarchist or monarchist might learn something from these ideas.

...

Terminology note

I will discuss both democratic “systems” and voting “methods”, and these may appear to be interchangeable terms. They are not. “Methods” are specific algorithms which define an “abstract ballot” (set of valid vote types), and map from a tally of such votes to an outcome. “Systems” are broader; they include “methods”, but also rules for when and how to hold elections, who gets to vote, etc.

Here’s some simple notation/​terminology I’ll use:

  • V, the total number of voters /​ ballots

  • S, the total number of seats to be elected in this election. (In other sources, this is often referred to as M, for “district (M)agnitude”, in cases where large elections are split up by district into various smaller elections, typically with only 3-5 seats per district.)

  • Hare Quota: V/​S, the number of votes each winning representative would have if they all had exactly equal numbers and no votes were wasted.

  • Droop Quota: V/​(S+1), the number of votes each winning representative would have if they all had an equal number but one quota of votes was wasted on losing candidates.
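A trivial sketch of these two formulas in code (the ballot and seat numbers are made up for illustration; note that some sources round the Droop quota up to an integer, while I use the raw fraction as defined above):

```python
def hare_quota(votes: int, seats: int) -> float:
    """Hare quota: V/S."""
    return votes / seats

def droop_quota(votes: int, seats: int) -> float:
    """Droop quota: V/(S+1), as defined above (some sources round up)."""
    return votes / (seats + 1)

# With 120,000 ballots and 5 seats:
print(hare_quota(120_000, 5))   # → 24000.0
print(droop_quota(120_000, 5))  # → 20000.0
```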

...

Goal/​endpoint of this piece

My goal here is to begin to put the comparative study of proportional (multi-winner) voting methods on a footing that’s as solid as that of single-winner methods. In the case of single-winner methods, that solid footing is provided primarily by having a clear, unified metric: Voter Satisfaction Efficiency (VSE), which I’ll explain briefly below. Although this metric is in principle sensitive to assumptions about the distribution of electorates, it is in practice robust enough to provide at least some useful guidance as to which methods are better or worse, and as to which pathologies in given methods are likely to be common or rare. That makes it much more possible to have a useful discussion about the tradeoffs involved in voting method design.

This is not to say that any metric will ever be the be-all-and-end-all of mechanism design. In the single-winner case, even though we have VSE, we still have to answer questions like: “Is it worth advocating for a method with better VSE that’s more complex and/​or harder to audit, or should we stick with something with a slightly worse VSE that’s simpler?” But without having a metric, that debate would very quickly devolve into self-serving bias on both sides; everybody would tend to weight the scenarios where their preferred method performed well more heavily and discount those where it performed poorly. In other words, without a clearly-defined metric, it’s almost impossible to properly “shut up and multiply”.

Defining a good metric for proportional methods is harder than it is for single-winner methods. I am not the first person to attempt this, but I find prior attempts fall short of what is needed. As I write these words, I’m not finished defining my metric, but I do have a clear enough idea that I think I can do better than what’s gone before. I also have a clear enough idea to know that, even when I’m done with this, my multi-winner metric will not be even as “mathematically clean” as VSE is. In fact, I suspect it may be provably impossible to create a mathematically clean metric with all the characteristics you’d like it to have; I suspect that it will be necessary to make some simplifying assumptions/​approximations even though they are clearly violated. I hope it will be possible to do so in a way such that the measured quality of practical voting methods is robust to the assumptions, even if there might be methods that could intentionally exploit the assumptions to get unrealistic quality measures.

Once I’ve defined a metric, I’ll discuss practicalities: which proportional methods I believe may be better in which real-world contexts, and how one might get those adopted.

TL;DR (Summary)

It’s well-known how to connect single-winner voting methods to utilitarian consequentialism. Doing the same for multi-winner voting methods has, until now, been harder. I provide a scheme for estimating a multi-winner voting method’s utilitarian loss, retrospectively, using only the ballots and knowledge of the method. The metric I propose is a number from 0% (bad) to 100% (ideal; in practice, unattainable). I call it Average Voter Effective Choice (AVEC), and it’s the product of Average Voter Effectiveness (AVE) times Average Voter Choice (AVC). I begin to give examples of AVEC calculations in a few situations, but stop before I finish all the voting methods one might look at.

What does it mean to say a system is “democratic”?

The archetypical case of democracy is a referendum. A new law is proposed; “all” the people get to vote “yes” or “no”; and “majority rules”. In practice, “all” tends to be limited by some eligibility criteria, but I’d say that most such criteria make the system less democratic, other than maybe age and residency. The normative case that this kind of democracy is a good idea tends to rest on the Condorcet jury theorem: that is, if the average voter is more likely to be “right” than “wrong” (for instance, more likely to choose the option which will maximize global utility), then the chances the referendum outcome will be “right” quickly converge towards 100% as the number of voters increases.
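The jury-theorem convergence is easy to see in a minimal simulation. This sketch assumes independent voters who are each “right” with probability 0.55; both that independence assumption and the specific numbers are illustrative only:

```python
import random

def majority_correct_rate(n_voters, p_correct, trials=2000, seed=0):
    """Estimate P(the majority is 'right') when n_voters vote
    independently, each right with probability p_correct
    (the Condorcet jury setting)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        right = sum(rng.random() < p_correct for _ in range(n_voters))
        wins += right > n_voters / 2
    return wins / trials

# With p = 0.55, the majority's accuracy climbs toward 1 as n grows:
for n in (11, 101, 1001):
    print(n, majority_correct_rate(n, 0.55))
```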

But of course, not every collective decision naturally reduces to just two options, and not “all” the people have time to vote on all the collective decisions. (Which decisions should be made collectively and which should be made individually is out of scope for these musings, but I believe there are at least some cases that fall clearly on each side.) So not all “democracy” will be like that archetypal case. Still, it’s pretty clear that any more-complex democratic system must be “similar” to the archetypical case in some sense.

When you’re choosing between a finite number of options that are known at the time of voting, you need a single-winner voting method. It’s pretty easy to define a way to measure “how good” such a method is: we can just use utilitarianism. (That doesn’t mean you have to entirely buy in to utilitarianism as an ethical/​philosophical system; even if you have reservations about it in a broader sense, I think it’s pretty clear that it’s a useful model for the purposes of comparing voting methods in the abstract, in a way that other ethical systems aren’t.) A voting method is good insofar as it maximizes expected utility. To actually evaluate this expected utility, you’d need a distribution of possible electorates, where each possible electorate has a set of options and a set of voters. Each voter has some utility for each option, some knowledge about the other voters, and some rule for turning their utility and knowledge into a vote. As you might imagine, actually working this all out can get complex, but the basic idea is simple.
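The evaluation loop described in this paragraph can be prototyped in a few lines. Everything here is a simplifying assumption: utilities are iid uniform, voters are honest, and `ignore_ballots` is a naive baseline I invented for contrast. It is a toy stand-in for a real VSE calculation, not one:

```python
import random

def expected_utility_ratio(method, n_options=4, n_voters=99, trials=400, seed=0):
    """Monte-Carlo sketch: draw random electorates (each voter gets a
    random utility per option), run `method` on honest ballots, and
    compare total utility achieved to the best achievable."""
    rng = random.Random(seed)
    achieved = best = 0.0
    for _ in range(trials):
        utils = [[rng.random() for _ in range(n_options)]
                 for _ in range(n_voters)]
        totals = [sum(u[o] for u in utils) for o in range(n_options)]
        achieved += totals[method(utils)]
        best += max(totals)
    return achieved / best

def plurality(utils):
    """Each voter votes for their favorite option; most votes wins."""
    counts = [0] * len(utils[0])
    for u in utils:
        counts[u.index(max(u))] += 1
    return counts.index(max(counts))

def ignore_ballots(utils):
    """Baseline: always pick option 0, ignoring the ballots."""
    return 0

print(expected_utility_ratio(plurality))       # close to 1
print(expected_utility_ratio(ignore_ballots))  # noticeably lower
```

Working this out properly means replacing each of those assumptions with a modeled distribution of electorates and strategic behavior, which is where the real complexity lives.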

Choosing a single leader can be seen as a special case of choosing between various options. But electing a legislature is something different. How do you define the utility of a legislature that has a 60/40 split between two parties, versus one that has a 70/30 split? There is (to me at least) an obvious intuition that a legislature is more democratic if it’s more representative, and more representative if it’s more proportional. With some additional assumptions, the drive for proportionality might be made more rigorous: if voters would be 60/40 on X vs Y but legislators are 70/30, there must probably be some X′ and Y′ such that voters would be 49/51 but legislators would be at least 51/49.

Still, “representative” or “proportional” are still just words; it turns out that, outside of simple special cases where there are a few pre-defined parties and voters care only about partisanship, political scientists don’t tend to define these ideas rigorously. But we can do better. One way we might define “representative” comes from statistical sampling. Imagine that the legislature’s job is merely to reproduce, as faithfully as possible, direct democracy, without ordinary voters having to waste all that time choosing and voting. In a statistical sense, that would mean the loss function would be something like: the expected squared difference between the percent a proposal gets in the legislature, and the percent it would have gotten in a direct referendum, given that the proposal is drawn from some predefined distribution over possible proposals. Note that we can still use the Condorcet jury theorem to argue that this would maximize utility, but now not only are we piling assumption on top of assumption, we’re also making approximations of approximations. The legislative outcome is an approximation of the popular outcome which is an approximation of the ideal outcome; if the first step of approximating goes wrong, there’s no reason to believe the second step will fix it.

(This loss function is effectively [the expectation of] a metric between distributions: measuring a kind of distance between the voter distribution and the legislator distribution, using the proposal distribution as an underlying probability measure. To help get an intuition for what this means, imagine that ideology is a single dimension, with the voters somehow distributed along it, and each proposal is a cut point. Assuming the proposal distribution is continuous, we could without loss of generality rescale the ideology so that the proposal distribution is Uniform(0,1), and then graph those proposals on the x axis with the proportion of voters to the left of the cut on the y axis. We could superimpose a graph with the proportion of legislators to the left of the cut on the y axis; this loss function would then be the integral of the square of the vertical distance between those graphs.)
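Under those assumptions, the loss is straightforward to compute numerically. This sketch assumes a one-dimensional ideology rescaled so that proposals are Uniform(0,1), as in the parenthetical above; the voter and legislator positions are invented for illustration:

```python
import numpy as np

def representation_loss(voter_ideologies, legislator_ideologies, grid=1000):
    """Integral over cut points t in [0,1] of
    (share of voters left of t - share of legislators left of t)^2:
    the squared L2 distance between the two empirical CDFs under a
    Uniform(0,1) proposal distribution."""
    v = np.asarray(voter_ideologies)
    l = np.asarray(legislator_ideologies)
    ts = np.linspace(0.0, 1.0, grid)
    v_cdf = (v[None, :] <= ts[:, None]).mean(axis=1)
    l_cdf = (l[None, :] <= ts[:, None]).mean(axis=1)
    return float(np.mean((v_cdf - l_cdf) ** 2))

rng = np.random.default_rng(0)
voters = rng.uniform(0, 1, 10_000)
good = np.linspace(0.05, 0.95, 10)  # evenly spread 10-seat legislature
bad = np.full(10, 0.9)              # legislature clustered at one ideology
print(representation_loss(voters, good) < representation_loss(voters, bad))  # → True
```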

A sketch to help visualize the above

This is a rigorous definition of “representative”, but a deeply unsatisfying one. By this definition, one of the “best” possible voting methods would be sortition: just choose representatives uniformly at random from the population. We know from statistical sampling theory that in this case the estimate is consistent for literally any possible question; the sampling error falls off as the inverse square root of the size of the legislature. (Such a simple random sample isn’t actually the best; you can improve it by doing variance-minimizing tricks such as stratifying on self-declared party or other salient political variables. But that’s a minor detail which changes the error by a roughly constant factor, not by changing the basic limiting behavior.)
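A quick simulation of that inverse-square-root behavior (the population support level and the legislature sizes are arbitrary choices):

```python
import random

def sortition_error(pop_support, seats, trials=3000, seed=1):
    """RMS error between a random sample's support share and the true
    population share pop_support, for a legislature of size `seats`."""
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(trials):
        in_favor = sum(rng.random() < pop_support for _ in range(seats))
        sq += (in_favor / seats - pop_support) ** 2
    return (sq / trials) ** 0.5

# Quadrupling the legislature roughly halves the RMS error (1/sqrt(S)):
print(sortition_error(0.6, 100))
print(sortition_error(0.6, 400))
```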

But in the single-winner case, sortition (even stratified sortition) just means randomly choosing a dictator, which is clearly nowhere close to being among the most-democratic single-winner methods. Clearly, there’s some democratic ideal or ideals missing from the above loss function.

There are at least two characteristics that good representatives should have, besides having similar views collectively to those whom they represent: they should be exemplary, and they should be accountable.

“Exemplary” means that in some regards, you want them to be systematically different from the average of the people they represent; smarter, less ignorant, better at positive-sum negotiation, etc. For instance, a linguistic minority would want to be represented by somebody who was fluent in the language of business in the legislature, even if they themselves mostly were not. Sortition is not a good solution here.

“Accountable” means that there is at least some mechanism to help better align incentives between representatives (agents) and their “constituents” (principals). This generally means that voting rules work to aggregate preferences in a way that considers all votes equally. Again, sortition fails this badly.

Note that by adding “exemplary” and “accountable” to our list of goals, we’ve opened up the possibility for legislative outcomes to be not just as-good-as direct democracy, but actually better, in several ways. First, it may even be that there is some optimum size for reasoned debate and group rationality; too small and there is not enough diversity of initial opinions, too large and there is not enough bandwidth for consensus-seeking discussion. Second, exemplary representatives may be better than ordinary voters at understanding the impact of each decision, or at negotiating to find win-win compromises. It’s difficult to imagine formalizing this without making huge arbitrary assumptions, but it’s similar to the idea of “coherent extrapolated volition”; that is, exemplary representatives may be able to extrapolate voters’ volition in a way that the voter might not exactly agree with a priori but would see in retrospect to be better. Which brings us to the third way better outcomes might be possible: by making representatives accountable primarily not on a moment-by-moment basis, but on a once-per-election time cycle, we might nudge them to think from a slightly more future-oriented perspective.

This idea — that representative democracy doesn’t just save voters’ time; that it, in principle, can actually give better outcomes than direct democracy — is so important, it already has a name: republicanism. (I’ll be using that word as I continue this article; to avoid confusion, I’ll refer to the main US political parties as “GOP” and “Dems”.)

It’s clear that all three of these ideals — representativity, exemplariness, and accountability — have to do with democracy. One can imagine pairs of situations that differ on only one of these three axes, and still, one member of the pair will seem (in my strong intuition) more democratic than the other. For example, imagine the fictional planet mentioned in The Hitchhiker’s Guide to the Galaxy, where people vote to elect lizards. Though the complete lack of representativity makes this a poor democracy, it would clearly be even less democratic if the losing lizard committed electoral fraud.

...

Digression: dimensionality and representation

Above, we’ve defined “representativeness” as some kind of difference-in-distributions between the voters (that is, their support for any given proposal drawn randomly from the distribution of possible proposals) and the legislature (that is, the expected support for any given proposal, where the expectation is taken across any randomness in the voting method). ((Note that in some cases, when considering the legislature’s “expected” support for proposals, it may be convenient to condition, not on the ballots, but only on the voter’s views on all proposals; that is, to consider the possible variability in voters’ strategic behavior as both “random” and “part of the voting method”.))

Since the number of voters is usually quite large compared to the size of the legislature, we might imagine/​approximate the voter distribution as a continuous “cloud”, and any given legislature as a set of points intended to represent that cloud. Since we’re interested in how well those points can possibly represent that cloud, one key variable is the effective dimension of that cloud.

To be specific: we know that it’s possible to represent the cloud spatially, with any possible proposal as one of the dimensions. This is potentially an infinite-dimensional space, but in practice, it will have some dimensions with high variance, and some dimensions with negligible variance.

You’ve probably seen “political compass” diagrams which project political ideologies onto two dimensions:

But of course, those aren’t the only two dimensions you could use to categorize voters:

If we use the dimensions that tend to align well with how voters would divide up on high-probability legislative proposals (the principal components), how many dimensions would we need? ((OK, “effective dimension” isn’t exactly that; it measures not only how many “relatively big” dimensions there are, but also how “relatively small” the rest are. I’m being deliberately vague about how precisely I’d define “effective dimension” because I suspect that unless you ignore variation below a certain noise threshold the ED is actually infinite in the limit of infinite voters.))

If the “effective dimension” is low, then a relatively small legislature can do quite a good job of representing even an infinite population. If the effective dimension is high, though, then the ideological distance between a randomly-chosen voter and the closest member of the legislature will tend to be almost as high as the distance between any two randomly-chosen voters. In other words, there will be so many ways for any two people to differ that the very idea that one person could “represent” another well begins to break down (unless the size of the legislature grows exponentially in the number of effective dimensions).
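This breakdown is easy to see in simulation. Here both voters and representatives are drawn from the same standard Gaussian “ideology cloud” (a strong simplifying assumption), and we compare each voter's distance to their nearest representative against the distance to a random fellow voter:

```python
import numpy as np

def nearest_rep_ratio(dim, seats=100, voters=500, seed=0):
    """Average (distance to nearest of `seats` random representatives)
    divided by (distance to a random fellow voter), in a
    `dim`-dimensional Gaussian ideology space."""
    rng = np.random.default_rng(seed)
    reps = rng.normal(size=(seats, dim))
    vs = rng.normal(size=(voters, dim))
    others = rng.normal(size=(voters, dim))
    d_near = np.linalg.norm(vs[:, None, :] - reps[None, :, :], axis=2).min(axis=1)
    d_rand = np.linalg.norm(vs - others, axis=1)
    return float(d_near.mean() / d_rand.mean())

# In low dimensions your nearest representative is much closer than a
# random stranger; in high dimensions that advantage mostly vanishes:
print(nearest_rep_ratio(2))    # well below 1
print(nearest_rep_ratio(50))   # approaching 1
```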

What is the effective dimension of political ideology in reality? It depends how you measure it. If you look at US incumbent politicians’ voting records using a methodology like DW-NOMINATE, you get a number between 1 and 2. If you look at all people’s opinions on all possible issues, you’d surely get a much higher number. I personally think that healthy political debate would probably settle at about 𝓮 effective dimensions — that is, a bit less than 3. For getting a sense of this, 5^𝓮≈80, 10^𝓮≈500, 20^𝓮≈3500; so, if I’m right, a well-chosen reasonably-sized legislature could do a reasonably good job representing an arbitrarily-large population.
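One way to make “effective dimension” concrete, among the several the text deliberately leaves open, is the participation ratio of the covariance eigenvalues. The two-strong-axes-plus-noise dataset below is invented purely for illustration:

```python
import numpy as np

def participation_ratio(opinions):
    """One candidate operationalization of 'effective dimension':
    (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues).
    Equals d for d equal-variance dimensions; discounts weak ones."""
    lam = np.linalg.eigvalsh(np.cov(opinions, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    return float(lam.sum() ** 2 / (lam ** 2).sum())

rng = np.random.default_rng(0)
# Two strong ideological axes plus 20 weak "noise" axes:
scales = np.array([3.0, 3.0] + [0.3] * 20)
data = rng.normal(size=(5000, 22)) * scales
print(participation_ratio(data))  # a bit above 2
```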

But if I’m wrong, and the variation in political opinions/​interests that are politically salient in an ideal world is much higher, AND it’s impossible to deal well with “one issue at a time” (that is, tradeoffs due to high-dimensional structure are meaningful), then the very “republican idea” of representative democracy is problematic. A large-enough legislature could still serve as a random polling sample, reproducing the opinions of the larger population on any arbitrary single question with low error. But as the legislature debated multiple issues, considering the balance of possible tradeoffs between several related choices, the chances that there would be some combination of points of view that exists in the population but not in the legislature would go to 100%.

Understanding dimensionality is especially important for analyzing voting methods that involve delegation. If politically-salient ideologies are relatively low-dimensional, then your favorite candidate’s second-choice candidate is likely to be someone you’d like. If they’re high-dimensional, that two-steps-from-you candidate is not significantly better from your point of view than a random choice of candidates. I’ll talk more about delegation methods, which can help simplify ballots and thus allow broader overall choice, later.

God’s proportional voting method

Say that an omniscient being with infinite processing power wants to select a legislature of size S to represent a population 𝒱 of V chosen ones. God knows the distribution of proposals that this legislature will vote on (and for simplicity, that distribution is independent of who is in the legislature), and she knows the distribution of utility that each outcome option will give to each person v in 𝒱. (For technical reasons, let’s assume that each chosen one’s distribution of utility for each option is absolutely continuous with regard to some common interval. If you don’t know what that means, you can safely ignore it.) Her goal is to use a process for choosing a legislature that will:

  • Have zero bias. For any option the legislature might consider, the expected distribution of utilities of the legislature is (effectively) the same as the expected distribution of utilities of the voters. (That is to say, the first moments of the two distributions are identical, and their higher moments are at-least-similar in a way that would be tedious to write out.)

  • Tends to choose more-pious legislators, if it doesn’t cause bias. Because she’s putting a higher priority on unbiasedness than on piety, she’ll have to focus on that component of a chosen-one’s piety that is independent of the utility they’d get from all options. (In the real world, of course, we’d substitute some other qualifications for piety. Note that these qualifications could be subjective; for instance, “how much voters who are similar to this candidate want to be represented by them”.)

  • Minimizes the variance of the distribution of distributions of utilities of legislators, as long as that doesn’t cause bias or reduce piety. In ordinary terms, minimizing the variance of the meta-distribution means she’s trying to get a legislature that represents the people well concretely, not just in expectation. In other words, if 90% of chosen ones like ice cream, around 90% of the legislators should like ice cream; it’s not good enough if it’s 100% half the time and 80% half the time for an average of 90%.

Of course, since this is God we’re talking about, she could come up with selection rules that are better than anything I could come up with. But if I had to advise her, here’s the procedure I’d suggest:

1. Divide the population up into equal-sized groups, which we’ll call “constituencies”, in a way that minimizes the total expected within-group variance for an option that’s randomly-chosen from the distribution of votable proposals. In essence: draw maximally-compact equal-population districts in ideology space.

2. Within each constituency, give each voter a score that is their piety, divided by the expected piety of a person in that group with their utilities. (I’m intentionally being a bit vague on defining that expectation; there are tighter or looser definitions that would both “work”. Tighter definitions would do less to optimize piety but would be unbiased in a stricter sense; vice versa for looser definitions.)

3. Choose one representative from each group, using probabilities weighted by score.

In other words: stratified sortition, using probabilities weighted by piety and “inverse piety propensity”.
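A toy version of this three-step procedure, in one ideological dimension, where sorting and chunking plays the role of drawing maximally-compact constituencies. The “piety” scores here are arbitrary placeholders for the step-2 weights, not a worked-out definition:

```python
import random

def stratified_sortition(voters, seats, seed=0):
    """Sketch of the three steps above. Each voter is an
    (ideology, score) pair; score stands in for piety weighting."""
    rng = random.Random(seed)
    # Step 1: equal-sized, maximally-compact constituencies; in 1-D,
    # that's just sorting by ideology and chunking.
    ordered = sorted(voters)
    size = len(ordered) // seats
    groups = [ordered[i * size:(i + 1) * size] for i in range(seats)]
    # Steps 2-3: one representative per group, sampled with
    # probability proportional to score.
    return [rng.choices(g, weights=[s for _, s in g], k=1)[0] for g in groups]

rng = random.Random(1)
voters = [(rng.uniform(-1, 1), rng.uniform(0.5, 2.0)) for _ in range(1000)]
reps = stratified_sortition(voters, seats=10)
print(len(reps))  # → 10
```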

What’s the point of even discussing this case? Obviously, if God were choosing, she wouldn’t need a legislature at all; she could just choose the best option each time. But I bring this up to illustrate several points:

  • Having individual representation, where each voter can point to the representative they helped elect, and each representative was elected by an equal number of voters, is good from several perspectives. Even without this example, it would have been obvious that individual representation ensures a form of proportionality, and that it is good for accountability. But this example shows that it can be seen as a variance-reduction technique: if our God had skipped step 1, she’d still have gotten a legislature that was unbiased and pious, but it would have had higher variance.

  • It’s possible (at least, if you’re omniscient) to design a method that optimizes for several different priorities. For instance, in this case, we could optimize for piety even while also ensuring unbiasedness and minimizing variance.

God’s MSE decomposition?

Let’s say God’s busy, so she asks the archangel Uriel to choose the legislature. Uriel isn’t omniscient, so he doesn’t know everyone’s utility for each proposal option, and even if he did, he couldn’t perfectly integrate that over the high-dimensional distribution of proposals the legislature might vote on. So he merely asks each chosen one to write down how they feel about each candidate on a little piece of paper, then, using the papers, does his best to choose a representative body. In other words, he holds an election.

When God returns, how will she judge Uriel’s election? First off, let’s say that she doesn’t just judge the particular outcome of one election, but the expected outcome of the method across some multiversal distribution of possibilities. She’s omniscient, so she directly knows the expected mean squared error (MSE) between the percentage supporting each proposal in an idealized version of direct democracy (correcting towards coherent extrapolated volition, as much as republicanism allows) and the percentage supporting it in the chosen legislature. But to help Uriel understand how he’s being judged, I expect she’d decompose this MSE into various components, following the same step-by-step logic as I outlined in my suggestion above for her own selection method.

Classically, MSE can be decomposed into squared bias plus variance. In many cases, each of these can be decomposed further:

  • Variance can be decomposed using “Eve’s law”, and in particular, breaking down into the sum of variances of *independent* summed components. When components are not independent, we can still break down into two separate terms plus one cross term.

  • Squared bias can be decomposed in a similar way; as the sum of squares of orthogonal components, or as the sum of squares of near-orthogonal components plus cross-term corrections for non-orthogonality.
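For reference, here are the standard decompositions being invoked, in symbols:

```latex
% Bias-variance decomposition of mean squared error:
\mathrm{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{squared bias}}
  + \underbrace{\mathrm{Var}(\hat\theta)}_{\text{variance}}

% Eve's law (law of total variance), conditioning on a component X:
\mathrm{Var}(Y) = \mathbb{E}\big[\mathrm{Var}(Y \mid X)\big]
  + \mathrm{Var}\big(\mathbb{E}[Y \mid X]\big)

% For summed components, with a cross term when they are dependent:
\mathrm{Var}(A + B) = \mathrm{Var}(A) + \mathrm{Var}(B) + 2\,\mathrm{Cov}(A, B)
```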

So let’s go through the steps of God’s process; suggest a way to establish a correspondence between that and an arbitrary voting method; and see what components of MSE might have arisen at each stage.

((I bet you missed the big reveal! That is to say, I think the most novel and philosophically-meaty idea in this essay was buried in clause 2 of the sentence just above. Establishing a step-by-step correspondence between an arbitrary voting method and a single unrealizable “perfect” method is, I believe, an important idea, as I’ll discuss further below. I would have said so earlier, but I didn’t see a way to get there directly without laying all the groundwork above.))

God’s first step was to divide voters into S equal-sized “constituency” groups, minimizing the variability within each constituency. Multi-winner voting methods do not necessarily divide voters into S groups. However, they do, by definition, elect S winners. So if we want to assign voters into S (possibly-overlapping) constituencies, we have merely to trace the chain of responsibility backwards: for each of the winners, which voters were responsible for helping them to win?

In more quantitative terms: for each candidate c and voter v, we want to assign a quantity of responsibility r(c,v), such that the sum over v for any given c is always 1. I’ll be saying a LOT more about how this might work below.
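As a concrete sketch of the bookkeeping (the raw weights below are hypothetical; how they might actually be derived from a voting method is the subject of the discussion that follows):

```python
def normalize_responsibility(raw):
    """Given raw candidate -> {voter: weight} assignments, rescale each
    candidate's weights to sum to 1, yielding r(c, v)."""
    r = {}
    for c, weights in raw.items():
        total = sum(weights.values())
        r[c] = {v: w / total for v, w in weights.items()}
    return r

# Hypothetical raw weights for two winners A and B and three voters:
raw = {"A": {"v1": 2.0, "v2": 1.0}, "B": {"v2": 1.0, "v3": 1.0}}
r = normalize_responsibility(raw)
print(r["A"]["v1"])  # → 0.666..., i.e. 2/3

# Each voter's "total voting power" is the sum over c of r(c, v):
power = {v: sum(r[c].get(v, 0.0) for c in r) for v in ("v1", "v2", "v3")}
print(power)  # v2 helped elect both winners, so gets 1/3 + 1/2
```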

There are several kinds of errors that could happen at this step:

  • Inequality: some voters could have higher “total voting power” — that is, sum over c of r(c,v) — than others. In fact, there are already voting methods built around minimizing the variance of one version of this quantity. This would usually tend to lead to bias, as the views of voters with lower total voting power would tend to be underrepresented in the legislature.

    • An important special case of this is that some votes are, in practice, wasted, with zero voting power; they did nothing to help elect a candidate. In practice, it’s impossible to create a voting method that elects S equally-weighted candidates without wasting some votes; typically at least somewhere between V/​2S and V/​(S+1) of them. Counting wasted/​ineffective votes is one simple way to rate voting methods; in fact, a simple formula for estimating wasted votes, known as the “efficiency gap”, was at the heart of the plaintiff’s arguments in the US Supreme Court case Gill v. Whitford.

  • Imprecision: instead of dividing voters into S separate constituencies with one representative per group, the method might divide them into fewer groups with more than one representative each. This “overlap” would usually tend to lead to higher variance at the next step, as representatives selected from larger overlapping “ideological districts” would be ideologically farther from the constituents they represent.

  • Inappropriateness: The S constituencies could be equal and non-overlapping (in ideological space), but still less compact than they could have been. That is, the voting method could have done a poor job of grouping similar voters together, so that no member from each group could do a good job representing that whole group. This could lead to bias, variance, or both.

  • (I should also mention that turning a complex voting method outcome into r(c,v) measurements inevitably loses some information; if it’s done poorly, any resulting measure of voting method quality may be misleading.)

Rough picture of the above issues. Not perfect; don’t @ me. Note that the space of the diagram is intended to be ideological, not geographical.
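Since the “efficiency gap” came up above, it’s simple enough to sketch in code. This is the standard two-party wasted-votes formula from the Gill v. Whitford litigation; the three-district numbers below are purely hypothetical, and I assume no exactly-tied districts.

```python
def efficiency_gap(districts):
    """Two-party efficiency gap: (wasted_A - wasted_B) / total votes.

    districts: list of (votes_A, votes_B) pairs, one per district.
    A vote is "wasted" if it was cast for the loser, or cast for the
    winner beyond the bare majority needed to win. Assumes no ties.
    """
    wasted_a = wasted_b = total = 0
    for a, b in districts:
        total += a + b
        majority = (a + b) // 2 + 1  # bare majority in this district
        if a > b:
            wasted_a += a - majority  # surplus winning votes
            wasted_b += b             # all losing votes
        else:
            wasted_b += b - majority
            wasted_a += a
    return (wasted_a - wasted_b) / total

# Hypothetical gerrymander against party A: packed into one district,
# cracked in the other two. A positive gap means A's votes were wasted.
gap = efficiency_gap([(80, 20), (45, 55), (45, 55)])
```

In this toy example A has a statewide majority (170 of 300 votes) but wins only one of three seats, and the gap comes out strongly positive, flagging the wasted A votes.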

God’s second step was to choose one representative for each of the equal-sized groups, using an algorithm that balanced the goals of exemplary “piety” and ideological unbiasedness. Note that in rating a voting method, we’re doing this “backwards”: we constructed the “set of responsible voters” from the candidate, not vice versa. Still, we can use this “backwards” procedure to consider the inherent biases and variances of whatever arbitrary “forwards” voting method that Uriel actually used. These might include:

  • Possible bias in choosing from each constituency: Something about the voting method, or about the interaction between the voting method and voter behavior, could cause bias.

    • Representatives might be systematically drawn from one side (eg “the north”) of their constituency, across the board

    • Representatives might be systematically drawn from the centroid of their constituency, but the voter sets might be chosen in a way such that these “centroids” are biased in certain dimensions.

    • Representatives might be chosen in a way that inappropriately ignores exemplary “piety” (that is, those dimensions on which they *should* be biased compared to the voter population).

  • Possible variance in choosing from each constituency: Representatives might be systematically far from the center of their constituency. Even in cases where this doesn’t cause bias, it’s still a problem.

I’ve listed various different possible problems, and suggested that they could be the basis of an MSE decomposition. That is to say, God could begin grading the outcome by using the above problems as a rubric; subtracting points separately for problems of each type. But of course, that kind of MSE decomposition only works if the different kinds of problem are orthogonal/​independent. So in practice, God might end up adding or subtracting a less-effable “fudge factor” at the end.

Allocating responsibility: a tricky (and important) problem

A key step in “God’s MSE decomposition” for evaluating a multi-winner voting method is assigning responsibility; deciding, using the ballots, which voters should be considered “constituents” of which winning representative. That’s not at all as simple as it might seem. Even if you know the ballots, the voting method, and the winners, it’s not necessarily easy to make a function r(c,v) to say which ballots helped elect each of the winners and by how much. You know not only the exact outcome in reality, but also the counterfactual outcomes for every other possible combination of ballots; still, how do you turn all that into the numbers you need?

Before going on, we should note that although we are interested in using the answer to this problem as one sub-step in evaluating multi-winner voting methods, the problem itself applies equally-well to single-winner voting methods: given that candidate X won, which voters are responsible, and by how much? Thus, until further notice, most of the concrete cases we’ll be considering come from the simpler domain of single-winner methods.

With single-winner plurality, it’s easy to allocate responsibility using symmetry: all the N people who voted for the winner did so equally, so each of them has 1/N responsibility. This holds regardless of the overall turnout or of how many voted for other candidates; the winner’s share could be 99%, or 50.01% in a two-way race, or even 1% in a 101-way race.

But that kind of symmetry argument doesn’t work for complex voting methods with many more valid ways that each voter could vote than there are candidates. In such cases, even briefly considering edge cases such as Condorcet cycles or impossibility results such as Arrow’s or Gibbard-and-Satterthwaite’s theorems suggests that allocating responsibility will be no simple task.

The problem of allocating responsibility has been considered before in the context of single-winner voting methods; there are existing formulas for “voter power” indices, and my eventual answer is similar to one of them. But this issue actually goes well beyond voting. There are many situations when an outcome has occurred, but even though the inputs and counterfactuals are more-or-less agreed-upon, people can still argue as to which of those inputs are responsible.

Here’s my plan for discussing this question:

  • Start with a story of how judges and lawyers have failed to resolve this question over thousands of years.

  • Give an example or two of why it’s difficult to definitively allocate responsibility, even when you know all the relevant facts and counterfactuals.

  • Tell an interesting story of how mathematicians struggled and ultimately succeeded in cleanly solving a problem that’s similar but different in key ways. I think that the solution they eventually came to for that problem is exactly wrong for this one, but the kind of cognitive leaps they had to make may help suggest approaches.

  • Sketch out what an ideal solution to allocating responsibility should look like.

  • Construct a solution that I think is as ideal as possible, even though it’s not entirely perfect or mathematically clean.

  • Argue that it may be impossible to do much better.

Consider the problem of legal liability. There are many laws where one of the criteria for a person being liable is that their actions “caused” a certain outcome. One traditional legal definition for “cause” in this sense is “sine qua non”, Latin for “without this, not”; in other words, X causes Y iff Y wouldn’t have happened without X. But what if two people independently set separate fires on the same day, and as a result my house burns down? Under a “sine qua non” standard, person A isn’t liable because even without them fire B would have burned my house, and vice versa.

This merely shows that the traditional legal standard of “sine qua non” is silly; it doesn’t prove that finding a good standard is truly hard. I know: the fact that lawyers and judges haven’t figured out how to allocate responsibility doesn’t prove that doing so is genuinely difficult. But it does at least show that it’s not trivial.

If all you care about is whether or not someone was partially responsible for an outcome, and not how much responsibility they deserve, there’s a more-or-less simple, satisfying answer (and the lawyers are just beginning to discuss it, though it’s not a settled matter): a factor is causal if it’s a necessary element of a sufficient set (NESS). (Actually, in the legal situation, there’s an additional complication needed for a good definition of responsible causes. If something was “established/known” “after” a sufficient set already existed, it is not eligible to be a cause; and defining those terms in a way that gives satisfying answers is tricky. For instance, a fire set after my house burned down cannot be a cause, even if it is a NESS. But discussing this issue is out of scope here, because in the case of voting methods, only ballots matter, and all ballots are considered to be cast simultaneously.)

NESS answers the question of who shares responsibility, but it doesn’t answer that of how much responsibility they have. For instance, imagine that a group of people made stone soup with 1 stone, 1 pot, water, fire, and ingredients; and that in order to be a soup, it needed a pot, fire, water, and at least 3 ingredients. NESS tells us that the person who brought the stone was not responsible for the soup, and that everyone else was; but how do we divide responsibility among the others? Simple symmetry shows that each ingredient gets the same responsibility, but there are many facially-valid ways you could set relative responsibility weights between an ingredient and the pot.

There’s also another flaw with NESS: similar outcomes. Consider a trolley problem: you just pulled the lever, saving 5 people by diverting the trolley to crush 1. By NESS, you are now a murderer; a necessary element of the set of actors leading to that one guy’s death. But, unlike the fools who let the trolley run away in the first place, you are not in a NESS for the event “somebody dies here today”. In proportional voting terms, we will need to create such composite events — “at least one of this subset of candidates is elected” — because otherwise, frequently, a sufficient set will not be complete until there are so few voters outside the set that they couldn’t tip the balance between two similar candidates.

Pascal’s Other Wager? (The Problem of Points)

This is beginning to look a lot like another famous problem from the history of mathematics: the “Problem of Points”. This question was originally posed by Luca Pacioli (most famed for codifying double-entry bookkeeping): say that two people are gambling, and they have agreed that a pot of money will go to the first player to win a total of N games. But for some reason they are interrupted before either player reaches N wins; say that one of them has won P games and the other Q, with P>Q. How should the pot be divided fairly? Clearly, player P should get more, because they were closer to winning; but how much more?

Like the retroactive allocation of responsibility that we’re considering, this is a case of allocating a fixed resource between various agents in a well-defined situation (one without any “unknown unknowns”). And similar to our case, solving it is deceptively hard. Several smart mathematicians of the time—such as Tartaglia and Cardano, rival co-discoverers of the cubic formula—proposed “solutions”, but there was something unsatisfying about all of them. For instance, Pacioli suggested that the two players should split the pot in proportion to the number of games they’d won; but this would give the entire pot to somebody who’d won just 1 game to 0 out of a tournament to 100, a tiny lead in context.

The problem was definitively solved in letters between Fermat and Pascal. Fermat got the right answer, and Pascal refined the argument to support it; Pascal’s letter of August 24, 1654 is considered one of the founding documents of a branch of math. I’ll tell that answer in a moment, but in case you don’t already know it, I’ll give you a moment to pause and see how you’d solve the problem.

....

The answer is: the pot should be split in proportion to the probability that each player will win it.

This answer may seem almost obvious and trivial to us, but to Fermat and Pascal, it required a leap of insight. This was arguably the first time people used math (probability) to reason about events in an explicitly counterfactual future, as opposed to simply generalizing about future events that closely resemble oft-repeated past ones. This key idea — that in order to better describe the universe as it is, we must sometimes rigorously analyze states that are not and will never be — is the foundation of risk analysis, which is itself key to modern society. (Here’s a video which tells this story, and makes this argument for its importance, at length.)
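To make the solution concrete, the winning probability has a simple recursion under the problem’s implicit assumption that each remaining game is an independent, fair coin flip. This is my own sketch, not anything from Pascal’s letters:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p_first_wins(a, b):
    """Probability that player 1 takes the pot, where player 1 still
    needs `a` more game wins and player 2 still needs `b`, and each
    remaining game is an independent fair coin flip."""
    if a == 0:
        return Fraction(1)  # player 1 has already won
    if b == 0:
        return Fraction(0)  # player 2 has already won
    return (p_first_wins(a - 1, b) + p_first_wins(a, b - 1)) / 2

# Interrupted at 7 games to 4, playing to 8: player 1 needs 1 more win,
# player 2 needs 4, so player 1 loses only if player 2 wins 4 straight.
share = p_first_wins(8 - 7, 8 - 4)
# share == Fraction(15, 16): the pot splits 15:1
```

Note the contrast with Pacioli’s rule: at 7 games to 4 he would split the pot 7:4 no matter what, while the probability-based split is 15:1 in a race to 8, and much closer to even in a race to 100.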

In that case, Pascal and Fermat were considering how to “justly” allocate a fixed resource: a pot of money. The problem had been posed without a clear definition of what “justly” would mean, but the solution they came up with was so mathematically beautiful that once you’ve understood it, it’s hard to imagine any other definition of justice in that case.

It’s clear that this is an interesting story, but why do I spend so much time on it, when the problem at hand is allocating retrospective responsibility, not prospective odds? Am I about to argue that odds and risk analysis are key to answering this problem, too?

No! In fact, quite the opposite. I think this example is relevant in part because the very arguments that pushed Pascal and Fermat to think probabilistically, should push us to avoid basing our answer primarily in probabilities. A key argument they used against earlier proposed “solutions” such as Pacioli’s was that Pacioli’s answer would always give the same share for given values of P and Q, independent of N; but clearly a player who leads by 7 games to 4 is in a much better position if the threshold to win the pot is 8 games than if the threshold is 100 games. However, in order to allocate responsibility retroactively, we already saw using symmetry arguments that in the case of plurality, responsibility is inversely proportional only to winning vote total, independent of the threshold it would have taken to win.

Interlude: Am I barking up the wrong tree? Is it even clear to you what tree I’m barking up?

So far in this essay, I’ve been explaining things I believe I understand pretty well. That is, most of what I’ve said so far, I think that if you asked me to, I could explain at greater length and with more rigor. This is about to change. This section, in particular, deals with ideas that I am not yet done chewing over. So I’ll make some claims here that I can’t entirely back up with concrete arguments.

But that doesn’t mean this is pure speculation, either. I’ve spent literally decades of my life thinking about this stuff, and that pursuit has included getting a doctorate in statistics. So if you, reading this, think you see a fatal flaw in my ideas, you may well be right; but if you think that the flaw is obvious and that I haven’t at least considered it, you’re very probably wrong. My counter-counter-argument may be handwavy and ultimately incorrect, but it at least exists.

If you look at previous attempts to solve retroactive responsibility problems, most rely implicitly or explicitly on the kind of probability-based arguments that you use for proactive problems like the problem of points. For instance, voter power indices might be based on the probability that a given (set of) actor(s) is pivotal, given some distribution of actions for the other actors. I believe that this logic is ultimately a dead end. Once you have a retroactive scenario, where actual voters each cast an actual ballot, reasoning that’s based on “what was the probability that those people would have acted differently” will inevitably be sensitive to assumptions in a way that undermines it. Furthermore, this problem is exponentially compounded by the need to consider more than two possible outcomes, and the issues that result from that — that is, Condorcet cycles, Arrow’s theorem, the Gibbard-Satterthwaite theorem, etc.

But I didn’t tell the story of the Problem of Points just to say what the answer to retroactively allocating responsibility isn’t. It was also a good example of a problem specified in loose practical terms (who “deserves” what share of the pot), where just translating into rigorous math terms turned out to be more than half the work of finding a solution that pretty much anyone can agree is “correct”. So, if the goal is to solve a similar loosely-defined problem, our first step should be to restate why we want a solution and what characteristics we know that solution should have.

We’re trying to estimate the expected mean squared error (squared bias plus variance) of a voting method, by looking at an outcome of that voting method. All we necessarily have is the ballots and the winners; we imagine that came from some kind of decision process that might be represented as voters and candidates located in an ideological space where voters tend to prefer candidates who are closer, but we don’t have enough information to reliably place voters and candidates in that space. So we model the voting system as having had two steps: grouping voters into (possibly-overlapping) weighted constituencies, and choosing a winner in each constituency. Assigning responsibility means giving each voter a weight in each constituency — how much power they exercised to help elect that winner. If we do a good job assigning responsibility, then it should be easy to find counterfactual outcomes where changing responsible votes changed the winner, but harder to find counterfactuals where changing non-responsible votes did so.

When I put it like that, it may sound as if I’m solving the wrong problem. I’m trying to find an “allocated responsibility” measure as a one-dimension-per-candidate approximation to a more-complex causal structure of the actual voting method; so that I can then combine that with some approximation of how candidates are selected within the constituencies; so I can use that to approximate the expected MSE between the voters’ (exemplariness-weighted) ideological distribution and that of the legislators; so I can use that to approximate the utility lost between those two things; when, to start out with, the voters’ ideological distribution was itself an approximation to the utility-maximizing outcome, justified by the Condorcet Jury Theorem. There are already 5 different levels/​types of approximation in there, and I’m not finished yet; surely, there must be a better way?

I believe there probably isn’t. That is to say: I believe that each of the approximation steps so far is plausibly the lowest-error way to accomplish its task. Specifically, all of them are “neutral” in terms of ideological bias in a way that alternate approximation schemes often are not; and all of them are, plausibly, among the lowest-variance possible approximation strategies that have such neutrality. Just as in Churchill’s famous dictum that “democracy [that is, the last 2-3 approximation steps in that chain] is the worst form of Government except for all those other forms that have been tried from time to time”, I believe that responsibility-allocation may be the worst way to evaluate multiwinner voting methods except for all the other forms that have been tried (or have occurred to me over my years of thinking about this) from time to time.

Furthermore, while this concept of numerically allocating responsibility is used in this case as one step in a larger strategy of evaluating voting methods, I think that numerically-allocated responsibility could be useful in other contexts. I’ve already mentioned that it relates to legal liability. I also think it could be useful to individual voters in considering voting tactics or even the decision of whether it’s worth voting in the first place. I’ll say a bit more about that latter issue later.

Tentative answer

Thanks to Thomas Sepulchre’s comment below, I have a solution to this problem that I consider “correct”. I’ll copy his explanation here:

I think there’s actually one way to set relative responsibility weight which makes more mathematical sense than the others. But, first, let’s slightly change the problem, and assume that the members of the group arrived one by one, that we can order the members by time of arrival. If this is the case, I’d argue that the complete responsibility for the soup goes to the one who brought the last necessary element.

Now, back to the main problem, where the members aren’t ordered. We can set the responsibility weight of an element to be the number of permutations in which this particular element is the last necessary element, divided by the total number of permutations.

This method has several qualities : the sum of responsibilities is exactly one, each useful element (each Necessary Element of some Sufficient Set) has a positive responsibility weight, while each useless element has 0 responsibility weight. It also respects the symmetries of the problem (in our example, the responsibility of the pot, the fire and the water is the same, and the responsibility of each ingredient is the same)

In a subtle way, it also takes into account the scarcity of each resource. For example, let’s compare the situation [1 pot, 1 fire, 1 water, 3 ingredients] with the situation [2 pots, 1 fire, 1 water, 3 ingredients]. In the first one, in any order, the responsibility goes to the last element, therefore the final responsibility weight is 1/7 for each element. The second situation is a bit trickier, we must consider two cases. First case, the last element is a pot (1/4 of the time). In this case, the responsibility goes to the seventh element, which gives 1/7 responsibility weight to everything but the pots, and 1/14 responsibility weight to each pot. Second case, the last element is not a pot (3/4 of the time), in which case the responsibility goes to the last element, which gives 1/6 responsibility weight to everything but the pots, and 0 to each pot. In total, the responsibilities are 1/56 for each pot, and 9/56 for each other element. We see that the responsibility has been divided by 8 for each pot, basically because the pot is no longer a scarce resource.

Anyway, my point is, this method seems to be the good way to generalize on the idea of NESS : instead of just checking whether an element is a Necessary Element of some Sufficient Set, one must count how many times this element is the Necessary Element of some permutation.
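Sepulchre’s permutation-counting rule is easy to brute-force. Below is my own toy instance (not his exact numbers): here the soup needs one pot, water, and all three ingredients; there are two pots, and the stone is useless. The exact fractions are just whatever the enumeration produces for this spec.

```python
from itertools import permutations
from fractions import Fraction

# Toy spec (mine, not Sepulchre's): soup := (at least one pot) AND water
# AND all three ingredients; the stone contributes nothing.
ELEMENTS = ('stone', 'pot1', 'pot2', 'water', 'ing1', 'ing2', 'ing3')

def sufficient(items):
    return (('pot1' in items or 'pot2' in items)
            and 'water' in items
            and {'ing1', 'ing2', 'ing3'} <= items)

def permutation_responsibility(elements, sufficient):
    """For each element, the fraction of arrival orders in which that
    element is the one whose arrival first completes a sufficient set."""
    counts = dict.fromkeys(elements, 0)
    perms = list(permutations(elements))
    for perm in perms:
        assembled = set()
        for e in perm:
            assembled.add(e)
            if sufficient(assembled):
                counts[e] += 1  # e was the "last necessary element"
                break
    return {e: Fraction(c, len(perms)) for e, c in counts.items()}

r = permutation_responsibility(ELEMENTS, sufficient)
```

For this spec, each pot ends up with responsibility 1/30 while water and each ingredient get 7/30 apiece: the duplicated resource is heavily discounted, the stone gets exactly zero (it is never a NESS), and the weights sum to 1, illustrating all the properties claimed in the quoted comment.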

I believe that TS probably independently “invented” this idea (as a competitive programmer, I expect he’d have seen lots of problems and have a good toolbox of solutions and strategies for re-adapting those solutions to new problems). But this solution is actually a version of the classic (1954) Shapley-Shubik voter power index.

((Technical note, feel free to skip: the key difference is that originally S-S was made for measuring prospective voter power in single-winner elections; in the retroactive responsibility, multi-winner case, we only consider counterfactuals in which ballots support a given candidate “less” than they did in reality. This involves defining this concept of “less support”; ideally this would be done in such a way that the voting method was monotonic over ballots, but in cases of complex voting methods, it may be worthwhile to accept some non-monotonicity in order to get stricter and more-useful definitions of “less” support. ))

Interestingly, the ordinary Shapley-Shubik index can be interpreted as a Bayesian probability, by using the right prior (Straffin, P. D. The Shapley-Shubik and Banzhaf power indices as probabilities. in The Shapley value: essays in honor of Lloyd S. Shapley). From an individual voter’s perspective, what Straffin calls the “homogeneity assumption” can be stated as a Polya(1,1) prior; in practical terms, this could mean that the voter expects that their own decision is acausally related to all other voters’ decisions, and is using some acausal decision theory. (For more discussion and links on voting and acausal decision theories, see here.)

((The Polya parameters 1,1 there mean a relatively high correlation between the ego voter’s decision and that of other voters, though it is slightly weaker than with the Jeffreys prior of Polya(.5, .5). A strictly functional-decision-theory agent would probably expect a weaker correlation with other voters, as they would only consider correlations that came from other voters using very similar decision procedures. That would probably lead to priors more like Polya(total_polled_yesses_across_reputable_polls, total_polled_nos_across_reputable_polls), which would assign much lower power to most voters — more similar to the Banzhaf power index. Personally, I find Polya(1,1) to be a reasonable prior; aside from having an easy-to-calculate algorithm as given by Sepulchre, it feels “about right” for my personal decision theory.))

For the modified, retroactive Shapley-Shubik index we’re using here, I’m not entirely sure what prior it’s equivalent to, or even if there is such a prior. However, intuitively, it seems to me that it’s likely to be something like “you’re correlated to all the voters who are at least as supportive as you are of any candidate you help elect, using a Polya(1,1) prior”. This idea — that for purposes of decision theory, you should give more consideration to your acausal links to other voters if those other voters hold similar underlying preferences to you — is intuitively appealing to me.

Shorter “solution” statement

Say you have a full set of ballots ℬ and an outcome O. For each individual ballot b_i in ℬ, and each set of candidates 𝒸 containing exactly one winner c_j for O, take ℒ(b_i, 𝒸) to be the set of possible legal ballots which support all candidates in 𝒸 “less” than or equal to what ballot b_i does. Take k_1...k_V to be a random permutation of the integers 1...V; this is the only randomness here so all expectations are with respect to it. Take K(i) to be that integer n such that k_n = i. E(...) denotes expectation and I(...) denotes an indicator function of an event; that is, 1 if that event is true and 0 if it is false.

Voter i’s “retrospective power” (RP) to elect a member of the set 𝒸 is:

(V/S) * E( I(some candidate in 𝒸 wins for all elections such that b_{k_1} to b_{k_{K(i)}} are held constant, and b_{k_{K(i)+1}} to b_{k_V} are allowed to be replaced by ℒ(..., 𝒸)) - I(some candidate in 𝒸 wins for all elections such that b_{k_1} to b_{k_{K(i)-1}} are held constant, and b_{k_{K(i)}} to b_{k_V} are allowed to be replaced by ℒ(..., 𝒸)) ).

(To do: define a bit more notation up front so I can write that in clean LaTeX.)

When calculating the RP to elect an individual candidate c_j, use the power to elect a set, choosing 𝒸 as follows: of all the possible sets 𝒸 which include c_j, choose the one with the smallest measure among the ones which assign RP to the fewest voters.

The factor V/S (number of voters divided by number of seats) is included up front so that the average RP for each voter will, by construction, always be 1. Since the average is constant, minimizing the standard deviation will be the same as minimizing the mean of squares.
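As a sanity check, the definition can be brute-forced on a tiny plurality election. Everything below is my own toy setup and its simplifying assumptions: two candidates, five plurality ballots, “less support” for {A} meaning a free A-voter’s ballot can become anything while a free B-voter’s ballot must stay B, and a tie counting as not-guaranteed.

```python
from itertools import permutations
from fractions import Fraction

ballots = ['A', 'A', 'A', 'B', 'B']  # toy plurality election; A wins 3-2
V, S = len(ballots), 1

def a_guaranteed(fixed):
    """True iff A wins no matter how the not-yet-fixed voters fill in
    ballots from their "less support for A" sets. The worst case for A
    is every free voter casting a B ballot; a tie is not a win."""
    a = sum(1 for i in fixed if ballots[i] == 'A')
    b = sum(1 for i in fixed if ballots[i] == 'B')
    free = V - len(fixed)
    return a > b + free

pivot_counts = [0] * V
perms = list(permutations(range(V)))
for perm in perms:
    fixed = set()
    for i in perm:
        already = a_guaranteed(fixed)
        fixed.add(i)
        if a_guaranteed(fixed) and not already:
            pivot_counts[i] += 1  # fixing i's ballot clinched A's win
            break  # at most one clinching voter per permutation

# Retrospective power: (V/S) times the expected indicator difference.
rp = [Fraction(V, S) * Fraction(c, len(perms)) for c in pivot_counts]
```

The result is an RP of 5/3 for each A-voter and 0 for each B-voter, matching the symmetry argument from earlier: the three A-voters split the seat’s power equally, and the average RP over all five voters is 1, as promised.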

However, this mean of squares, while it is naturally relevant as an estimator of the utilitarian loss, is hard to explain as-is; for instance, its natural units are completely abstract. Thus, in the overall “Average Voter Effective Choice” metric below, I’ll find a (monotone) transformation from this into a more-interpretable version.

Equalizing voters’ retrospective power (RP)

Now that we have a definition of retrospective voter power, we can use it as an ingredient in a metric of multi-winner voting methods. That is, all else equal, we might want to avoid the following:

  • Unequal voter power (high standard deviation of RP across voters), because this would lead to “loss” directly.

  • Too much overlap between the sets of voters responsible for electing different candidates, because this would tend to be indirectly associated with lower fidelity of representation. To visualize this in geographic terms, two representatives from a large district would tend to live farther from their voters than one representative from a district half the size.

I’m going to focus for now on just the first of these two. We’ll deal with “choice” (which allows voters to ensure their reps are “close” to them ideologically) separately. In particular, it’s possible that “cake-cutting-like” algorithms could simultaneously improve exemplariness, have little cost in terms of choice, and cause high overlap in measured voter power; so directly avoiding overlap is not necessarily a good thing.

Phragmén and Thiele

In fact, there is already a proportional voting method that is designed to ensure voters have roughly equal power, and that defines that power in a way that’s entirely compatible with, though less generally-applicable than, our definition above. And in fact, it was first considered in the 1890s by two of the earliest theorists of proportional voting, Scandinavian mathematicians Edvard Phragmén and Thorvald Thiele. However, since neither of them ultimately settled on this method as their favorite, it fell into obscurity, being only occasionally reconsidered or reinvented (by Sainte-Laguë in 1910, Ebert in 2003, and Mora in 2016).

The method was invented in a Swedish context of weak parties and approval-style ballots. In the case of such approval ballots, there’s an obvious way to calculate each voter’s responsibility for electing each winner: 1/​p, where p is the proportion of ballots that approved that winner. Thus, one of the methods Phragmén considered was to explicitly try to minimize the variance/​standard deviation of voters’ total power. Since the sum of total power is constant, this is equivalent to minimizing the sum of each voters’ squared power; known as the “squared load”.

Note that this is an optimization problem, and a naive algorithm for finding the optimum might have to check a combinatorial explosion of possible winner sets. Thus, in practice, the method would probably be done sequentially: at each step, elect one new candidate, minimizing the sum of squared loads.
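Here is a minimal sketch of that sequential procedure for approval ballots. This is my own illustrative implementation (with alphabetical tie-breaking); real proposals differ in details.

```python
def sequential_squared_load(ballots, seats):
    """At each round, elect the not-yet-elected candidate minimizing the
    sum of squared voter "loads", where electing c adds a load of
    1/|approvers(c)| to each voter who approved c."""
    candidates = sorted(set().union(*ballots))
    load = [0.0] * len(ballots)
    winners = []
    for _ in range(seats):
        best, best_cost = None, float('inf')
        for c in candidates:
            if c in winners:
                continue
            n_approvers = sum(1 for b in ballots if c in b)
            if n_approvers == 0:
                continue
            share = 1.0 / n_approvers
            cost = sum((load[i] + (share if c in ballots[i] else 0.0)) ** 2
                       for i in range(len(ballots)))
            if cost < best_cost:  # strict <, so ties break alphabetically
                best, best_cost = c, cost
        winners.append(best)
        n_approvers = sum(1 for b in ballots if best in b)
        for i, b in enumerate(ballots):
            if best in b:
                load[i] += 1.0 / n_approvers
    return winners

# Two blocs of two voters each; with two seats, each bloc elects one.
winners = sequential_squared_load(
    [{'A', 'B'}, {'A', 'B'}, {'C', 'D'}, {'C', 'D'}], seats=2)
# winners == ['A', 'C']
```

In the usage example, electing B in the second round would double the first bloc’s loads (cost 2.0) while electing C spreads load evenly (cost 1.0), so the rule proportionally covers both blocs.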

I mention this method not because I think it’s better than other proportional methods, but because it shows that the idea of “minimizing variance in voter power” goes all the way back to the origins of proportional voting methods. Overall, I think that approval-ballot-based proportional methods tend to lead to too much “overlap” and thus to poor average voter choice, unless voter strategy is implausibly precise. Also, as Phragmén himself pointed out, it often isn’t “optimal” using this rule to pick a candidate who is approved by literally all the voters, as it leads to apparent “imbalance” in voter power.

Measuring “Voter Choice”, separate from power

I’ve already mentioned the idea of “choice” several times, in connection with voter power overlap.

Imagine two raw pizzas, where all the curls of shredded cheese are voting on which 3 pepperonis should represent them. On one pizza, the voting method gives each 1/3 slice the power to elect one pepperoni from around the center of the slice. The cheese curls all have equal voting power, and the average distance between a cheese curl and its representative pepperoni is about as low as it can possibly be.

Now, on the other pizza, one of the 1/3 slices gets to elect its pepperoni, as before. But the other two slices combine to elect two pepperonis, both close to their combined center of gravity. Since those two pepperonis are both farther from the center of the individual slices, the average distance between a cheese curl and its representative pepperoni is higher. Clearly, this is a worse outcome, even though the RP is still perfectly even.

Clearly, what I’m driving at is that a voter’s degree of choice is somehow related to the average “distance” between a voter and their representative. But just using average distance would mean that the quality of elections would vary between a small pizza and a large one, even if the outcome was exactly homologous. So instead, we’ll define the voter choice for one voter i using a normalized/​standardized quantity: the portion rho(v_i, r_j) of other voters who are farther from i’s representative than i is.
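In an artificial setting where positions are known (like the pizza), rho is straightforward to compute. A sketch, using my own convention that “farther” means strictly farther (ties count against the voter):

```python
def voter_choice(voter_pos, rep_pos):
    """rho for each voter: the share of *other* voters strictly farther
    from that voter's representative than the voter themself is.
    voter_pos: 1-D ideological positions; rep_pos[i] is the position of
    voter i's assigned representative. 1 is best, 0 is worst."""
    n = len(voter_pos)
    rho = []
    for i in range(n):
        d_i = abs(voter_pos[i] - rep_pos[i])
        farther = sum(1 for j in range(n)
                      if j != i and abs(voter_pos[j] - rep_pos[i]) > d_i)
        rho.append(farther / (n - 1))
    return rho

# Five voters on a line, all assigned the same representative at 2:
# the central voter gets rho = 1, the extremes get rho = 0.
rho = voter_choice([0, 1, 2, 3, 4], [2, 2, 2, 2, 2])
# rho == [0.0, 0.5, 1.0, 0.5, 0.0]
```

Because rho is a rank-based share rather than a raw distance, it comes out the same whether the pizza is large or small, which is exactly the normalization motivated above.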

That’s all very well and good for artificial examples like the pizza, where we can somehow know exactly where each voter and candidate is located in ideology space. In the real world, though, that’s impossible, for numerous reasons. So we’ll have to use the ballots to infer distances, or at least, to infer the expected/​average distance.

In order to make such inferences, we will have to make assumptions about the voter distribution. Perhaps the most fundamental of those assumptions, the one which will inform all the others, is our belief about the effective dimensionality of the voter space. In this case, I believe that it’s best to assume one-dimensionality; this is not necessarily realistic, but unlike any other dimensionality, it leads to reasonably-obvious and easy-to-visualize ways to make the rest of the assumptions. Also, it tends to be “friendly” to voting methods, giving a rough “best-case” analysis; I think this is fine.

Note that we’ve defined a voter’s choice as the percent of voters that are farther from the candidate, not closer, so that it’s 1 at best and 0 at worst.

Adding a digression: The definition we’ve given, in terms of “proportion of voters who are farther from the candidate”, can be thought of as candidate-centric; that is, assuming affinity between candidates and voters is symmetric, it prioritizes situations where every candidate gets “their favorite voters”. But that is not precisely what we’d want, in an ideal world. For instance, in a one-dimensional ideological space with voters distributed uniformly from 0 to 100, it does not distinguish between a two-winner election where the winners are at 0 and 100, and one where they’re at 25 and 75. Note that in this example, the problem arises when the candidates align with the most-extreme fraction of voters. With more winners, the fraction of winners in this potentially-extreme position would be lower.

There are two obvious ways we might try to address this, but either one has issues, so by default we’ll use neither.

The first way would be to make some rough “parametric” dimensional assumptions to back out from “number of voters that are farther from the given candidate than the given voter is” to “distance between the given candidate and the given voter”. For instance, if ideology space is 2-dimensional, we might assume the distance is (1-rho(v_i,r_j))^(1/​2), because the number of voters who can fit in the space closer to the candidate is proportional to the square of the distance. In general, then, for d dimensions, and subtracting from 1 so that 100% is good, this would be 1-(1-rho(v_i,r_j))^(1/​d). We could also use d=0.5, not to represent “half of a dimension”, but merely to get a kind of squared-distance metric that best favors finding a representative at the average ideology for their constituents. Still, all of this relies on some strong assumptions, and so we’ll use d=1 unless otherwise stated.

The second way would be to try to get a voter-centric measure; instead of “proportion of voters who are farther from the given candidate”, to measure “proportion of candidates who are farther from the given voter”. The problem with this is, it depends on the distribution of candidates. This makes it completely unworkable for real-world elections; but for in silico Monte Carlo experiments, we can simply create a candidate for each voter, ensure that all those candidates are ideal in terms of any non-ideological dimensions that all voters agree on, and then count. Note that while this counting itself will not be affected by the dimensionality of the ideological space, in order to do it, we will have had to create an entire in silico election, with its own ideology space of some arbitrary dimension. Also note that higher-dimensional ideological spaces will require exponentially more seats in the legislature to get the effective choice above a given fixed threshold, even using an ideal voting method; this is not true for the candidate-centric view, as Voronoi diagrams are possible in any dimensionality of space.

So, define VC_d(v_i, r_j) = 1-(1-rho(v_i, r_j))^(1/d); note that for d=1 this is simply rho(v_i, r_j).
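Following the dimensional back-out described above (distance corresponds to (1-rho)^(1/d)), this definition is one line of code; this is my sketch, not an established library function:

```python
def vc(rho_val, d=1.0):
    """VC_d = 1 - (1 - rho)**(1/d). For d=1 this is just rho; for
    d=2 it backs out a distance-like quantity, since the number of
    voters closer to the candidate grows like distance squared."""
    return 1 - (1 - rho_val) ** (1 / d)

vc(0.75)       # → 0.75 (d=1: identical to rho)
vc(0.75, d=2)  # → 0.5
```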

For example, say we had an IRV (aka single-winner ranked choice voting) election with 4 candidates and the following votes:

30 ballots: A>B>C>D

10 ballots: B>A>C>D

5 ballots: B>C>A>D

5 ballots: B>C>D>A

9 ballots: C>B>A>D

11 ballots: D>C>A>B

30 ballots: D>C>B>A

The outcome is that A wins, after C and B are eliminated. To find the average VC_1 of each group, we’d begin by lining up the groups from “closest to A” to “farthest from A”. Using the simple assumption that voters are closer to A the higher preference they give, and on average the same distance when the preference level is the same, gives the following order:

A>B>C>D (30 ballots total; so the group has an average VC_1 of 85%)

B>A>C>D (10 total, so group average VC_1 of 65%)

B>C>A>D, C>B>A>D, and D>C>A>B (25 total, so cross-group average VC_1 of 47.5%)

B>C>D>A and D>C>B>A (35 total, average VC_1 of 17.5%)
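The lining-up computation above can be automated. Here's a sketch (my own helper, under the stated assumption that voters who rank the winner at the same preference level are, on average, equidistant from the winner):

```python
def avg_vc1_by_rank(groups):
    """groups: (ballot_count, winner_rank) pairs, where winner_rank is
    the winner's position on that ballot (1 = first choice). Lines the
    voters up from closest to farthest and returns {rank: average VC_1},
    averaging together groups that rank the winner at the same level."""
    total = sum(c for c, _ in groups)
    merged = {}
    for count, rank in groups:
        merged[rank] = merged.get(rank, 0) + count
    out, start = {}, 0
    for rank in sorted(merged):
        count = merged[rank]
        midpoint = start + count / 2
        out[rank] = 1 - midpoint / total  # portion of voters farther away
        start += count
    return out

# The IRV example above, with A the winner:
groups = [(30, 1), (10, 2), (5, 3), (5, 4), (9, 3), (11, 3), (30, 4)]
avg_vc1_by_rank(groups)  # → {1: 0.85, 2: 0.65, 3: 0.475, 4: 0.175}
```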

Average Voter Effective Choice: a combined proportionality metric

We now have all the ingredients necessary for making an overall metric for multi-winner outcomes, which I’m dubbing “Average Voter Effective Choice” (AVEC).

To begin with, note that “standard deviation of RP” is monotonically related to “average squared load”, which in turn could be viewed as “weighted average lobbying power”. That is to say, average voting power, where the average is itself weighted by voting power.

Imagine a representative who was unsure how to vote on some question, so she decided to “ask a constituent”. She draws a voter at random, with probability proportional to that voter’s retrospective power to elect her, then listens to them. Furthermore, say that the legislature as a whole lets a randomly-chosen representative have dictatorial say over this issue. So ultimately, one voter will be deciding, and the question is: what is the expected total retrospective voting power of that voter?

For instance, if half the voters split the voting power evenly, then the “average squared load”/​”weighted average lobbying power” is 2²/2 = 2. The people with power have twice as much power on average as if everybody were equal. And in this simple case, the reciprocal of that, ½, is the fraction of votes that were not wasted.

What I’m saying is: we can think of the reciprocal of the average squared load as being the “Average Voter Effectiveness” (AVE). If AVE is, say, 60%, that’s about the same amount of overall inequality as if 40% of votes had been wasted.
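As a tiny sanity check of that equivalence, here's the computation in code (a sketch; the function name and bloc representation are mine):

```python
def ave(blocs):
    """Average Voter Effectiveness: reciprocal of the average squared
    load, where blocs is a list of (voter_fraction, power) pairs."""
    return 1 / sum(f * p * p for f, p in blocs)

# 60% of voters share the power equally (power 1/.6 each), 40% wasted:
ave([(0.6, 1 / 0.6), (0.4, 0.0)])  # → 0.6, as if 40% of votes were wasted
```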

We can then combine this with another form of vote wastage: for those votes that did have an impact, how much choice did they really have? How much of the ideology space were they able to effectively screen out? This is VC. To avoid double-counting wasted votes, we can take the average VC, weighted by total voting power.

Thus, the overall metric is Average Voter Effective Choice (AVEC), which is the product of Average Voter Effectiveness (AVE) and Average Voter Choice (AVC).

For instance, in a single-winner plurality election won by a bare majority, half the voters split the voting power equally, so average voter effectiveness is just over 50%; and those whose votes do matter on average like the winner ideologically more than 75% of the other voters do, so that’s the average voter choice. Multiplying, we get 37.5% for the overall average voter effective choice.
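The whole combination can be wrapped in one small function, and the bare-majority example then falls out directly (again a sketch of my own; note that, matching the worked formulas later in this section, the choice average is weighted by fraction times power squared):

```python
def avec(blocs):
    """blocs: (voter_fraction, power, avg_choice) triples. Returns
    (AVE, AVC, AVEC), with AVC weighted by fraction * power^2."""
    load2 = sum(f * p * p for f, p, _ in blocs)
    ave = 1 / load2
    avc = sum(f * p * p * c for f, p, c in blocs) / load2
    return ave, avc, ave * avc

# Bare-majority plurality: half the voters, power ~2 each, choice .75:
ave_, avc_, avec_ = avec([(0.5, 2.0, 0.75), (0.5, 0.0, 0.0)])
# ave_ → 0.5, avc_ → 0.75, avec_ → 0.375
```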

Where to, next?

So far, I’ve spent about 10,000 words to (mostly) define a feasible metric for evaluating the quality of voting methods. Next, I’m going to look at some actual multi-winner voting methods and try to estimate how they’d do by that metric. Note that this will involve some rough-and-ready approximations, but I believe it will at least be possible to class methods into at least two quality levels (with gerrymandered first-past-the-post far behind most alternatives). I also hope it will be possible to give numbers that undermine some specific bad arguments that people use in this area.

AVEC in single-winner plurality

To warm up, let’s calculate AVEC for some simple scenarios under single-winner plurality.

First off, the simplest possible case: a 2-candidate election. The winner will have X>.5 (that is, over 50%) of the votes. Retrospective voting power will of course be 0 for the losing votes. For each of the winning ones it will be 1/X; that is, between 1 (for a unanimous outcome) and just under 2 (for the barest majority). That means average voter effectiveness is between .5 and 1. Average voter choice for represented voters will be between 50% (unanimous) and 75% (bare majority). All told, average voter effective choice will be between 37.5% (bare majority) and 50% (unanimous).

Now, consider a 3-way race in which no candidate has a majority, and the margin between 2nd and 3rd place is more than twice the difference between 1st place and 50%. For an example, you could imagine a simplified version of the US 2000 presidential election, with national popular vote and Gore 49%, Bush 48%, and Nader 3% (thus, Gore wins). Imagine all the votes start out “face down”, and we turn them “face up” one at a time in a random order. As we do so, set variables G/B/N to be the face-up Gore/Bush/Nader votes as a percent of the total, and D to be the face-down votes. Gore is guaranteed to win iff G>B+D. Turning over a Gore vote increases the left-hand side of that inequality by one vote and decreases the right-hand side by one (LHS+1, RHS−1). Turning over a Nader vote leaves the LHS unchanged and decreases the RHS by one (LHS=, RHS−1). Turning over a Bush vote leaves both sides unchanged (LHS=, RHS=).

So, obviously, the Bush voters have no retrospective power to elect Gore. I still have to do the math more carefully, but I’m pretty sure that as the total number of voters grows, the retrospective power of the Nader voters converges to precisely one half that of the Gore voters, just as their power to move the difference between the LHS and RHS totals is half as big. (I’ve worked out one simple example fully but I still don’t have an inductive proof of this.) Thus a Gore voter would have retrospective power 1/.505, and a Nader voter, 1/1.01. This leads to an AVE of 51.26% (1/(.49 * (1/.505)^2 + .03 * (1/1.01)^2)).

The effective choice depends on how often a Bush voter is closer to Gore than a Gore voter is. In the “median voter theorem” case, where Gore and Bush were both ideologically close to the threshold between them, the average voter choice would be .53 for Gore voters and .06 for Nader voters. If Gore were ideologically like his median supporter, it would be .755 and .48. And if Nader voters were all “left wing” and Gore was left of his median supporter, it would be .765 and .505. Using the second (middle) assumptions, the overall AVEC would be 38.49%: (1/(.49 * (1/.505)^2 + .03 * (1/1.01)^2)) * (.49 * (1/.505)^2 * .755 + .03 * (1/1.01)^2 * .48)/(.49 * (1/.505)^2 + .03 * (1/1.01)^2).
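These numbers can be checked mechanically, using the middle assumptions for voter choice (the bloc layout here is my own notation):

```python
# Bloc share, retrospective power, and average choice for each camp:
blocs = [(0.49, 1 / 0.505, 0.755),  # Gore voters
         (0.03, 1 / 1.01, 0.48),    # Nader voters: half a Gore voter's power
         (0.48, 0.0, 0.0)]          # Bush voters: no power to elect Gore
load2 = sum(f * p * p for f, p, _ in blocs)
AVE = 1 / load2                                              # ≈ 0.5126
AVEC = AVE * sum(f * p * p * c for f, p, c in blocs) / load2  # ≈ 0.3849
```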

None of this is at all surprising, but it’s comforting to be able to put it on a rigorous footing.

And some of the conclusions even for single-winner plurality might be a bit more surprising than that. For instance, in a 3-way election with totals 48%, 26%, 26%, voters for both of the losing candidates count as having had retrospective power to elect the winner; 1/2.44, or ¼ as much as the winning voters, to be exact. (When the losers are nearly but not precisely tied, the power of those voters depends on the overall size of the electorate in a way that makes calculating exact powers tedious. The larger losing group will have somewhere between ¼ and ½ the power per voter of the winning group.)
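Under the assumption that each losing voter here has ¼ the power of a winning voter, the exact per-voter powers follow from the constraint that retrospective power averages to 1 across all voters:

```python
# .48 * w + .52 * (w / 4) = 1, where w is a winning voter's power:
w = 1 / (0.48 + 0.52 / 4)  # ≈ 1.64 for each winning voter
l = w / 4                  # ≈ 0.41 = 1/2.44 for each losing voter
```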

Real-world multi-winner voting methods (under construction)

I’m going to come back to all the ideas above. But it’s finally time to discuss some specific multi-winner voting methods.

Single-seat plurality (aka separate districts, as in US House, or parliament in Canada/​UK/​India):

This is not a proportional method, but we can still approximate its AVE.

Let’s say that the two-way vote share by district is drawn from a beta(3,3); this is a rough by-eye approximation to a few years of recent US data. That means that the average winning two-way margin is about 30 points, with a standard deviation of 10 points; that is, most districts are safe, with winning vote totals of around 65% or more. That would mean that an average district has 65% of voters with 1/​.65=1.54 retrospective power, and 35% wasted votes with 0 voting power. At best, those 65% have an average 67.5% effective choice, distributed uniformly from 100% down to 35%. This multiplies out to an overall AVEC of 44%. (Note that it turns out that all the main steps in this example are linear, so the AVEC of the average precinct is the same as the average AVEC of all precincts).
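The district calculation can be checked with a small Monte Carlo under the same assumptions (Beta(3,3) two-way shares; winners' choice distributed uniformly from 100% down to 1 − share). The function is my own sketch:

```python
import random

def district_avec(n_districts=100_000, seed=1):
    """Monte Carlo AVEC for single-seat plurality: each district's
    two-way vote share is drawn from Beta(3,3); the winner's voters
    have power 1/share and an average choice of 1 - share/2; the
    losers' votes are wasted."""
    rng = random.Random(seed)
    load2_sum = choice_sum = 0.0
    for _ in range(n_districts):
        x = rng.betavariate(3, 3)
        share = max(x, 1 - x)               # winner's two-way vote share
        weight = share * (1 / share) ** 2   # fraction * power^2 = 1/share
        load2_sum += weight
        choice_sum += weight * (1 - share / 2)
    load2 = load2_sum / n_districts
    ave = 1 / load2
    avc = (choice_sum / n_districts) / load2
    return ave * avc

district_avec()  # ≈ 0.44 under these assumptions
```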

Single transferable vote (STV):

Each voter ranks the candidates in preference order. We define a quota, the number of votes you must beat to win; this may be either the Hare quota (tends to break ties in favor of smaller parties, because larger parties use up more votes on their early seats), or the Droop quota (opposite of above). Votes are always tallied for their top non-eliminated choice.

If a candidate has at least one quota of votes, they are elected (and thus “eliminated” from further consideration), and 1 quota worth of their tallied votes are “exhausted” (removed from all later tallies). This has the effect of transferring their excess votes.

If no candidate has a quota, and there are still seats left to fill, then the candidate with the lowest tally is eliminated (and so their votes are transferred).

To understand how the components of average vote effectiveness would work with STV, let’s take a simple scenario, with S=9, and using the Droop quota (10% in this case) as is most customary. Say the candidates are the capital letters, and we have the following ballots:

55%: A>B>C>D>E>F>G>H>… (alphabetical)

36%: Z>Y>X>W>V>U>… (reverse alphabetical)

9%: E>T>A>O>I>N>… (in order of the Linotype keyboard, based on letter frequency in English)

STV elects A and Z immediately, since both have more than a quota. Excess votes spill over to B and Y, also elected; then C and X, also elected; then W and D, of whom only D has a full quota; then E; and finally F. At this point, ABCDE and ZYX have been elected (5 seats from the beginning of the alphabet and 3 from the end), and F, W, and T have 5%, 6%, and 9%, respectively. At this point, all other letters are eliminated because their tallies are 0. Then F is eliminated, and those votes transfer to the next surviving candidate in alphabetical order, T, who then has a quota and gets the last seat. So the final winner set is ABCDETXYZ.

For candidate A, any 10% from the alphabetical voters would be a sufficient set, so those voters all had 1/(9*.55) power to elect. For candidate B, any 20% of that group would be a sufficient set, so those voters had the same 1/(9*.55) power to elect. Similarly for candidates C and D (with 30% and 40% being the SS magnitude). So far, those voters have 4/9 * 1/.55 RP.

For candidate E, any set of 50% from the combined alphabetical and frequentist voters would be sufficient, so those voters each get 1/​(9 * .64).

For candidate T, characterizing sufficient sets is a bit more complex. To elect T alone, a sufficient set would be simultaneously 60% from the combined alphabetical and frequentist voters, and 40% from the combined anti-alphabetical and frequentist voters; for instance, 52% alpha, 32% anti-alpha, and 8% freq.

But those sets are at least 91% in total magnitude. Can we get a smaller number of sufficient voters by expanding the set of “similar” candidates? Yes! To elect some member of the set {F...U} (that is, to ensure W doesn’t win the last seat), it suffices to have one quota (10%) of votes from the alphabetical or frequentist camps remaining after 5 alphabetical candidates have been elected. That means a total of 60% from that 64% of voters. Similarly, to elect some member of {G...V}, it would suffice to have 40% from the 45% combined anti-alphas and freqs. Since those are the smallest “sufficient” voter sets for a candidate set including T, we use that for our power calculation.

Assigning RP to the anti-alphabetical voters for ZYX is as straightforward as for ABCD; those voters get a total of 3/9 * 1/.36.

This leaves the following RPs:

55% Alphabetical: 4/9 * 1/.55 + 1/9 * 1/.64 = 0.98

36% Anti-alphabetical: 3/9 * 1/.36 + 1/9 * 1/.45 = 1.17

9% Frequentist: 1/9 * 1/.64 + 1/9 * 1/.45 = 0.42

I think that, given these votes, this method, and this outcome, this assignment of voter power is not quite ideal; I’d prefer an algorithm which gives the frequentists more credit for getting their second-choice T than it gives to the anti-alphabeticals for ensuring F loses. I can even imagine how to tweak the RP definition to improve that by explicitly reducing overlap in assigned voter power whenever possible. But I think this is close enough; the extra complexity of such tweaks (both in terms of definition, and, more crucially, in terms of computability) would not be worth it.

So this is 96.4% AVE (1/(.55*.98^2 + .36*1.17^2 + .09*0.42^2)).

Including the average voter choice, we get 1/(.55*.98^2 + .36*1.17^2 + .09*0.42^2) * (.55*.98^2*(1-.55/2) + .36*1.17^2*(1-.36/2) + .09*0.42^2*(1-.09/2))/(.55*.98^2 + .36*1.17^2 + .09*0.42^2) = 74.6% AVEC overall. (The formula I gave there took some shortcuts, neglecting to separately count the EC for the small slice of voting power that the anti-alphabeticals had to elect T, but this will not materially change the result.) This is already substantially better than the 44% we saw for single-seat districts. And note that, for simplicity, we used a scenario with large blocks of party-line voters; if, instead, the voters had a more-realistic variety in their first-choice preferences while still grouping into the same three overall blocs, AVEC might be as high as 80.5%.
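Written out in code, the same bloc-weighted computation (using the RP table above, with each bloc's voter share — 55%, 36%, 9% — as its weight, and choice = 1 − bloc share/2):

```python
# Bloc share, retrospective power, and average choice for the STV scenario:
blocs = [(0.55, 0.98, 1 - 0.55 / 2),  # alphabetical
         (0.36, 1.17, 1 - 0.36 / 2),  # anti-alphabetical
         (0.09, 0.42, 1 - 0.09 / 2)]  # frequentist
load2 = sum(f * p * p for f, p, _ in blocs)
AVE = 1 / load2
AVEC = AVE * sum(f * p * p * c for f, p, c in blocs) / load2
```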

Closed list proportional (a la Israel)

In this voting method, voters just choose a party. Each party which gets more than some minimum threshold of votes is assigned seats proportional to their votes. To decide who gets those seats, each party has pre-registered an ordered list, so if they get four seats, those go to the first four people on their list.

Let’s use the April 2019 Israeli legislative election as an example. I chose this one because it shows the effect of the threshold more than the two more-recent “rematch” elections; so the estimate of vote wastage that I get will be higher than it could have been. Here’s a spreadsheet calculating Average Voting Effectiveness (lower right cell) for all three of those elections.

The results are:

  • voting power varied from 1.15 for the party UTJ, down to 0.91 for Meretz, due to a seat that barely got allocated to the former instead of the latter

  • 8.5% of voters had zero voting power because they voted for below-threshold parties. (This number dropped to 2.9% then to 0.8% in the two “rematch” elections that followed; apparently, voters do not want to waste their votes.)

  • Thus, average voter effectiveness was 91% overall.

  • Average voter choice was 92%; the loss is primarily due to the closed-list nature of the method, so that voters can merely choose parties, not candidates within parties.

  • The overall average voter effective choice (AVEC) was thus 84%

Several notes:

  • I’m assuming that each party’s candidates are drawn from the ideological center of the party, so that voter choice is 1-(party size * .5). If a party’s candidates are drawn from the party’s edge in 1-dimensional space, that .5 could be 1; in higher-dimensional space, it could be higher than 1. I think that realistically, .5 is a reasonable multiplier here; it might be .6 or .7 in practice, but .5 is simple and easy-to-justify.

  • I’m neglecting any “dead-weight loss” in voter choice from unqualified candidates who only win by being party insiders, because I don’t have data to model it. In theory, this could be arbitrarily bad; in practice, I wouldn’t be shocked if this took as much as 20% off of average voter choice.

  • This whole methodology tends to reward a method with a high effective number of parties. I think the upsides we’re measuring here are real, but the methodology neglects the downsides of short-sighted hyper-factionalism. It might be possible to quantify this if we extended this metric to encompass the process of choosing a prime minister and cabinet.
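The closed-list computation generalizes easily. Here's a sketch with entirely made-up vote shares (not the actual Israeli results), using the 1-(party size * .5) choice model from the first note above; a voter's power is simply their party's seat share divided by its vote share:

```python
def closed_list_avec(vote_shares, seats, total_seats):
    """vote_shares: {party: share of votes}; seats: {party: seats won}
    (0 for below-threshold parties). A voter's power is their party's
    seat share over its vote share; their choice is 1 - vote_share/2
    (candidates assumed drawn from the party's ideological center)."""
    load2 = num = 0.0
    for party, share in vote_shares.items():
        power = (seats.get(party, 0) / total_seats) / share
        load2 += share * power * power
        num += share * power * power * (1 - share / 2)
    ave = 1 / load2
    return ave, ave * num / load2  # (AVE, AVEC)

# Toy example; party D misses the threshold, so its votes are wasted:
vote_shares = {"A": 0.40, "B": 0.35, "C": 0.20, "D": 0.05}
seats = {"A": 42, "B": 37, "C": 21, "D": 0}
closed_list_avec(vote_shares, seats, 100)
```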

Phragmén’s method:

Ebert/Phragmén method:

Warren Smith’s “phi voting”:

MMP:

DMP:

Sections I’d originally planned to write, but have now decided not to:

Real-world proportional voting reform (empty)

Gerrymandering and “Fair Maps” (empty)

Problems with proportionality: Israel, Italy, etc. (empty)

Israel: Too Many Parties (empty)

Italy: Unstable rules (empty)

Wales: strategic voting /​ clone parties (empty)

Poland: high (party) thresholds (empty)

Godwin violation: not actually relevant (empty)

Novel methods (empty)

Modified Bavarian MMP (empty)

PLACE (empty)

Broad Gove (empty)

Prospects for change in US (empty)

Medford, MA (empty)

How you can help (empty)

Canada, UK, India… (empty)

Post-Mortem on this article (incomplete)

… This is the one part I still haven’t finished.

… and I’ve decided it will happen in a separate document.