Systematizing Epistemics: Principles for Resolving Forecasts

In a previous post, I discussed many methods for resolving predictions. I want to argue that there is a systematic distinction between rules and principles which I think is valuable.

In short, when making rules, one can front-load intentions by writing details upfront, or back-load work by stating high-level principles and having procedures to decide on details on an as-needed basis*. American accounting systems rely on the former, and international accounting systems (and most legal systems) focus more on the latter. I think that the question shouldn’t be implicitly decided by front-loading assumptions, which is often the current default. More than that, I think the balance should be addressed better and more explicitly.

Reframing the Problem

Ozzie Gooen’s new organization, QURI (pronounced the same as “query”), is interested in what he’s started to call “systematizing epistemics,” and he offered an analogy that I found very insightful: accounting. Just as keeping track of money is possible without accounting, keeping track of reality is possible without any systematic approach to epistemics. But it’s harder to communicate or agree about money without standardized accounting systems that talk about the same things the same way.

In the aforementioned post, I discussed a variety of ways to resolve predictions. Here, I want to present a more systematic argument about how to think about prediction systems and resolutions. To make this point, I plan to take a detour into accounting—but don’t worry, the post really is about predictions. I want to lay out the analogy between systematizing epistemics and systematizing accounting (even) more in a different post, but for now I’ll jump to the key point for writing prediction questions and resolving predictions.

Accounting Principles versus Accounting Rules

In financial accounting, which is only half as boring as it sounds, there is a conceptual disagreement between rule-based and principle-based methods.

A rule based accounting system has (millions of) rules that get updated and adapted to deal with all of the new ways that clever accountants devise to lie with accounting. That is, a rule based system tries to cover every eventuality. This creates complexity, but still makes sure everything is legible to those who have the necessary expertise to decipher accounting statements. On the other hand, every time someone thinks of a clever new interpretation or hack, it is exploited until new rules, which corporations will lobby against, are developed. Worse, even if there are no loopholes, every time a new law is passed or new financial instrument is created, new loopholes suddenly appear.

A principles-based system has essentially the same goals, but instead of trying to account for every scenario and clever trick, it has perhaps a dozen guidelines for what accounting is supposed to do. These are things like consistency, full disclosure, good faith and honesty, dividing entries across appropriate periods of time, and accurate representation of a company’s financial position. The extra flexibility probably makes it harder to stop slight deviations from the best way to do things, and makes comparing financial statements a bit harder. But it is also far easier to do correctly, without a million rules you might have accidentally broken, and it makes it easier to deal with companies finding new and clever ways to cheat, because the principles already cover tricks that a rule-based system would need a new rule to address.

In practice, all systems are a combination of the two, but the ideal of each system is different. In a fully principles-based system, accountants have far more flexibility, but they will get in trouble if they aren’t doing what they are supposed to do, with the boundaries somewhat vague. If they switch from FIFO to LIFO accounting one year to make the profits look better, they have clearly violated the principles. The same is true if they end this financial year on December 18th so they don’t need to include the big loss they took on December 23rd. Those things aren’t allowed in a rule-based system either, but only because the rules explicitly list things like you need to use LIFO, and all financial years must end on a specific date. The costs of compliance for the rule-based system, and the complexity of interpreting most financial statements, are probably higher. On the other hand, the risk of ending up in a gray area is likely lower.

Where do we use principles, and where do we use rules?

There seem to be two reasons for rule-based systems: trust and predictability. Trust, because rules are useful when we can’t or don’t want to trust the people making decisions. Predictability, because we can’t translate principles into certainty about outcomes, or into computer code. And trust, with the attendant flexibility, must exist somewhere in any system. The question is whether it is all front-loaded.

Trust

Who do we trust? If your financial system trusts accountants to be honest, you can give them general guidelines and set them loose to accurately reflect the real financial situation at a company. That allows flexibility to exist towards the end of the process. In practice, though, companies pressure accountants to cheat, whether to raise stock prices or to pay less tax. Giving the accountants more freedom to make decisions, when we can’t trust them, is going to be a bad strategy. So the trust is pushed to the earliest stages of systematizing accounting, in the rule development stages. It can still be subverted (accountants developing standards also have conflicts of interest), but it makes failures systemic rather than individual, which has its own downsides.

Somewhat similarly, in a detour I won’t expand upon in detail, we can look at legal systems. Because the legal system trusts judges to some extent, it can give them more latitude. To the extent Congress trusts judges, it can leave laws ambiguous. And to the extent that legislators are incompetent at writing laws, the result is the same. But intentional or inadvertent, this is back-loading the flexibility. The laws can be unclear, which makes people uncertain what is allowed, and it is up to judges to clarify them post hoc.

Legislators have a different option for front-loading flexibility. That is, to the extent that they trust regulators, they can pass along responsibility for creating detailed rules.

Finally, to the extent that the rulers of a society trust the public, they can just articulate what they think would be nice, and let the public decide. Social norms often operate this way—they change and are not spelled out, and people need to learn them implicitly. And as should be clear for both norms and laws, ambiguity doesn’t work when the group is large and heterogeneous—predictability is limited when you don’t know, or trust, the other people in the group. This leads us to the next point.

Predictability

Beyond questions of trust, we have a question of predictability. If your financial system is principle-based, accounting software is tricky. Not every firm needs to do things the same way, and there will be an unlimited number of customizations needed to manage systems. Even worse is trying to automate any type of review or fraud detection.

Similarly, it is harder to make policing decisions without clear rules. Speed limits are clear rules; banning “unsafe” driving is not. Speed cameras are easy because a camera can check a single number. Likewise, maximum BAC is a clear rule; “impaired” driving is not. You can guess which of these are more often used by police who don’t want to be called on to defend their subjective judgements in court.

But a tradeoff mentioned earlier applies here in spades: explicit rules are fragile, and if they are supposed to conform to the intent, they need to be updated more often. Frequent updates push against predictability, since the predictions need to account for the fact that the rules can change. In fact, it can be worse, because explicit rules can give an illusion of predictability.

Principles versus Rules for Predictions

There are (at least) two parties involved in predictions: the predictors and the readers. Predictors usually want clear rules and no ambiguity. The readers of the prediction (including the writers, the sponsors of prediction questions, the general public, and in the case of a futarchy, the system being controlled) want fidelity with intent, not strict adherence to the letter of the law.

There are often places where the spirit and the letter conflict. When that happens, the clash is unfortunate and unintended. For example, a question may intend to forecast the number of cases of COVID-19 which occurred in the first half of 2020, but end up forecasting the speed of creating and deploying tests. (Or the insanity of the FDA in stopping people from doing so, as the case may be.)

The death of the author approach to forecasts is great for predictors. In that scenario, we have a presumption that the spirit of a question is irrelevant once it’s written down. But for prediction markets to be useful, there should be a balance between principles and rules.

But as happened in accounting in the 1800s, most of the effort for forecasting resolution so far has gone into making rules that work, with the principles being implicit. That’s fine, but better understanding of the role of principles and rules would be valuable.

Predictions Cannot Live by Principles Alone

The past, of course, was akin to a purely principle-based system, where we trusted informal resolutions and evaluations. Pundits might predict something like “there will be increased Chinese aggression this year,” and grade themselves highly, but they do so no matter what occurs. Prediction markets operationalize this into a rule-based resolution: “there will be a fatality in the South China Sea before the end of the year in a confrontation between different countries.” Resolving that is straightforward, relying on nothing but an object-level event. This fixes the problem of unfalsifiable self-grading.

So we have a thesis, punditry, and an antithesis, predictions. I claim that we are waiting for a synthesis. In my view, that synthesis is creating a clearer principle-based approach for creating, understanding, interpreting, and applying the rules.

The question is what a set of principles that guide the rules for writing and resolving questions, and guide the interpretation in cases of ambiguity, should look like. More clarity about these principles is needed.

Forecasting Principles—Why, and Which Ones?

In forecasting, leaving the principles implicit means that interpretation is harder, and predictions are worse.

I think there is broad agreement about many of the principles, but they haven’t been formulated. For example, when writing a prediction question, we care about minimizing ambiguity, having a concrete outcome, relating the question to the actual uncertainty or outcome, consistency with other predictions, and so on. When resolving a question, we care about things like fidelity to the intent and the language of the prediction.

Below, I want to lay out both some of the principles, and the best practices and implications for how they apply.

Some Plausible Principles for Forecasting

  1. Predictions should be resolved.

    1. This requires that they be resolvable.

    2. Both the prediction period and the resolution time should be specified.

    3. The resolution method should be known.

  2. Predictions should be clear

    1. Predictors should be able to represent their actual beliefs

    2. Predictions should be concrete when possible, rather than verbal.

  3. Scoring should be clear

    1. As simple as practical

    2. Known to forecasters

    3. Incentive-compatible

  4. Questions should attempt to be useful.

    1. Parallel other similar questions.

    2. Match language and criteria from other sources

    3. Have standard formats where possible

Further Thoughts on Applying the Principles

Below, I have expanded on the principles and written commentary. I will be happy to update this section with comments from readers, which will (by default) be attributed to your username.

Predictions are not punditry, and without resolution the incentives for accuracy, and the feedback needed to improve, are hampered. Most of the criteria here are technical, but there are trade-offs between resolution and other valuable principles.

Predictions should be resolved.

  • This requires that they be resolvable.

    • The resolution criteria should be well-specified.

    • If relevant or possible, the intent of the question, or guidance for how resolution will occur, should be clear.

      • Ambiguity should be avoided, but by default, when (inevitable) ambiguity arises, intent should guide the resolution. Any guidance about the motive or intent of the question can therefore be an asset. This is especially true when resolutions are based on expert opinion.

    • By default, forecasts should be assumed to be about object-level issues.

In cases where ambiguity arises, tortuously interpreting the text in unintended ways is unhelpful. (Example: it says ‘reported/estimated’ not ‘estimated/reported’, so we can infer that if a reported number is available, that should be used instead of an estimate.)

And there are sometimes questions where the technical criteria are not fulfilled, but for reasons unrelated to the intent of the question. (For example, source X is discontinued and lists an old value, but recommends using alternate source Y in the future, while the resolution criteria say source X will be used.) In such cases, the goal of predicting object-level reality, rather than predicting meta-level reporting, should be the dominant concern. This is both useful for question designers and (per Principle 4) helps ensure that those using the forecast are getting useful information.

  • Both the prediction period and the resolution time should be specified.

    • These times may not be the same.

    • It can be better to leave questions open past when the resolution is known.

Knowing when a question will close is valuable for planning, and for understanding the scoring. Especially in situations where the final prediction is counted for a significant portion of the overall score, if a question closes seconds after an event occurs, there is a windfall gain for anyone who happens to be furiously checking the news and updating just at the right moment. (Side note: there are problems with closing predictions early.)
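As a rough illustration of that windfall, here is a sketch under a made-up scoring scheme that blends the Brier score of the final forecast with a time-weighted average over the question's lifetime. The scheme, the function `blended_score`, the parameter `final_weight`, and all the numbers are assumptions invented for this example, not any platform's actual rule.

```python
def brier(probability: float, outcome: int) -> float:
    """Squared error of a probability against a 0/1 outcome (lower is better)."""
    return (probability - outcome) ** 2


def blended_score(segments, outcome: int, final_weight: float) -> float:
    """Hypothetical score: a mix of final-forecast and time-averaged Brier.

    `segments` is a time-ordered list of (days_held, probability) pairs.
    """
    total_days = sum(days for days, _ in segments)
    time_avg = sum(days * brier(p, outcome) for days, p in segments) / total_days
    final = brier(segments[-1][1], outcome)
    return final_weight * final + (1 - final_weight) * time_avg


# Both forecasters held 60% for 100 days; the event then occurred (outcome = 1).
steady = [(100.0, 0.60)]
# One of them saw the news and jumped to 99% roughly a minute and a half before close.
last_second = [(99.999, 0.60), (0.001, 0.99)]

for w in (0.0, 0.5):
    print(w, blended_score(steady, 1, w), blended_score(last_second, 1, w))
```

With `final_weight` at 0 the two scores are nearly identical. At 0.5, the last-second update roughly halves the penalty, even though it reflects news-watching rather than forecasting skill.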

  • The resolution method should be known.

    • A system or individual should be chosen as the final decision-maker beforehand.

      • Planning beforehand reduces the burden of resolving questions. There is also a practical issue with timely resolution and the burden on the market.

    • Automated resolution and scoring is usually better than manual or subjective decisions.

Making choices or automating resolutions in advance can be easier than needing to revisit the questions. And unambiguous, automated criteria can minimize disputes. But an unambiguous criterion does not always match the object-level issue to be tracked, so a tradeoff exists. “The best published estimate” is somewhat ambiguous, and may require arbitration. Despite that, it may be a better and more robust criterion than “the number published by source X,” since sometimes the source is discontinued, or better options for the resolution are discovered or created.
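One way to picture that tradeoff is a resolution procedure that tries automated sources first and hands off to a decision-maker named in advance only when they fail. This is a minimal sketch, not a description of how any real platform resolves questions; `Resolution`, `resolve`, the source names, and the values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A source returns a number, or None if it is discontinued or has not published.
Source = Callable[[], Optional[float]]


@dataclass
class Resolution:
    value: float
    method: str  # which path produced the value, kept for transparency


def resolve(sources: List[Tuple[str, Source]], arbiter: Callable[[], float]) -> Resolution:
    """Try each pre-registered source in order; fall back to the named arbiter."""
    for name, fetch in sources:
        value = fetch()
        if value is not None:
            return Resolution(value, f"automated:{name}")
    # No automated source worked, so the decision-maker chosen beforehand decides.
    return Resolution(arbiter(), "arbitration")


# Hypothetical example: source X was discontinued, but source Y still publishes.
sources = [("source_x", lambda: None), ("source_y", lambda: 12.5)]
result = resolve(sources, arbiter=lambda: 12.0)
print(result)
```

Listing the fallback order and the arbiter when the question is written keeps most resolutions automatic, while still leaving room for judgment when the letter of the criteria stops tracking the object-level question.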

Predictions should be clear

  • Predictors should be able to represent their actual beliefs

    • Specific values are better than choosing ranges, and specifying prediction intervals and probabilities is better than binary triggers.

    • Fidelity to the true claimed distributions is valuable, rather than, for example, using pre-specified distributions

      • This desideratum is often difficult to reconcile with clear scoring, since complexity in forecasts generally requires complexity in scoring.

  • Predictions should be concrete when possible, rather than verbal.

    • “Higher” is less clear than “above the current value as of [date], which is [Value].”

    • Punditry is both less resolvable, and less valuable, than clear predictions.
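To make the “above the current value as of [date], which is [Value]” pattern concrete, here is a sketch of a question record that forces the baseline, the close of the prediction period, and the resolution time to be written down. The class `ThresholdQuestion`, its fields, and the example numbers are all hypothetical. No ordering between the two dates is enforced, since a question may stay open past the point where the resolution is known.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ThresholdQuestion:
    """Hypothetical spec for 'will X be above its value as of [date]?'"""
    quantity: str          # what is being measured
    baseline_date: date    # the "[date]" in the question text
    baseline_value: float  # the "[Value]" in the question text
    closes_on: date        # last day forecasts are accepted
    resolves_on: date      # day the outcome is read off

    def text(self) -> str:
        """Render the question with the baseline spelled out."""
        return (f"Will {self.quantity} on {self.resolves_on} be above "
                f"{self.baseline_value}, its value as of {self.baseline_date}?")

    def resolve(self, observed: float) -> bool:
        """Strictly above the stated baseline resolves positively."""
        return observed > self.baseline_value


# Made-up example values, for illustration only.
q = ThresholdQuestion(
    quantity="the index",
    baseline_date=date(2021, 1, 1),
    baseline_value=100.0,
    closes_on=date(2021, 6, 30),
    resolves_on=date(2021, 7, 1),
)
print(q.text())
print(q.resolve(101.0))
```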

Scoring should be clear

  • As simple as practical

    • As noted above, there is a tradeoff between allowing more expressive predictions and simple scoring rules.

  • Known to forecasters

    • Even when resolution criteria are allowed to be ambiguous, scoring should not be.

  • Incentive-compatible

    • Gaming of the system should be minimized.

      • Forecasters should not need to spend time gaming the system to have correct incentives. (Side note: Incentive compatibility is tricky when people are not looking only to maximize their score, or when there are non-trivial costs to predicting.)

      • Also, beyond making scoring known and reducing gaming, there are some critical issues with actually building incentive-compatible systems, since incentives differ among forecasters in ways that may make incentive compatibility impossible under uniform scoring.
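One standard way to make the incentive-compatibility requirement concrete is a proper scoring rule, under which a forecaster minimizes their expected penalty by reporting what they actually believe. The post does not name a specific rule. The sketch below uses the Brier score as an assumed example and compares it with absolute error, which is simple but not proper. The helper names `expected_score` and `best_report` are invented here.

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error against a 0/1 outcome (lower is better); a proper rule."""
    return (forecast - outcome) ** 2


def absolute_score(forecast: float, outcome: int) -> float:
    """Absolute error against a 0/1 outcome; simple, but not proper."""
    return abs(forecast - outcome)


def expected_score(score, reported: float, belief: float) -> float:
    """Expected penalty for reporting `reported` when the true chance is `belief`."""
    return belief * score(reported, 1) + (1 - belief) * score(reported, 0)


def best_report(score, belief: float, steps: int = 100) -> float:
    """Search a grid of probabilities for the report with the lowest expected penalty."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda r: expected_score(score, r, belief))


belief = 0.7
print(best_report(brier_score, belief))     # honest report is optimal
print(best_report(absolute_score, belief))  # pushed to an extreme
```

Under the Brier score the best report matches the 70% belief, while under absolute error the forecaster does best by claiming certainty. That is the kind of gaming the principle is meant to rule out.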

Questions should attempt to be useful.

  • Parallel other similar questions.

    • There is a tension between precision and uniformity.

      • For purposes of this principle, in a US presidential election, “who will win the presidential election” is more uniform with similar questions than “who will hold the office of President on Jan 21st.” Similarly, “which candidate will win states with the largest number of electoral college votes” is potentially difficult to compare with “which candidate will win the electoral college,” for example, if there are faithless electors.

      • If the tradeoff is significant, it may be better to have a question explicitly about the difference.

  • Match language and criteria from other sources

    • Using identical language and criteria makes both aggregation and resolution easier.

      • As with survey questions, variation in how a question is asked can have important implications for comparing and aggregating predictions. Consistency across platforms and between questions is valuable for this reason, and it should be promoted unless other overriding concerns exist.

    • Clearly highlight when this is not the case.

      • Unless the differences in the edge cases are particularly relevant, a standard format and phrasing should be preferred. And as above, it may be better to have a question explicitly about the difference.

  • Have standard formats where possible

    • Clever new input methods are harder for forecasters to understand and use

    • Complex phrasing makes mistakes easier.

      • “Will not X happen” versus “Will X happen.”

These are not a final word, but I think they may be a useful basis for continued discussion. And thanks go to Nuño Sempere and Ozzie Gooen for helpful suggestions and additions.

*) I am grateful to Ozzie Gooen for helping me frame this more clearly.