Forecasting Newsletter: November 2020

Highlights

Index

  • Highlights

  • In the News

  • Prediction Markets & Forecasting Platforms

  • United States Presidential Election Post-mortems

  • Hard to Categorize

  • Long Content

Sign up here or browse past newsletters here.

In the News

DeepMind claims a major breakthrough in protein folding (press release, secondary source)

DeepMind has developed a piece of AI software called AlphaFold that can accurately predict the structure that proteins will fold into in a matter of days.

This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research.

Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world.

In the results from the 14th CASP assessment, released today, our latest AlphaFold system achieves a median score of 92.4 GDT overall across all targets. This means that our predictions have an average error (RMSD) of approximately 1.6 Angstroms, which is comparable to the width of an atom (or 0.1 of a nanometer). Even for the very hardest protein targets, those in the most challenging free-modelling category, AlphaFold achieves a median score of 87.0 GDT.

Crucially, CASP chooses protein structures that have only very recently been experimentally determined (some were still awaiting determination at the time of the assessment) to be targets for teams to test their structure prediction methods against; they are not published in advance. Participants must blindly predict the structure of the proteins.
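For readers unfamiliar with the two metrics quoted above, here is a minimal Python sketch of RMSD and of a simplified GDT-style score. It is an illustration, not CASP's scoring code: it assumes the predicted and experimental coordinates are already superimposed and matched point by point, whereas the real GDT procedure searches over superpositions. The cutoffs of 1, 2, 4 and 8 Angstroms follow the usual GDT_TS definition; the function names and the toy coordinates are made up.

```python
import math

def rmsd(pred, true):
    """Root-mean-square deviation between matched 3D points (same units as the input)."""
    squared = [math.dist(p, t) ** 2 for p, t in zip(pred, true)]
    return math.sqrt(sum(squared) / len(squared))

def gdt_ts_simplified(pred, true, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the cutoffs, of the percentage of points within each cutoff.

    Simplification (assumption): the structures are already superimposed,
    so no search over alignments is performed.
    """
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)

# Toy data: four "residues" on a line, predictions displaced by 0.5, 1.5, 3 and 9 Angstroms.
true_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred_coords = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]

print(rmsd(pred_coords, true_coords))
print(gdt_ts_simplified(pred_coords, true_coords))
```

Note how the single badly placed point dominates the RMSD but only costs the GDT-style score a bounded amount, which is one reason the two numbers are reported side by side.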

The Organization of the Petroleum Exporting Countries (OPEC) forecasts slower economic growth and slower growth in oil demand (primary source, secondary source). In particular, it forecasts long-term growth for OECD countries, which I take to mean that growth from the covid recovery is not counted, to be below 1%. On the one hand, their methodology is opaque; on the other hand, I expect them to actually be trying to forecast growth and oil demand, because it directly affects the number of barrels it is optimal for them to produce.

Google and Harvard’s Global Health Institute update their US covid model, and publish it at NeurIPS 2020 (press release), aiming to be robust, interpretable, extendable, and to have longer time horizons. They’re also using it to advertise various Google products. It has been extended to Japan.

Prediction Markets & Forecasting Platforms

Gnosis announces the GnosisDAO (announcement, secondary source), an organization governed by prediction markets (i.e., a futarchy): “The mission of GnosisDAO is to successfully steward the Gnosis ecosystem through futarchy: governance by prediction markets.”

Metaculus have a new report on forecasting covid vaccines, testing and economic impact (summary, full report). They also organized moderator elections and are hiring a product manager.

Prediction markets have kept selling “Trump not to be president in February” at $0.85 to $0.90 ($0.90 as of now; the contract resolves to $1 if Trump isn’t president in February). Non-American readers might want to explore PolyMarket or FTX; American readers with some time on their hands might want to actually put some money into PredictIt. Otherwise, some members of the broader Effective Altruism and rationality communities made a fair amount of money betting on the election.
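To make the arithmetic behind such a contract explicit, here is a small sketch in Python. It assumes a binary contract that pays $1 if the event happens, and it ignores fees, withdrawal costs and the time your money is locked up, all of which matter in practice. The function name and the 99% probability are hypothetical.

```python
def expected_profit_per_dollar(price, my_probability):
    """Expected profit per dollar staked on a contract that pays $1 if the event happens.

    Assumptions: no fees, and the whole stake is lost if the event does not happen.
    """
    if not 0 < price < 1:
        raise ValueError("price must be strictly between 0 and 1")
    # One dollar buys 1/price contracts, each worth $1 with probability my_probability.
    return my_probability / price - 1

# The market price doubles as the break-even probability: no expected profit here.
print(expected_profit_per_dollar(0.90, 0.90))

# A bettor who is (hypothetically) 99% sure the contract resolves to $1:
print(expected_profit_per_dollar(0.90, 0.99))
print(expected_profit_per_dollar(0.85, 0.99))
```

Under these assumptions, a bettor who puts the probability at 99% expects roughly 10 cents of profit per dollar staked at a price of $0.90, and somewhat more at $0.85.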

CSET recorded Using Crowd Forecasts to Inform Policy with Jason Matheny, CSET’s Founding Director, previously Director of IARPA. I particularly enjoyed the verbal history bits, the sheer expertise Jason Matheny radiated, and the comments on how the US government currently makes decisions.

Q: Has the CIA changed its approach to using numbers rather than words?
A: No, not really. They use some prediction markets, but most analytic products are still based on verbiage.

As a personal highlight, I was referred to as “top forecaster Sempere” towards the end of this piece by CSET. I’ve since lost the top spot, and I’m back to holding second place.

I also organized the Forecasting Innovation Prize (LessWrong link), which offers $1000 for research and projects on judgemental forecasting. For inspiration, see the project suggestions. Another post of mine, Predicting the Value of Small Altruistic Projects: A Proof of Concept Experiment, might also be of interest to readers in the Effective Altruism community. In particular, I’m looking for volunteers to expand it.

Negative Examples

Release of Covid-19 second wave death forecasting ‘not in public interest’, claims Scottish Government

The Scottish Government has been accused of “absurd” decision making after officials blocked the release of forecasting analysis examining the potential number of deaths from a second wave of Covid-19.

Officials refused to release the information on the basis that it related to the formulation or development of government policy and was “not in the public interest” as it could lead to officials not giving “full and frank advice” to ministers.

The response also showed no forecasting analysis had been undertaken by the Scottish Government over the summer on the potential of a second wave of Covid-19 on various sectors.

United States Presidential Election Post-mortems

Thanks to the Metaculus Discord for suggestions for this section.

Independent postmortems

  • David Glidden’s (@dglid) comprehensive spreadsheet comparing 538, the Economist, Smarkets and PredictIt in terms of Brier scores for everything. tl;dr: Prediction markets did better in closer states (see here for the log score).

  • Hindsight is 2020; a nuanced take.

  • 2020 Election: Prediction Markets versus Polling/Modeling Assessment and Postmortem.

    “We find a market that treated day after day of good things for Biden and bad things for Trump, in a world in which Trump was already the underdog, as not relevant to the probability that Trump would win the election.”

    Markets overreacted during election night.

    [On methodology: ] “You bet into the market, but the market also gets to bet into your fair values. That makes it a fair fight.” [Note: see here for a graph through time, and here for the original, though less readable, source]

    ...polls are being evaluated, as I’ve emphasized throughout, against a polls plus humans hybrid. They are not being evaluated against people who don’t look at polls. That’s not a fair comparison.

  • Partisans, Sharps, And The Uninformed Quake US Election Market. tl;dr: “I find myself really torn between wanting people to be more rational and make better decisions. And then also, like, well, I want people to offer 8-1 on Trump being in office in February.”
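The Brier and log scores used in comparisons like the spreadsheet above can be sketched in a few lines of Python. The forecasts and outcomes below are invented for illustration and are not taken from 538, the Economist, Smarkets or PredictIt; the function names are also mine.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes. Lower is better."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def log_score(forecasts, outcomes):
    """Mean log of the probability assigned to what actually happened. Higher is better."""
    return sum(
        math.log(p if o == 1 else 1 - p) for p, o in zip(forecasts, outcomes)
    ) / len(forecasts)

# Hypothetical probabilities that candidate A wins three states, and what happened.
model = [0.9, 0.7, 0.6]
market = [0.8, 0.6, 0.5]
outcomes = [1, 1, 0]

for name, forecasts in [("model", model), ("market", market)]:
    print(name, brier_score(forecasts, outcomes), log_score(forecasts, outcomes))
```

In this made-up data the market comes out slightly ahead on the Brier score while the model comes out ahead on the log score, which is one reason it can be worth looking at both.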

American Mainstream Media

Mostly unnuanced.

FiveThirtyEight.

Andrew Gelman.

As we’ve discussed elsewhere, we can’t be sure why the polls were off by so much, but our guess is a mix of differential nonresponse (Republicans being less likely than Democrats to answer, even after adjusting for demographics and previous vote) and differential turnout arising from on-the-ground voter registration and mobilization by Republicans (not matched by Democrats because of the coronavirus) and maybe Republicans being more motivated to go vote on election day in response to reports of 100 million early votes.

Hard to Categorize

Forbes on how to improve hurricane forecasting:

...to greatly improve the hurricane intensity forecast, we need to increase the subsurface ocean measurements by at least one order of magnitude...

One of the most ambitious efforts to gather subsurface data is Argo, an international program designed to build a global network of 4,000 free-floating sensors that gather information like temperature, salinity and current velocity in the upper 2,000 meters of the ocean.

Argo is managed by NOAA’s climate office that monitors ocean warming in response to climate change. This office has a fixed annual budget to accomplish the Argo mission. The additional cost of expanding Argo’s data collection by 10 times doesn’t necessarily help this office accomplish the Argo mission. However, it would greatly improve the accuracy of hurricane forecasts, which would benefit the NOAA’s weather office — a different part of NOAA. And the overall benefit of improving even one major hurricane forecast would be to save billions [in economic losses], easily offsetting the entire cost to expand the Argo mission.

In wake of bad salmon season, Russia calls for new forecasting approach:

In late October, Ilya Shestakov, head of the Russian Federal Agency for Fisheries, met with Russian scientists from the Russian Research Institute of Fisheries and Oceanography (VNIRO) to talk about the possible reasons for the difference. According to scientists, the biggest surprises came from climate change.

“We have succeeded in doing a deeper analysis of salmon by the combination of fisheries and academic knowledge added by data from longstanding surveys,” Marchenko said. “No doubt, we will able to enhance the accuracy of our forecasts by including climate parameters into our models.”

Political Polarization and Expected Economic Outcomes (summary)

“87% of Democrats expect Biden to win while 84% of Republicans expect Trump to win”

“Republicans expect a fairly rosy economic scenario if Trump is elected but a very dire one if Biden wins. Democrats … expect calamity if Trump is re-elected but an economic boom if Biden wins.”

Dart Throwing Spider Monkey proudly presents the third part of his Intro to Forecasting series: Building Probabilistic Intuition

A gentle introduction to information charts: a simple tool for thinking about probabilities in general, but in particular for predictions with a sample size of one.

A YouTube playlist with forecasting content, h/t Michal Dubrawski.

Farm-level outbreak forecasting tool expands to new regions

An article with some examples of Crime Location Forecasting, and on whether it can be construed as entrapment.

Why Forecasting Snow Is So Difficult: Because it is very sensitive to initial conditions.

Google looking for new ways to predict cyber-attackers’ behavior.

Long Content

Taking a disagreeing perspective improves the accuracy of people’s quantitative estimates, but this depends on the question type.

...research suggests that the same principles underlying the wisdom of the crowd also apply when aggregating multiple estimates from the same person – a phenomenon known as the “wisdom of the inner crowd”

Here, we propose the following strategy: combine people’s first estimate with their second estimate made from the perspective of a person they often disagree with. In five pre-registered experiments (total N = 6425, with more than 53,000 estimates), we find that such a strategy produces highly accurate inner crowds (as compared to when people simply make a second guess, or when a second estimate is made from the perspective of someone they often agree with). In explaining its accuracy, we find that taking a disagreeing perspective prompts people to consider and adopt second estimates they normally would not consider as viable option, resulting in first- and second estimates that are highly diverse (and by extension more accurate when aggregated). However, this strategy backfires in situations where second estimates are likely to be made in the wrong direction. Our results suggest that disagreement, often highlighted for its negative impact, can be a powerful tool in producing accurate judgments.

..after making an initial estimate, people can be instructed to base their additional estimate on different assumptions or pieces of information. A demonstrated way to do this has been through “dialectical bootstrapping” where, when making a second estimate, people are prompted to question the accuracy of their initial estimate. This strategy has been shown to increase the accuracy of the inner crowd by getting the same person to generate more diverse estimates and errors… …as a viable method to obtain more diverse estimates, we propose to combine people’s initial estimate with their second estimate made from the perspective of a person they often disagree with… …although generally undesirable, research in group decision-making indicates that disagreement between individuals may actually be beneficial when groups address complex problems. For example, groups consisting of members with opposing views and opinions tend to produce more innovative solutions, while polarized editorial teams on Wikipedia (i.e., teams consisting of ideologically diverse sets of editors) produce higher quality articles...

These effects occur due to the notion that disagreeing individuals tend to produce more diverse estimates, and by extension errors, which are cancelled out across group members when averaged. …we conducted two (pre-registered) experiments...

People who made a second estimate from the perspective of a person they often disagree with benefited more from averaging than people who simply made a second guess.

… However, although generally beneficial, this strategy backfired in situations where second estimates were likely to be made in the wrong direction. [...] For example, imagine being asked the following question: “What percent of China’s population identifies as Christian?”. The true answer to this question is 5.1% and if you are like most people, your first estimate is probably leaning towards this lower end of the scale (say your first estimate is 10%). Given the position of the question’s true answer and your first estimate, your second estimate is likely to move away from the true answer towards the opposite side of the scale (similar to the scale-end-effect), effectively hurting the accuracy of the inner crowd.

We predicted that the average of two estimates would not lead to an accuracy gain in situations where second estimates are likely to be made in the wrong direction. We found this to be the case when the answer to a question was close to the scale’s end (e.g., an answer being 2% or 98% on a 0%-100% scale).
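The averaging strategy and its failure mode near the end of a scale can be illustrated with a few lines of Python. This is a toy sketch, not the paper's analysis: the helper names are made up, and apart from the 5.1% true answer and the 10% first estimate quoted above, all numbers (including the 30% second estimate) are assumptions chosen for illustration.

```python
def inner_crowd(first, second):
    """Average a person's first and second estimates."""
    return (first + second) / 2

def averaging_helps(first, second, truth):
    """True if the averaged estimate is closer to the truth than the first estimate alone."""
    return abs(inner_crowd(first, second) - truth) < abs(first - truth)

# Mid-scale question (hypothetical): the two estimates land on opposite sides
# of the truth, so their errors partly cancel when averaged.
print(inner_crowd(40, 65), averaging_helps(40, 65, truth=50))

# Scale-end question: truth is 5.1%, first estimate 10%. A second estimate made
# from a disagreeing perspective (assumed here to be 30%) moves further up the
# scale, so averaging drags the answer away from the truth.
print(inner_crowd(10, 30), averaging_helps(10, 30, truth=5.1))
```

The first case shows errors cancelling because the estimates bracket the true value; the second shows why more diverse estimates do not help when almost all of the available room on the scale lies on the wrong side of the first guess.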

A 2016 article attacking Nate Silver’s model, key to understanding why Nate Silver is often so smug.

Historical Presidential Betting Markets, in the US before 2004.

...we show that the market did a remarkable job forecasting elections in an era before scientific polling. In only one case did the candidate clearly favored in the betting a month before Election Day lose, and even state-specific forecasts were quite accurate. This performance compares favorably with that of the Iowa Electronic Market (currently [in 2004] the only legal venue for election betting in the United States). Second, the market was fairly efficient, despite the limited information of participants and attempts to manipulate the odds by political parties and newspapers. The extent of activity in the presidential betting markets of this time was astonishingly large. For brief periods, betting on political outcomes at the Curb Exchange in New York would exceed trading in stocks and bonds.

Covering developments in the Wall Street betting market was a staple of election reporting before World War II. Prior to the innovative polling efforts of Gallup, Roper and Crossley, the other information available about future election outcomes was limited to the results from early-season contests, overtly partisan canvasses and straw polls of unrepresentative and typically small samples. The largest and best-known nonscientific survey was the Literary Digest poll, which tabulated millions of returned postcard ballots that were mass mailed to a sample drawn from telephone directories and automobile registries. After predicting the presidential elections correctly from 1916 to 1932, the Digest famously called the 1936 contest for Landon in the election that F. Roosevelt won by the largest Electoral College landslide of all time. Notably, although the Democrat’s odds prices were relatively low in 1936, the betting market did pick the winner correctly.

The betting quotes filled the demand for accurate odds from a public widely interested in wagering on elections. In this age before mass communication technologies reached into America’s living rooms, election nights were highly social events, comparable to New Year’s Eve or major football games. In large cities, crowds filled restaurants, hotels and sidewalks in downtown areas where newspapers and brokerage houses would publicize the latest returns and people with sporting inclinations would wager on the outcomes. Even for those who could not afford large stakes, betting in the run-up to elections was a cherished ritual. A widely held value was that one should be prepared to “back one’s beliefs” either with money or more creative dares. Making freak bets, where the losing bettor literally ate crow, pushed the winner around in a wheelbarrow or engaged in similar public displays, was wildly popular.

Gilliams (1901, p. 186) offered “a moderate estimate” that in the 1900 election “there were fully a half-million such [freak] bets—about one for every thirty voters.” In this environment, it is hardly surprising that the leading newspapers kept their readership well informed about the latest market odds.

The newspapers recorded many betting and bluffing contests between Col. Thomas Swords, Sergeant of Arms of the National Republican Party, and Democratic betting agents representing Richard Croker, Boss of Tammany Hall, among others. In most but not all instances, these officials appear to bet in favor of their party’s candidate; in the few cases where they took the other side, it was typically to hedge earlier bets.

...In conclusion, the historical betting markets do not meet all of the exacting conditions for efficiency, but the deviations were not usually large enough to generate consistently profitable betting strategies using public information.

The newspapers reported substantially less betting activity in specific contests and especially after 1940. In part, this reduction in reporting reflected a growing reluctance of newspapers to give publicity to activities that many considered unethical. There were frequent complaints that election betting was immoral and contrary to republican values. Among the issues that critics raised were moral hazard, election tampering, information withholding and strategic manipulation.

In response to such concerns, New York state laws did increasingly attempt to limit organized election betting. Casual bets between private individuals always remained legal in New York. However, even an otherwise legal private bet on elections technically disqualified the participants from voting—although this provision was rarely enforced—and the legal system also discouraged using the courts to collect gambling debts. Anti-gambling laws passed in New York during the late 1870s and the late 1900s appear to put a damper on election betting, but in both cases, the market bounced back after the energy of the moral reformers flagged. Ultimately, New York’s legalization of parimutuel betting on horse races in 1939 may have done more to reduce election betting than any anti-gambling policing. With horseracing, individuals interested in gambling could wager on several contests promising immediate rewards each day, rather than waiting through one long political contest.

The New York Stock Exchange and the Curb Market also periodically tried to crack down. The exchanges characteristically did not like the public to associate their socially productive risk-sharing and risk-taking functions with gambling on inherently zero-sum public or sporting events. In the 1910s and again after the mid-1920s, the stock exchanges passed regulations to reduce the public involvement of their members. In May 1924, for example, both the New York Stock Exchange and the Curb Market passed resolutions expressly barring their members from engaging in election gambling. After that, while betting activity continued to be reported in the newspapers, the articles rarely named the participants. During the 1930s, the press noted that securities of private electrical utilities had effectively become wagers on Roosevelt (on the grounds that New Deal policy initiatives such as the formation of the Securities and Exchange Commission and the Tennessee Valley Authority constrained the profits of existing private utilities).

A final force pushing election betting underground was the rise of scientific polling. For newspapers, one of the functions of reporting Wall Street betting odds had been to provide the best available aggregate information [...] The scientific polls, available on a weekly basis, provided the media with a ready substitute for the betting odds, one not subject to the moral objections against gambling.

In summer 2003, word leaked out that the Department of Defense was considering setting up a Policy Analysis Market, somewhat similar to the Iowa Electronic Market, which would seek to provide a market consensus about the likelihood of international political developments, especially in the Middle East. Critics argued that this market was subject to manipulation by insiders and might allow extremists to profit financially from their actions.


Note to the future: All links are added automatically to the Internet Archive. In case of link rot, go there and input the dead link.


“I’d rather be a bookie than a goddamned poet.” — Sherman Kent, 1964, when pushing for more probabilistic forecasts and being accused of trying to turn the CIA into “the biggest bookie shop in town.”