Forecasting Newsletter: August 2021

Highlights

Despite forecaster consensus to the contrary, Kabul has fallen
CSET-Foretell attempts to make inroads into US decision-making mechanisms
The US’ CDC to launch a new outbreak analysis and forecast center

Index

Prediction Markets & Forecasting Platforms
Blog Posts
Long Content
In The News

Prediction Markets & Forecasting Platforms

Metaculus

SimonM (a) kindly curated the top comments from Metaculus this past August (a). They are:

johnnycaffeine (a) asks if the community prediction is a good indicator that people didn’t foresee the rapidity of the Taliban takeover.
j.m. (a) did the (now moot) impeachment math for Cuomo: there would probably have been enough votes for an impeachment to go through.
Jgalt (a) flags up a rather poorly aged forecast by Biden.
alexrjl (a) points out a design flaw in Rootclaim’s Challenge (a): “I find it ironic in the extreme that Rootclaim makes repeated reference to the overconfidence of experts, but that their challenge requires you to “win a debate”, meaning that if you think they are overconfident but not directionally wrong (e.g. assigning 90% to something which you think should be assigned a 60% probability) there is no way for you to win the bet.”
j.m. (a) points out that the internet was adopted more slowly than one might otherwise think.
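alexrjl’s point can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a hypothetical bet priced at the odds implied by the counterparty’s stated probability, which is not something Rootclaim’s challenge offers, and the function name `expected_profit` is made up for this example.

```python
def expected_profit(p_true: float, p_offered: float, stake: float = 1.0) -> float:
    """Expected profit from betting `stake` AGAINST an event.

    p_true    -- the probability you assign to the event (e.g. 0.6)
    p_offered -- the probability your counterparty assigns (e.g. 0.9)

    Assumption: the bet pays fair odds at the counterparty's stated
    probability, i.e. you win stake * p_offered / (1 - p_offered)
    if the event does NOT happen, and lose your stake if it does.
    """
    payout = stake * p_offered / (1.0 - p_offered)
    return (1.0 - p_true) * payout - p_true * stake


# The counterparty says 90%, you think 60%: you agree on the direction,
# yet betting against the event at their odds still has positive expected value.
edge = expected_profit(p_true=0.6, p_offered=0.9)

# If you agreed with their 90%, the same bet would be worth nothing on average.
no_edge = expected_profit(p_true=0.9, p_offered=0.9)
```

Under these assumptions, someone who believes 60% expects to gain about three stakes per stake risked, even though both sides agree the event is more likely than not. A "win the debate" format offers no comparable payoff for spotting overconfidence.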

Metaculus begins the next round of the Keep Virginia Safe Tournament (a), which has a $1,000 prize pool. They have also pushed a redesign of their frontpage (a).

Good Judgment & Good Judgment Open

Here are a few of the best comments from Good Judgment Open (a), as curated by myself:

rjfmgy (a) compares and contrasts the current situation in Afghanistan with the Bashar al-Assad regime in Syria in 2015, on which he also forecasted at the time.
Anneinak (a) meets with the Dutch Ambassador to forecast “When will a new Dutch government be sworn in after the 2021 general election?”
RyanBeck (a) reads Merck’s Q2 earnings call document, and concludes that molnupiravir will most likely not be approved by the US FDA for use to treat COVID-19 before 1 October 2021. He also shares this report (a) by the US’s Office of the Director of National Intelligence on the origins of COVID-19.
keith-huggins (a) gives the lowdown on the power grab by the President of Tunisia.

In addition, Good Judgment Open introduced the Sky News Challenge (a). The questions are very UK-centric, but Sky News is a well-regarded major UK news channel (a) with a large viewership (a).

Sky News is asking forecasters for their latest judgments on political questions of consequence in the UK and beyond. Our ambition is to use probability changes in crowd forecasts to visualize how key issues in the news agenda are developing. Forecasts and reasoning that attract significant upvotes could be featured in our reporting.

CSET-Foretell

Foretell continues their work of making forecasts and forecasting methodologies more accessible and legible to US decision-makers. Most recently, they are doing this through their “Issue campaigns”; a write-up of the idea can be seen here (a). One of their first issue campaigns is on the topic of the future of the Defense-Silicon Valley Relationship (a).

Otherwise, CSET-Foretell (or rather, CultivateLabs, the company behind the webpage they use) has also added two new question formats (a). The first asks about 80% confidence intervals for more than one time period at a time, aggregating multiple sub-questions into one. The second format consists of rolling predictions, such that, e.g. “in the next six months” resets every month to refer to the next six months from the month of the forecast.

For example, a forecast on the question Will the Chinese military or other maritime security forces fire upon another country’s civil or military vessel in the South China Sea in the next six months? (a) made during the month of September refers to the September-February period, whereas a forecast made during October would refer to the October-March six-month period.
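One way to read the rolling format is as a window that starts in the month of the forecast and covers six calendar months, inclusive. Below is a minimal sketch under that assumption, which matches the September to February example. The function name `rolling_window` is hypothetical and is not part of the Foretell or Cultivate Labs software.

```python
from datetime import date
from typing import Tuple

YearMonth = Tuple[int, int]  # (year, month), with month in 1..12


def rolling_window(forecast_date: date, months: int = 6) -> Tuple[YearMonth, YearMonth]:
    """Return the first and last (year, month) covered by a rolling question.

    Assumption: the window begins in the month the forecast is made and
    spans `months` calendar months, counting that first month.
    """
    # Count months from year 0 so that year rollovers are handled by arithmetic.
    start_index = forecast_date.year * 12 + (forecast_date.month - 1)
    end_index = start_index + months - 1
    start = (start_index // 12, start_index % 12 + 1)
    end = (end_index // 12, end_index % 12 + 1)
    return start, end


# A forecast made in September 2021 covers September 2021 through February 2022.
september_window = rolling_window(date(2021, 9, 15))
```

The same forecast text therefore refers to a different period each month, which is part of what makes the format harder to reason about than a fixed deadline.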

Personally, while I’m glad to see experimentation with new formats, and I even intellectually admit that the new formats are in a sense superior, I still find them fairly unintuitive to forecast on.

Lastly, CSET-Foretell forecasts were quoted by Quartz (a) on whether VC funding for tech startups will dry up (a) (warning: paywalled), and by SupChina (a) on the composition of the Politburo Standing Committee of the Chinese Communist Party.

Polymarket

Difficulties with The Graph—the service which Polymarket uses to bring the data from the blockchain to its webpage—have continued, with users seeing negative balances. Polymarket has also been hosting many sports markets recently. While they bring large amounts of volume, they make Polymarket less differentiated.

A community-driven command-line trading tool (a) continues to get better.

Star Spangled Gamblers (a), a political betting podcast, has been producing some highly entertaining Polymarket-related podcasts, e.g.: Betting to #FreeBritney (a) and Your California Recall Playbook is Here (a).

An unrelated project in the Matic chain with a similar-sounding name, Poly Network (a), was hacked for $611 million, though most of the funds were later returned (a). Polymarket users pretended to be confused about this out of that perverse sense of humor prevalent in our time.

Metaforecast

I’ve pushed some major upgrades (a) to Metaforecast (a). Chiefly:

search is much better,
one can capture forecasts as images, making it easier to incorporate into blogposts,
questions have quality indicators (number of forecasters, volume traded, liquidity, etc.),
and I’ve added new platforms

In addition, I’ve written a Twitter bot which answers with the closest prediction in the database when @metaforecast is mentioned in a tweet (a).

This makes mentioning forecasts in casual Twitter conversation pretty much trivial, so perhaps the sanity level of Twitter conversations could be raised ever so slightly.

Blog Posts

Statistical Modelling (a) discusses Forecast displays that emphasize uncertainty (a), on account of The Economist’s Forecasting Model for the 2021 German election (a) (unpaywalled version), which only offers 95% confidence intervals of parliament seats, rather than explicit probabilities of who will win.

Uncertainty can Defuse Logical Explosions (a) makes the point that the principle of explosion (a), “P and ¬P, therefore anything follows”, does not apply to an agent with probabilistic beliefs. I thought that the point was very neat, but also that it could be formalized better.

Daniel Kokotajlo writes What 2026 looks like (Daniel’s Median Future) (a), extrapolating the performance of models like GPT-3 year by year.

The team behind Global Guessing (a) is starting a monthly newsletter focused on prediction markets: Crowd Money (a).

Long Content

The D-Squared Digest One Minute MBA - Avoiding Projects Pursued By Morons 101 (a). A blogger who correctly predicted that there would be no weapons of mass destruction in Iraq looks back at how and why.

Literally people have been asking me: “How is it that you were so amazingly prescient about Iraq? Why is it that you were right about everything at precisely the same moment when we were wrong?” No honestly, they have. I’d love to show you the emails I’ve received, there were dozens of them, honest. Honest. Anyway, I note that “errors of prewar planning” is now pretty much a mainstream stylised fact, so I suspect that it might make some small contribution to the commonweal if I were to explain how it was that I was able to spot so early that this dog wasn’t going to hunt. I will struggle manfully with the savage burden of boasting, self-aggrandisement and ego-stroking that this will necessarily involve. It’s been done before, although admittedly by a madman in the process of dying of syphilis of the brain. Sorry, where was I?

In the Abilene paradox (a), a group of people collectively decide on a course of action that is counter to the preferences of many or all of the individuals in the group. It involves a common breakdown of group communication in which each member mistakenly believes that their own preferences are counter to the group’s and, therefore, does not raise objections. A common phrase relating to the Abilene paradox is a desire to not “rock the boat”. h/t Chana.

In the News

The United Nations’ Intergovernmental Panel on Climate Change (a) has a new report (a) out. The report has been making the rounds; e.g., Boris Johnson described it as “sobering reading” (a). Interestingly, it uses probabilistic quantifiers:

Each finding is grounded in an evaluation of underlying evidence and agreement. A level of confidence is expressed using five qualifiers: very low, low, medium, high and very high, and typeset in italics, for example, medium confidence. The following terms have been used to indicate the assessed likelihood of an outcome or a result: virtually certain 99–100% probability, very likely 90–100%, likely 66–100%, about as likely as not 33–66%, unlikely 0–33%, very unlikely 0–10%, exceptionally unlikely 0–1%. Additional terms (extremely likely 95–100%, more likely than not >50–100%, and extremely unlikely 0–5%) may also be used when appropriate. Assessed likelihood is typeset in italics, for example, very likely. This is consistent with AR5. In this Report, unless stated otherwise, square brackets [x to y] are used to provide the assessed very likely range, or 90% interval

For example:

Human influence is very likely the main driver of the global retreat of glaciers since the 1990s and the decrease in Arctic sea ice area between 1979–1988 and 2010–2019 (about 40% in September and about 10% in March). There has been no significant trend in Antarctic sea ice area from 1979 to 2020 due to regionally opposing trends and large internal variability. Human influence very likely contributed to the decrease in Northern Hemisphere spring snow cover since 1950. It is very likely that human influence has contributed to the observed surface melting of the Greenland Ice Sheet over the past two decades, but there is only limited evidence, with medium agreement, of human influence on the Antarctic Ice Sheet mass loss.
It is virtually certain that the global upper ocean (0–700 m) has warmed since the 1970s and extremely likely that human influence is the main driver. It is virtually certain that human-caused CO2 emissions are the main driver of current global acidification of the surface open ocean. There is high confidence that oxygen levels have dropped in many upper ocean regions since the mid-20th century, and medium confidence that human influence contributed to this drop.
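The likelihood scale quoted above is effectively a lookup table from calibrated terms to probability ranges. As a rough sketch, it can be written down directly; the ranges are copied from the quoted passage, while the names `LIKELIHOOD_RANGES` and `terms_consistent_with` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple

# Probability ranges as quoted from the report, expressed as fractions.
LIKELIHOOD_RANGES: Dict[str, Tuple[float, float]] = {
    "virtually certain": (0.99, 1.00),
    "extremely likely": (0.95, 1.00),
    "very likely": (0.90, 1.00),
    "likely": (0.66, 1.00),
    "more likely than not": (0.50, 1.00),
    "about as likely as not": (0.33, 0.66),
    "unlikely": (0.00, 0.33),
    "very unlikely": (0.00, 0.10),
    "extremely unlikely": (0.00, 0.05),
    "exceptionally unlikely": (0.00, 0.01),
}

# The report writes ">50-100%" for this term, so its lower bound is exclusive.
STRICT_LOWER_BOUND = {"more likely than not"}


def terms_consistent_with(p: float) -> List[str]:
    """List every term whose quoted range contains probability p."""
    matches = []
    for term, (low, high) in LIKELIHOOD_RANGES.items():
        above_low = p > low if term in STRICT_LOWER_BOUND else p >= low
        if above_low and p <= high:
            matches.append(term)
    return matches


# The ranges overlap, so a single probability can match several terms.
terms_for_97_percent = terms_consistent_with(0.97)
```

Because the ranges nest, a 97% probability is consistent with "extremely likely", "very likely", "likely" and "more likely than not" at once; the terms set lower bounds rather than carving probability space into disjoint bins.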

xkcd (a) showcases a very accurate prediction from Exxon (a), made back in 1982.

Kabul has fallen. One can laugh at Biden, and mention that he is “not a superforecaster”. But forecasters and superforecasters alike also failed to see this one coming. I’ve written a postmortem from a forecasting perspective here, available to Substack subscribers. Metaculus also has a post-mortem thread here (a).

CDC recruits outsiders to lead a new center on disease forecasting (a), with Marc Lipsitch (a) as Director of Science. The name might ring a bell for EA readers.

5G Wireless Could Interfere with Weather Forecasts (a). On the one hand, water absorbs electromagnetic radiation differently at different frequencies, and monitoring the 24 gigahertz frequency is apparently particularly informative. On the other hand, weather satellites use a 16 gigahertz band to communicate with stations on the ground. And proposed 5G bands might interfere with signals at either of those frequencies—reporting doesn’t make clear which—and thus deteriorate weather forecasting performance.

...the biggest issue involves a spectrum called 24 gigahertz, which weather satellites use to monitor natural microwave signals produced by water vapor at various levels in the atmosphere. The device they use is a microwave radiometer.
But the signals made by water vapor and other natural weather signatures become fainter in a cacophonous surge of phone signals. “If you have a large network of cellphone towers transmitting many orders of magnitude more power near the ground, some of that reflects upward and parts of the atmosphere will become very noisy,” Mahoney said.
...the most “insidious” impact of rising noise levels on a weather spectrum would emerge if they caused errors or gaps in the weather data that is undetected. The erroneous data might be included in computer models that scientists use for, among other things, predicting future climate behavior.
Just where the FCC will go next with its Frontier Spectrum policy on 5G is unclear. According to the House Science Committee, it has already taken in almost $2 billion from 29 winning bidders for space on the 24 gigahertz band.

The US military announces “Global Information Dominance” experiments, using machine learning to automate analyzing and collecting intelligence (primary source (a), secondary source (a)). Some highlights and thoughts here (a).

Inflation Comes for Aluminum:

Demand is set to surge on the back of climate-change investment, and mega-producer China—which accounts for more than half of global output—is cracking down on smelting to reduce pollution and meet green targets.
It’s already jumped 26% this year to about $2,500 a ton, one of the best performers on the London Metal Exchange. Goldman Sachs Group Inc. is among those seeing more gains ahead, forecasting record prices above $3,000 by late next year.
“It takes quite a mindset change—some viewed buying aluminum similar to buying groceries in the supermarket,” said Philippe Mueller, head of aluminum trading at Trafigura. “It’s not going to work like this anymore.”
The metal isn’t alone in facing short-term issues. The combination of soaring demand and spluttering supply after covid-19 disruption has upended many raw materials markets, all of which is feeding the global inflation scare that’s taken hold in some corners this year.

On the topic of inflation, and to finish this newsletter with a forecast, see:

Source: Will inflation be 0.4% or more from July to August?

Note to the future: All links are added automatically to the Internet Archive, using this tool (a). “(a)” for archived links was inspired by Milan Griffes (a), Andrew Zuckerman (a), and Alexey Guzey (a).

Good ideas do not need lots of lies told about them in order to gain public acceptance

Daniel Davies (a)