0.836 Bits of Evidence In Favor of Futarchy

So, I put up some prediction markets on the results of quantified self RCTs. I ran two of the experiments, and scored both markets on the results.

How much should the performance of the markets change our opinion about the viability of using prediction platforms to predict RCTs, and thus about their plausible usefulness in selecting experiments to run and actions to perform?
One issue here is that we only observed two datapoints: two logscores from the markets on the Pomodoro method (−0.326) and the Vitamin D₃ experiment (−0.333). Qualitatively these are already fairly reassuring, because they’re both far better than a random score of −0.69 (the score of a coin-flip 50% forecast).
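For concreteness, the logscore here is just the natural log of the probability the market assigned to the outcome that actually happened; a minimal sketch (the 72% below is purely illustrative, not either market’s actual closing price):

import math

def log_score(p_yes: float, outcome: bool) -> float:
    # natural log of the probability assigned to the realized outcome
    return math.log(p_yes if outcome else 1.0 - p_yes)

print(log_score(0.5, True))   # -0.693..., the "random" 50% baseline
print(log_score(0.72, True))  # about -0.33, in the ballpark of the two markets' scores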
But if we want to quantify the amount of information we’ve gained, we can do that by performing a Bayesian update.
For that, we need a prior. What prior to choose? I tried fiddling around a bit with the exponential distribution, which is the maximum entropy distribution over possible logscores, nailed down only by the mean of the distribution. It represents a state of minimal knowledge.
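(Concretely, taking x to be the magnitude of a logscore, the exponential density with mean μ is f(x) = (1/μ)·exp(−x/μ) for x ≥ 0; among all distributions on the nonnegative reals with that mean it has maximum entropy, so the single number μ pins it down completely.)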
There’s also the Gamma distribution, which is great because it’s a conjugate prior assuming an exponentially distributed underlying data generating process. So, after updating on the datapoints we again get a Gamma distribution as the posterior.
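For reference, that conjugate update has a simple closed form: with an Exponential(λ) likelihood and a Gamma(α, β) prior on the rate λ, observing x₁, …, xₙ gives a Gamma(α + n, β + Σxᵢ) posterior. A minimal sketch, with hyperparameters that are purely illustrative:

from scipy import stats

distances = [0.326, 0.333]   # the two observed logscore magnitudes
alpha, beta = 2.0, 1.0       # illustrative Gamma(shape, rate) hyperparameters

# conjugate update: Gamma(alpha, beta) prior + exponential likelihood
alpha_post = alpha + len(distances)
beta_post = beta + sum(distances)
posterior = stats.gamma(a=alpha_post, scale=1.0 / beta_post)  # scipy uses scale = 1/rate
print(posterior.mean(), posterior.std())  # posterior over the exponential rate lambda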
But I didn’t go with that one because… I wasn’t getting the pretty
results I wanted[1].
With the exponential distribution, what kept happening was that I’d calculate the resulting distribution after two updates, but the first update would be very aggressive, and the second update would then move the distribution away from the datapoints by more than the first update had moved it towards them, causing a net information loss. The exponential distribution also has very long tails, resulting in a median of −1.0, which implies that the median market has a logscore that is worse than chance. I don’t believe that to be the case. (The maxent mean implies that the mean market is only as good as chance, which I also wouldn’t believe a priori? I think?)
As for using the Gamma distribution as a prior, I simply don’t think that the underlying data generating process is exponentially distributed, and thus we don’t get any advantage through conjugacy. The Gamma distribution also has two parameters, which is too many to nail down with only two datapoints.
So I decided to pick a different prior, and landed on the half-normal distribution with a half-normally distributed standard deviation (σ ~ HalfNormal(0.5)), which has some nicer properties than the exponential distribution, especially its thin tails. But the half-normal distribution can’t be updated in closed form, so instead I had to write a short script using PyMC. Because of the missing conjugacy the resulting posterior is not a half-normal distribution, but something a lot more complicated. I can’t be bothered to try to calculate what it even is.
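To make the “thin tails” point concrete, here is a small comparison of tail mass under an exponential and a half-normal matched to the same mean; the mean of 0.5 is purely illustrative, not my actual prior setting:

import math
from scipy import stats

mean = 0.5  # illustrative common mean for both distributions
expon = stats.expon(scale=mean)                                  # exponential with this mean
halfnorm = stats.halfnorm(scale=mean * math.sqrt(math.pi / 2))   # half-normal with the same mean

# tail mass beyond a logscore distance of 1.5: the half-normal puts far less mass out there
print(expon.sf(1.5), halfnorm.sf(1.5))  # ~0.05 vs ~0.017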
Summary visualization of the update:
The script[2] initializes the model with the half-normal prior, which in turn has a standard deviation distributed with HalfNormal("sigma", sigma=0.5). We then update on observed=[0.326, 0.333]:
import pymc as pm

distances = [0.326, 0.333]  # absolute logscore distances of the two markets

with pm.Model() as adaptive_model:
    σ = pm.HalfNormal('sigma', sigma=0.5)  # half-normal hyperprior on the scale
    obs = pm.HalfNormal('distances', sigma=σ, observed=distances)
    trace = pm.sample(2000, tune=1000, chains=4, target_accept=0.95,
                      return_inferencedata=True)
We can then observe the samples for the new standard deviation and calculate the log-likelihoods, the Bayes factor, and the number of bits in the update:
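The linked script does that calculation; as a rough sketch of one way to get there (a stand-in, not necessarily the exact comparison the script makes), one can compare how well the data are explained on average under posterior draws of σ versus under fresh prior draws, and take log₂ of that ratio as a Bayes-factor-style measure, using the trace from the model above:

import numpy as np
from scipy import stats

distances = np.array([0.326, 0.333])
sigma_post = trace.posterior["sigma"].values.ravel()  # posterior draws from the model above
sigma_prior = np.abs(np.random.default_rng(0).normal(0.0, 0.5, size=sigma_post.size))  # HalfNormal(0.5) draws

def avg_likelihood(sigmas):
    # average likelihood of both observations over a set of sigma draws
    return np.mean([np.prod(stats.halfnorm.pdf(distances, scale=s)) for s in sigmas])

ratio = avg_likelihood(sigma_post) / avg_likelihood(sigma_prior)
print(f"evidence ratio: {ratio:.3f}, bits: {np.log2(ratio):.3f}")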
The whole script has this output:

Thus: 0.868 bits in favor of futarchy[3].

Acknowledgements

Many thanks to clippy (twitter) for M500, and Tetraspace (twitter) for M1000, which I used to subsidize the markets. Also many thanks to the Manifold admin Genzy for subsidizing each market with M450. Your funding of the sciences is greatly appreciated.

My gratitude also goes out to all the traders on the markets. You help me prioritize, you help us gain knowledge.
[1] This is very bad statistical practice. I’m doing this because I want a cutesy title with a positive number of bits as an update, and because I wanted to learn how to do Bayesian updating using computers.

[2] Code here. Thanks to Claude 4 Sonnet for writing the code and walking me through the process.

[3] Under several favorably cherry-picked assumptions. Don’t @ me.