I, too, am excited.
rossry
However, the post assumes that 1) there is (or should be) one correct answer, 2) which is of the form: (1, 0, 0, 0) or a permutation thereof, and 3) the material is independent of the system (does not include probability, for example).
These are assumed for the sake of explanation, but none are necessary; in fact, the scoring rule and analysis go through verbatim if you have questions with multiple answers in the form of arbitrary vectors of numbers, even if they have randomness. The correct choice is still to guess, for each potential answer, your expectation of that answer’s realized result.
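A minimal sketch of that last claim, assuming the scoring rule is a quadratic (squared-error) penalty; the rule, the function names, and the example numbers are illustrative assumptions, not taken from the post. It checks that guessing the expectation of each answer's realized result beats a nearby guess, even when the results are arbitrary vectors with randomness.

```python
def expected_quadratic_loss(guess, outcomes):
    """Expected squared error of `guess`; outcomes is a list of (probability, result_vector)."""
    return sum(
        prob * sum((g - r) ** 2 for g, r in zip(guess, result))
        for prob, result in outcomes
    )


def expectation(outcomes):
    """Component-wise expected value of the realized result vector."""
    dims = len(outcomes[0][1])
    return [sum(prob * result[i] for prob, result in outcomes) for i in range(dims)]


# Hypothetical question: results are arbitrary vectors, not permutations of (1, 0, 0, 0).
outcomes = [
    (0.5, [1.0, 0.0, 2.0]),
    (0.3, [0.0, 1.0, 0.5]),
    (0.2, [1.0, 1.0, 0.0]),
]

best_guess = expectation(outcomes)
best_loss = expected_quadratic_loss(best_guess, outcomes)

# Any perturbation of the expectation should score strictly worse.
nudged_guess = [g + 0.1 for g in best_guess]
nudged_loss = expected_quadratic_loss(nudged_guess, outcomes)

print(best_guess, best_loss, nudged_loss)
```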
just because “I don’t want to see more of this” doesn’t mean it’s up to me to influence whether anyone else can see it.
I feel like this proves more than you want. For example, is it up to you to influence whether someone sees more of something, just because you want to see more of it?
Similarly, it’s also helpful to get a reason for up votes, but enforcing that a reason be given can reduce the amount of information aggregation that will occur, on some margins. What justifies an asymmetry between how we aggregate positive information and how we aggregate negative information? Or would you also argue that up votes should come with reasons?
I mean a weighted sum where weights add to unity.
You need an exponentially increasing reward for your argument to go through. In particular, this doesn’t prove enough:
Since at each moment in time, you face the exact same problem (linearly increasing reward, exponentially decaying survival rate)
The problem isn’t exactly the same, because the ratio of (linear) growth rate to current value is decreasing over time. At some point, the value equals (is the right expression, I think?), and your marginal value of waiting is 0 (and decreasing), and you sell.
If the ratio of growth rate to current value is constant over time, then you’re in the same position at each step, but then it’s either the St. Petersburg paradox or worthless.
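One way to sketch the stopping argument, under assumptions of my own: a reward that grows linearly as a + bt and a constant hazard rate λ, so the probability of surviving to time t is e^{-λt}. The symbols a, b, λ, g, and R_0 are not from the original discussion.

```latex
\begin{align*}
V(t) &= (a + b t)\, e^{-\lambda t}
  && \text{expected value of planning to sell at time } t \\
\frac{dV}{dt} &= e^{-\lambda t}\,\bigl(b - \lambda (a + b t)\bigr)
  && \text{marginal value of waiting} \\
\frac{dV}{dt} = 0 &\iff a + b t^{*} = \frac{b}{\lambda}
  && \text{sell once the reward reaches } b / \lambda \\[4pt]
\text{If } R(t) = R_0 e^{g t}: \quad
\frac{dV}{dt} &= (g - \lambda)\, R_0\, e^{(g - \lambda) t}
  && \text{sign never changes with } t
\end{align*}
```

Under these assumptions the linear case has a single interior stopping point, while the constant-ratio case never switches between "wait" and "sell".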
Sorry, I’m writing pretty informally here. I’m pretty sure that there are senses in which these arguments can be made formal, though I’m not really interested in going through that here, mostly because I don’t think formality wins us anything interesting here.
Some notes, though: (still in a fairly informal mode)
My intuition that the only way to combine the two estimates without introducing a bias or assumed prior is by a mixture comes from treating each estimate (treated as a random variable) as a true estimate plus some idiosyncratic noise. Then any function of them yields an expression in terms of true estimate, each respective estimator’s noise, and maybe other constants. But “unbiased” implies that setting the noise terms to 0 should set the expression equal to the true estimate (in expectation). Without making assumptions about the actual distribution of true values, this needs to just be 1 times the true estimate (plus maybe some other noise you don’t want, which I think you can get rid of). And the only way you get there from the noisy estimates is a mixture.
By “assembly”, I’m proposing to treat each estimate as a larger number of estimates with the same mean and larger variance, such that they form equivalent evidence. Intuitively, this works out if the count goes as the variance ratio (that is, the square of the standard-deviation ratio), since averaging n independent estimates divides the variance by n. Then I claim that the natural thing to do with many estimates each of the same variance is to take a straight average.
But they’re distributions, not observations.
Sure, formally each observer’s posterior is a distribution. But if you treat “observer 1’s posterior is Normally distributed, with mean μ₁ and standard deviation σ₁” as an observation you make as a Bayesian (who trusts observer 1’s estimation and calibration), it gets you there.
Ah, okay. In that case, here are a few attempts to ground the idea philosophically:

It’s the “prior-free” estimate with the least error. See that unbiased “prior-free” estimates must be mixtures of the (unbiased) estimates, and that biased estimates are dominated by being scaled to fit. So the best you can do is to pick the mixture that minimizes variance, which this is.

It actually is the point that maximizes the product of likelihoods (equivalently, the joint likelihood, since the estimate errors are assumed to be independent). You can see this by remembering that the Normal pdf is the inverse exponential quadratic, so you maximize the product of likelihoods by maximizing the sum of log-likelihoods, which happens where the log-likelihood slopes are each the negative of the other, which happens when distances are inversely proportional to the x^2 coefficients (or the weights are inversely proportional to the variances).

There’s a pseudo-frequentist(?) version of this, where you treat each estimate as an assembly of (higher-variance) estimates at the same point, notice that the count is inversely proportional to the variance, and take the total population mean as your estimator. (You might like the mean for its L2-minimizing properties.)

A Bayesian interpretation is that, given the improper prior uniformly distributed over numbers and treating the two as independent pieces of evidence, the given formula gives the mode of the posterior (and, since the posterior is Normal, gives its mean and median as well).
Are any of those compelling?
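For concreteness, here is a minimal sketch of the blend those four framings point at, assuming two independent, unbiased, Normal estimates of the same quantity. The function names and the example numbers are mine, not the post’s. It also checks the second framing numerically: the blended mean should give a higher joint log-likelihood than nearby points.

```python
import math


def inverse_variance_blend(mean_1, sd_1, mean_2, sd_2):
    """Weight each estimate by 1/variance; return the blended mean and its standard deviation."""
    weight_1 = 1.0 / sd_1 ** 2
    weight_2 = 1.0 / sd_2 ** 2
    blended_mean = (weight_1 * mean_1 + weight_2 * mean_2) / (weight_1 + weight_2)
    blended_sd = math.sqrt(1.0 / (weight_1 + weight_2))
    return blended_mean, blended_sd


def joint_log_likelihood(x, estimates):
    """Sum of Normal log-densities of each (mean, sd) estimate if the true value were x."""
    return sum(
        -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
        for mean, sd in estimates
    )


# Hypothetical estimates: 10 with sd 1, and 14 with sd 2.
estimates = [(10.0, 1.0), (14.0, 2.0)]
blend_mean, blend_sd = inverse_variance_blend(10.0, 1.0, 14.0, 2.0)

# The blend should beat small steps in either direction on joint likelihood.
at_blend = joint_log_likelihood(blend_mean, estimates)
below = joint_log_likelihood(blend_mean - 0.05, estimates)
above = joint_log_likelihood(blend_mean + 0.05, estimates)

print(blend_mean, blend_sd, at_blend > below, at_blend > above)
```

The blended standard deviation comes out smaller than either input, which is the lower-error property the weights are chosen for.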

Are you asking for a justification for averaging independent estimates to achieve an estimate with lower errors? “Blended estimate” isn’t a specific term of art, but the general idea here is so common that I’m not sure _what_ the most common term for it is.
And the theoretical justification—under assumptions of independent and Normal errors—is at the post, where the author demonstrates that there’s a lower error from the weighted average (and that their choice of weights minimizes the error). Am I missing something here?
Arimaa is the(?) classic example of a chess-like board game that was designed to be hard for AI (albeit from an age before “AI” mostly meant ML).
From David Wu’s paper on the bot that finally beat top humans in 2015:
Why is Arimaa computer-resistant? We can identify two major obstacles.
The first is that in Arimaa, the per-turn branching factor is extremely large due to the combinatorial possibilities produced by having four steps per turn. Even after identifying equivalent permutations of steps as the same move, on average there are about 17000 legal moves per turn (Haskin, 2006). This is a serious impediment to search.
Obviously, a high branching factor alone doesn’t imply computer-resistance, particularly if the standard of comparison is with human play: high branching factors affect humans as well. However, Arimaa has a property common to many computer-resistant games: that “per amount of branching” the board changes slowly. Indeed, pieces move only one orthogonal step at a time. This makes it possible to effectively plan ahead, cache evaluations of local positions, and visualize patterns of good moves, all things that usually favor human players.
The second obstacle is that Arimaa is frequently quite positional or strategic, as opposed to tactical. Capturing or trading pieces is somewhat more difficult in Arimaa than in, for example, Chess. Moreover, since the elephant cannot be pushed or pulled and can defend any trap, deadlocks between defending elephants are common, giving rise to positions sparse in easy tactical landmarks. Progress in such positions requires good long-term judgement and strategic understanding to guide the gradual maneuvering of pieces, posing a challenge for positional evaluation.
It’s easy to play armchair statistician and contribute little, but I want to point out that the empirics cited here are effectively just anecdotes. The paper studies 13 pairs and 13 individuals in three assignments in one class at UUtah. Its estimate of relative time costs is only significant to ~ because development time has variance of (if I backsolved correctly) 65%, which...seems about right. Still, it seems like borderline abuse of frequentist statistics to argue that a two-tailed p<0.05 should be required to reject the hypothesis that pairs finish projects in half the wall-clock time of individuals (which is the null the analysis assumes).
That said, the author correctly identifies that quality matters significantly more than speed. The quality metric, however, is “assignment tests passed” in throwaway academic projects, eliding the questions of what quality failures would or wouldn’t be caught by the review / CI workflows that an industrial project would be going through anyway.
So, finger to the wind, this study feels like it suggests that a pair spends 15% more person-hours (once they get used to each other) before turning their schoolwork in, and does 15% more of the work of the assignment than a student working alone. Consistent with the higher reported work-enjoyment numbers! Definitely a stronger showing than I would have guessed! But definitely not well-abstracted by “no significant result for time; significant improvement for quality”.
What am I missing here?
(continued, to address a different point)
B and C seem like arguments against “simple” (i.e., even-odds) bets as well as weird (e.g., “70% probability”) bets, except for C’s “like bets where I’m surer...about what’s going on”, which is addressed by A (sibling comment).
Your point about differences in wealth causing different people to have different thresholds for meaningfulness is valid, though I’ve found that it matters much less than you’d expect in practice. It turns out that people making upwards of $100k/yr still do not feel good about opening up their wallet to give you $3. In fact, it feels so bad that if you do it more than a few times in a row, you really feel the need to examine your own calibration, which is exactly the success condition.
I’ve found that the small ritual of exchanging pieces of paper just carries significantly more weight than would be implied by their relation to my total savings. (For this, it’s surprisingly important to exchange actual pieces of paper; electronic payments make the whole thing less real, ruining the whole point.)
Finally, it’s hard to argue with someone’s utility function, but I think that some rationalists get this one badly wrong by failing to actually multiply real numbers. For example, if you make a $10 bet (as defined in my sibling comment) every day for a year at the true probabilities, the standard deviation of your profit/loss on the year is <$200, or $200/365 per day, which seems like a very small annual cost to practice being better calibrated and evaluate just how well-calibrated you are.
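To actually multiply the numbers, here is a rough sketch of the spread of a year’s profit/loss. It assumes the buyer/seller convention from my sibling comment, independent bets, and prices equal to the true probabilities; the function names are made up for illustration.

```python
import math


def bet_pnl_sd(size, prob):
    """Standard deviation of one bet's profit when the price equals the true probability.

    The buyer gains size * (1 - prob) with probability prob and loses size * prob
    otherwise, so the mean is zero and the variance is size**2 * prob * (1 - prob).
    """
    return size * math.sqrt(prob * (1.0 - prob))


def annual_pnl_sd(size, prob, n_bets=365):
    """Standard deviation of total profit over n_bets independent bets (assumed independent)."""
    return math.sqrt(n_bets) * bet_pnl_sd(size, prob)


# A 50% price is the worst case for variance; more extreme prices give a smaller spread.
worst_case_sd = annual_pnl_sd(10.0, 0.5)
long_shot_sd = annual_pnl_sd(10.0, 0.9)

print(worst_case_sd, long_shot_sd)
```

Even the worst case lands comfortably under the $200 bound.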
Hi! I’ve done a fair amount of betting beliefs for fun and calibration over the years; I think most of these issues are solvable.
A is a solved problem. The formulation that I (and my local social group) prefer goes like “The buyer pays $X*P% to the seller. The seller pays $X to the buyer if the event comes true.”
The precise payoffs aren’t the important part, so long as they correspond to quoted probabilities in the correct way (and agreed sizes in a reasonable way). So this convention makes the probability you’re discussing an explicit part of the bet terms, so people can discuss probabilities instead of confusing themselves with payoffs (and gives a clear upper bound for possible losses). Then you can work out exact payoffs later, after the bet resolves.
(As a worked example, if you thought a probability was less than 70% and wanted to bet about $20 with me, if you “sold $20 at 70%” in the above convention, you’d either win $20*70% = $14 or lose $20 - ($20*70%) = $6. But it’s even easier to see that you selling a liability of $20*p(happens) for $20*70% is good for you if you think p(happens) < 70%.)
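The same convention as a small sketch, in case it helps to see the settlement spelled out. The function names are hypothetical; the sizes and prices are the ones from the worked example.

```python
def settle(size, price, event_happened):
    """Return (buyer_profit, seller_profit) for a bet of `size` dollars at probability `price`."""
    premium = size * price                      # buyer pays this to the seller up front
    payout = size if event_happened else 0.0    # seller pays this if the event comes true
    buyer_profit = payout - premium
    return buyer_profit, -buyer_profit


def seller_expected_profit(size, price, believed_prob):
    """Seller's expected profit under their own belief: premium minus expected liability."""
    return size * price - size * believed_prob


# "Sold $20 at 70%": the seller keeps the premium if the event fails, pays out if it happens.
_, seller_if_no = settle(20.0, 0.70, event_happened=False)
_, seller_if_yes = settle(20.0, 0.70, event_happened=True)

# Selling looks good exactly when your probability is below the quoted price.
edge_if_you_think_60 = seller_expected_profit(20.0, 0.70, 0.60)
edge_if_you_think_80 = seller_expected_profit(20.0, 0.70, 0.80)

print(seller_if_no, seller_if_yes, edge_if_you_think_60, edge_if_you_think_80)
```

Note that the seller’s worst case is capped at size minus premium and the buyer’s at the premium, which is the clear upper bound on losses mentioned above.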
You’re right that odds are a terrible convention for betting on probabilities unless you’re trying to hide the actual numbers from your counterparties (which is the norm in retail sports betting).
I also think that if the “sixth friend” donates $10k in line with each other friend’s values and beliefs (as a result of social expectation, not contract), then there’s no particular benefit to being the one who has to handle the money, and you don’t need to trust in multi-year commitments.
Your suggestion is correct, though it seemed too messy (and nonessential) to explain for the sake of an off-the-cuff proposal. I added a footnote to clarify this above, though.
Proposal: Five friends in this situation write $10k checks[1] to a sixth. They all have a long chat about their altruist values and beliefs. The sixth donates $60k to a variety of EA causes.
Question: Just how likely / unpleasant would the ensuing IRS audit be?
(There’s also a micro-donor-lottery version of this, except the individual contributions are personal gifts and the full $60k is a charitable donation.)
[1] Actually, you want this to be something like $7k, since the tax deduction from donating is worth [your marginal income tax rate] on the amount, roughly 30%. Formally, $10k less the tax benefits from donating $10k.
That’s exactly correct. It’s a standard taxation-begets-misallocation scenario.
For reference, PI’s current rules have this effect to roughly 0-3% per contract, potentially adding across multiple contracts in a bundle. Prices closer to 50% are worse (though prices further away have their own biases, as Zvi explains).
Yeah, Zvi is (unsurprisingly) right; the change in margining rules (after I wrote that post) makes it much better to sell the low-value contracts, and the withdrawal fees amortize if you’re in for the longer term.
Under the new rules, and on the back of my envelope, Zvi’s 12% “arbitrage” is something like a few percent good: maybe it covers withdrawal fees on its own, and likely will do so after a few rounds. The opportunity cost of capital is a whole ’nother issue...
I also strongly endorse the punchline that trading (even on the margins of trading costs) is some of the best rationalist training you can find.
Huh, I hadn’t noticed that they didn’t tie up the potential fees on your winnings. Hypotheses:
bug introduced when they moved from gross margining to net margining years ago and didn’t reconsider fee withholding
doesn’t actually matter; they don’t give up ~anything by letting some people carrying small balances make free trades
it’s really hard to abuse this into free trades repeatedly
the withholding here is too complicated and feelbad to explain
other
Ah, that makes sense.
Separately, I’m not entirely convinced by that second bullet point: it seems like a non-omniscient state planner in a non-stationary environment would benefit from being able to determine the desired level of redistribution after the wealthy have accrued their income as wealth, rather than needing to get it right as they earned it.
(I’m assuming away the confiscatory impulse here, naturally; in practice, the political economy of confiscation causes serious issues for deferred decisions about distribution like this.)
In what sense are you using the word “trilemma”? I’m either not familiar with the usage or missing a big message of the post.
(The common definition of “trilemma” I’m most familiar with presents three desiderata, of which it’s possible to achieve at most two.)