A bit more explanation on what the Kelly Criterion is, for those who haven’t seen it before: suppose you’re making a long series of independent bets, one after another. They don’t have to be IID, just independent. The key insight is that the long-run payoff will be the product of the payoffs of the individual bets, so its logarithm is the sum of the logarithms of the individual payoffs. So, from the law of large numbers, the logarithm of the long-run payoff will converge to the average expected logarithm of the individual payoffs times the number of bets.
This leads to a simple statement of the Kelly Criterion: to maximize long-run growth, maximize the expected logarithm of the return of each bet. It’s quite general: all we need is multiplicative returns and some version of the law of large numbers.
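To make "maximize the expected logarithm of the return" concrete, here is a minimal sketch for the simplest case: a single repeated bet that pays b-to-1 with probability p. The function names, the grid search, and the numbers p = 0.6 and b = 1 are illustrative assumptions, not anything from a particular library.

```python
import math

def expected_log_growth(f, p, b):
    """Expected log of the wealth multiplier when staking fraction f
    on a bet that pays b-to-1 with probability p and loses the stake otherwise."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def kelly_fraction_numeric(p, b, steps=10_000):
    """Grid-search stake fractions in [0, 1) for the highest expected log growth."""
    best_f, best_g = 0.0, 0.0
    for i in range(steps):
        f = i / steps
        g = expected_log_growth(f, p, b)
        if g > best_g:
            best_f, best_g = f, g
    return best_f

def kelly_fraction_closed_form(p, b):
    """Textbook closed form for this binary bet: f* = p - (1 - p) / b, floored at 0."""
    return max(0.0, p - (1 - p) / b)

# Hypothetical bet: 60% chance to win at even odds.
p, b = 0.6, 1.0
f_numeric = kelly_fraction_numeric(p, b)
f_closed = kelly_fraction_closed_form(p, b)
print(f_numeric, f_closed)
```

The numerical maximizer and the closed form should agree up to the grid resolution. Staking more than that fraction lowers the expected log growth even though it raises the expected payoff of each individual bet.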
I’m not really convinced by this argument. Yes, Newcomen’s specific design needed precise manufacturing capability. But I would expect that, if there had been demand for steam engines earlier, someone would have found a design which could work with lower-precision manufacturing. Newcomen just used what was available.
Also, I intended Newcomen as an example of an early steam engine which failed to catch on, because it wasn’t very profitable yet.
Test is easy: have the inputs become cheaper and/or the outputs become more expensive, compared to alternative technologies? In other words, is it more profitable now?
I’ve been chewing on that one a lot. I don’t have a satisfying answer yet. The sheer size/density of the population is one hypothesis, and crop yields are another (rice vs wheat). But I don’t feel like I understand it yet.
Here’s an alternative hypothesis for why the Chinese didn’t adopt the press, even after the introduction of paper. It also explains why the Chinese didn’t adopt wind/water mills, artillery, the slave trade, and ultimately automation: the cost of capital relative to labor was much higher in China than Europe. Across the board, we see much lower Chinese adoption of capital-intensive technology in favor of labor-intensive alternatives, even when the technical prerequisites were met centuries earlier.
Yes! I was thinking about adding a couple paragraphs about this, but couldn’t figure out how to word it quite right.
When you’re trying to create solid theories de novo, a huge part of it is finding people who’ve done a bunch of experiments with it, looking at the outcomes, and paying really close attention to the places where they don’t match your existing theory. Elinor Ostrom is one of the best examples I know: she won a Nobel in economics for basically saying “ok, how do people actually solve commons problems in practice, and does it make sense from an economic perspective?”
In the case of a wheel with weights on it, that’s been nailed down really well already by generations of physicists, so it’s not a very good example for theory-generation.
But one important aspect does carry over: you have to actually do the math, to see what the theory actually predicts. Otherwise, you won’t notice when the experimental outcomes don’t match, so you won’t know that the theory is incomplete.
Even in the wheel example, I’d bet a lot of physics-savvy people would just start from “oh, all that matters here is moment of inertia”, without realizing that it’s possible to shift the initial gravitational potential. But if you try a few random configurations, and actually calculate how fast you expect them to go, then you’ll notice very quickly that the theory is incomplete.
I think this is related to a general class of mistakes, so I just wrote up a post on it.
This case is a bit different from what that post discusses, in that you’re not focused on a non-critical assumption, but on a non-critical method. We can use VNM rationality for decision-making just fine without computing full utilities for every decision; we just need to compute enough to be confident that we’re making the higher-utility choice. For that purpose we can use tricks like e.g. changing the unit of valuation on the fly, making approximations (as long as we keep track of the error bars), etc.
No matter what decision you make, it seems that you will inevitably regret it.
It’s not exactly a puzzle that game theory doesn’t always give pure solutions. This puzzle should still have a solution in mixed strategies, assuming the genie can’t predict quantum random number generators.
Bernstein-Von Mises Theorem. It is indeed not always true; the theorem has some conditions.
An intuitive example of where it would fail: suppose we are rolling a (possibly weighted) die, but we model it as drawing numbered balls from a box without replacement. If we roll a bunch of sixes, then the model thinks the box now contains fewer sixes, so the chance of a six is lower. If we modeled the weighted die correctly, then a bunch of sixes is evidence that it’s weighted toward six, so the chance of six should be higher.
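A quick sketch of the two models' predictions after a run of sixes. The specifics are assumptions for illustration: the box starts with 10 balls per face, and the weighted-die model uses a uniform Dirichlet prior (one pseudo-count per face), which gives the simple predictive rule in `die_prob_six`.

```python
from fractions import Fraction

def urn_prob_six(sixes_seen, balls_per_face=10, faces=6):
    """Wrong model: drawing without replacement, so each six seen removes a six."""
    remaining_sixes = balls_per_face - sixes_seen
    remaining_total = balls_per_face * faces - sixes_seen
    return Fraction(remaining_sixes, remaining_total)

def die_prob_six(sixes_seen, faces=6, prior_count=1):
    """Weighted-die model with a uniform Dirichlet prior: each six seen
    adds a count toward six in the posterior predictive."""
    return Fraction(prior_count + sixes_seen, prior_count * faces + sixes_seen)

# Predicted chance of a six after observing 0..5 sixes in a row.
for k in range(6):
    print(k, urn_prob_six(k), die_prob_six(k))
```

Both models start at 1/6, then move in opposite directions as the sixes pile up. More data makes the mis-specified model more confidently wrong rather than less.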
Takeaway: Bernstein-Von Mises typically fails in cases where we’re restricting ourselves to a badly inaccurate model. You can look at the exact conditions yourself; as a general rule, we want those conditions to hold. I don’t think it’s a significant issue for my argument.
We could set up the IRL algorithm so that atom-level simulation is outside the space of models it considers. That would break my argument. But a limitation on the model space like that raises other issues, especially for FAI.
Problem is, if there’s a sufficiently large amount of sufficiently precise data, then the physically-correct model’s high accuracy is going to swamp the complexity penalty. That would be a ridiculously huge amount of data for atom-level physics, but there could be other abstraction levels which require less data but are still not what we want (e.g. gene-level reward functions, though that doesn’t fit the driving example very well).
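A back-of-the-envelope sketch of the swamping effect, under simplifying assumptions: the complexity penalty is a fixed number of bits in the prior, and the more accurate model gains a constant average number of bits of log-likelihood per data point. Both numbers below are made up for illustration.

```python
import math

def log2_posterior_odds(n, complexity_gap_bits, per_sample_gain_bits):
    """Log2 posterior odds of the complex-but-accurate model over the simple one
    after n data points: likelihood advantage minus the prior's complexity penalty."""
    return n * per_sample_gain_bits - complexity_gap_bits

def crossover_n(complexity_gap_bits, per_sample_gain_bits):
    """Roughly how many data points it takes for the accuracy to pay off the penalty."""
    return math.ceil(complexity_gap_bits / per_sample_gain_bits)

# Hypothetical numbers: 1000 extra bits of model description,
# 0.01 bits of better prediction per data point.
gap_bits, gain_bits = 1000.0, 0.01
n_star = crossover_n(gap_bits, gain_bits)
print(n_star, log2_posterior_odds(10 * n_star, gap_bits, gain_bits))
```

The penalty is a one-time cost while the accuracy gain accrues with every data point, so any fixed complexity penalty loses eventually. The only question is how much data that takes.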
Also, reliance on limited data seems like the sort of thing which is A Bad Idea for friendly AGI purposes.
Wouldn’t the reward function “maximize action for this configuration of atoms” fit the data really well (given unrealistic computational power), but produce unhelpful prescriptions for behavior outside the training set? I’m not seeing how IRL dodges the problem, other than the human manipulating the algorithm (effectively choosing a prior).
Chapter 6 of Cover & Thomas’ “Elements of Information Theory” gives good info on the Kelly criterion, how to derive it, and the relations between prices/probabilities and entropy/rate of return.
For math finance, the class I took back in college used Shreve’s “Stochastic Calculus for Finance II”. I wouldn’t necessarily recommend that just to learn about this, but it’s a good source for Brownian motion, some basic measure theory, and the core theory of asset pricing.
Typically complete markets come up in discussing the fundamental theorem of asset pricing. The first part of the theorem says that any arbitrage-free set of asset prices has a “risk-neutral measure”, i.e. a market-implied set of probabilities. The second part says those probabilities are unique iff the market is complete—if some bets can’t be placed, then there are multiple possible market-implied probabilities. Any book which covers the fundamental theorem should have at least some coverage of complete markets.
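A toy illustration of the second part, with everything in it assumed for the example: three possible next-step stock prices (1, 2, 3), a current price of 2, zero interest, and a hypothetical call option with strike 2 priced at 3/10. With only the stock and cash, a whole family of probability vectors prices the stock correctly. Adding the option leaves exactly one.

```python
from fractions import Fraction

STOCK_PAYOFFS = [Fraction(1), Fraction(2), Fraction(3)]  # price in each state
STOCK_PRICE = Fraction(2)                                # today's price, zero interest

def prices_stock(q):
    """True if q is a probability vector under which the stock is fairly priced."""
    return (all(x >= 0 for x in q)
            and sum(q) == 1
            and sum(qi * s for qi, s in zip(q, STOCK_PAYOFFS)) == STOCK_PRICE)

def candidate(t):
    """One-parameter family (t, 1 - 2t, t) solving the two stock-and-cash constraints."""
    return [t, 1 - 2 * t, t]

# Incomplete market (stock + cash): every t in [0, 1/2] gives a valid measure.
valid = [q for q in (candidate(Fraction(k, 10)) for k in range(6)) if prices_stock(q)]

# Add a call option with strike 2; its payoff is nonzero only in the top state.
CALL_PAYOFFS = [max(s - 2, 0) for s in STOCK_PAYOFFS]
CALL_PRICE = Fraction(3, 10)  # hypothetical market price
consistent = [q for q in valid
              if sum(qi * c for qi, c in zip(q, CALL_PAYOFFS)) == CALL_PRICE]

print(len(valid), consistent)
```

The scan only checks a grid of candidates, but the underlying linear algebra says the same thing: two constraints on three unknowns leave a line of solutions, and the third independent asset supplies the missing constraint.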
Finally, if you’re looking for something more applied, Hull’s “Options, Futures and Other Derivatives” is the usual starting point.
Tl;dr: The problem is that we have no way to bet on joint outcomes. If we add bets on joint outcomes, then the market is complete, we can combine the two outcomes into a single joint outcome, and the Kelly criterion should work. To properly break Kelly, we need bets which resolve at different times.
This hits on a critical point which is fundamental to mathematical finance, but virtually unknown outside of it: complete markets. A “complete market” is one in which we can place any possible bet on whatever random variables are involved.
For instance, if we have a stock market with nothing but a single stock, and we’re betting on the stock’s price in the next time-step, then that’s an incomplete market: we have no way to place a bet which pays $1 if the price ends up within some window, and $0 otherwise. On the other hand, if we add in the full option chain (call options at every possible strike price), then the market is complete. We can pick a portfolio of options to make any possible bet on the stock’s price in the next time-step.
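Here is a sketch of how the option chain builds the window bet, assuming prices live on an integer grid and a call is available at every integer strike. The construction is the standard "butterfly": long one call a step below the target, short two at the target, long one a step above. The grid and window below are arbitrary choices.

```python
def call_payoff(price, strike):
    """Payoff at expiry of a call option with the given strike."""
    return max(price - strike, 0)

def butterfly_payoff(price, target):
    """Long calls at target-1 and target+1, short two calls at target.
    On an integer grid this pays 1 if price == target and 0 otherwise."""
    return (call_payoff(price, target - 1)
            - 2 * call_payoff(price, target)
            + call_payoff(price, target + 1))

def window_bet_payoff(price, low, high):
    """Sum of butterflies: pays 1 if low <= price <= high, else 0."""
    return sum(butterfly_payoff(price, k) for k in range(low, high + 1))

# Payoff of a bet on "price ends up between 8 and 12" across a grid of prices 0..20.
payoffs = [window_bet_payoff(s, 8, 12) for s in range(21)]
print(payoffs)
```

Since each butterfly isolates a single price, any payoff function on the grid can be written as a weighted sum of them, which is what completeness means here.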
Mathematically, incomplete markets are a mess. You can’t get the bet you actually want, so you’re stuck trying to approximate it with the available bets, and that approximation gets messy.
On the other hand, if you do have complete markets, then you can combine everything into a single random variable and just use the Kelly criterion.
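A sketch of what that looks like with two simultaneous binary events, under some loud assumptions: the events are independent with made-up probabilities 0.6 and 0.3, the market offers a bet on each of the four joint outcomes at hypothetical payout multiples of 4, and all wealth is split across those bets (the "horse race" setting treated in Cover & Thomas, chapter 6). In that setting the log-optimal stakes are simply the joint probabilities, and the code checks this against random alternatives.

```python
import itertools
import math
import random

def joint_probabilities(p_a, p_b):
    """Probability of each joint outcome (a, b) of two independent binary events."""
    return {(a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
            for a, b in itertools.product([True, False], repeat=2)}

def growth_rate(stakes, probs, odds):
    """Expected log wealth multiplier when fraction stakes[k] is bet on outcome k
    and the winning outcome pays odds[k] per unit staked."""
    return sum(probs[k] * math.log(stakes[k] * odds[k]) for k in probs)

probs = joint_probabilities(0.6, 0.3)
odds = {k: 4.0 for k in probs}   # hypothetical payout multiples
kelly_stakes = dict(probs)       # proportional betting: stake = probability
best = growth_rate(kelly_stakes, probs, odds)

rng = random.Random(0)

def random_stakes():
    """A random way of splitting all wealth across the four joint outcomes."""
    weights = {k: rng.random() + 1e-9 for k in probs}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

alternatives = [growth_rate(random_stakes(), probs, odds) for _ in range(1000)]
print(best, max(alternatives))
```

Once the joint outcomes are bettable, the two events collapse into one four-valued random variable and there is nothing special left to handle.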
From the wording of this post it sounds like you made up the term “Definition-Theorem-Proof”? That would be quite amusing, because that’s the standard term used for this style of textbook.
There is a great schism in mathematics between mathematical physicists/applied mathematicians/intuitionists, and pure mathematicians/Bourbaki. The DTP style is strongly characteristic of the latter, and much-bemoaned by the former.
Also sorry I didn’t actually answer your main question. It’s actually something I’ve thought about quite a bit, but usually in the context of “not enough data to map out this very-high-dimensional space” rather than “not enough data to detect a small change”. The problem is similar in both cases. I’ll probably write a post or two on it at some point, but here’s a very short summary.
Traditional probability theory relies heavily on large-number approximations; mainstream statistics uses convergence as its main criterion of validity. Small data problems, on the other hand, are much better suited to a Bayesian approach. In particular, if we have a few different models (call them Mi) and some data D, we can compute the posterior P[Mi|D] without having to talk about convergence or large numbers at all.
The trade-off is that the math tends to be spectacularly hairy; P[Mi|D] usually involves high-dimensional integrals. Traditional approaches approximate those integrals for large numbers of data points, but the whole point here is that we don’t have enough data for the approximations to be valid.
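For a sense of what the exact computation looks like when it is tractable, here is a minimal sketch with two point-hypothesis models of a coin and five flips. The model names, biases, prior, and data are all made up. Because each model has no free parameters, P[D|Mi] is a single binomial term and no integral is needed; the hairy integrals appear once the models have continuous parameters to marginalize over.

```python
import math

def binomial_likelihood(heads, flips, p):
    """P[D | M] for a model asserting that the coin lands heads with probability p."""
    return math.comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

def posterior(models, prior, heads, flips):
    """Exact P[Mi | D] by Bayes' rule over a finite set of models."""
    unnormalized = {name: prior[name] * binomial_likelihood(heads, flips, p)
                    for name, p in models.items()}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}

# Hypothetical models and data: 4 heads in 5 flips.
models = {"fair": 0.5, "biased": 0.8}
prior = {"fair": 0.5, "biased": 0.5}
post = posterior(models, prior, heads=4, flips=5)
print(post)
```

Nothing here appeals to convergence: the posterior is exact for five data points, and it honestly reports that five flips leave both models live.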