This is not financial advice.

Quantitative finance (QF) is the art of using mathematics to extract money from a securities market. A security is a fungible financial asset. Securities include stocks, bonds, futures, currencies, cryptocurrencies and so on. People often use the techniques of QF to extract money from prediction markets too, particularly sports betting pools.

Expected return is future outcomes weighted by probability. A trade has edge if its expected return is positive. You should never make a trade with negative expected return. But it is not enough just to use expected return. Most people's value functions curve downward. The marginal value of money decreases the more you have. Most people have approximately logarithmic value functions.

A logarithmic curve is approximately linear when you zoom in. Losing 1% of your net worth hurts you slightly more than earning 1% of your net worth helps you. But the difference is usually small enough to ignore. The difference between earning 99% of your net worth and losing 99% of your net worth is not ignorable.

When you gain or lose 1% of your net worth, the two changes to the logarithm of your wealth sum to a tiny −0.01%. When you gain or lose 99% of your net worth, they sum to roughly −400%.

$\begin{matrix} \log(1.01) + \log(0.99) & \approx & -0.0001 \\ \log(1.99) + \log(0.01) & \approx & -4 \end{matrix}$
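The arithmetic can be checked in a few lines. This is a minimal sketch assuming natural logarithms; the function name `log_wealth_changes` is made up for illustration.

```python
import math

def log_wealth_changes(fraction):
    """Changes in log wealth from gaining or losing `fraction` of net worth.

    Returns (gain, loss, total), where total is the sum of the two changes.
    """
    gain = math.log(1 + fraction)
    loss = math.log(1 - fraction)
    return gain, loss, gain + loss

small = log_wealth_changes(0.01)  # gain or lose 1% of net worth
large = log_wealth_changes(0.99)  # gain or lose 99% of net worth

print(f"1%:  gain {small[0]:+.5f}  loss {small[1]:+.5f}  sum {small[2]:+.5f}")
print(f"99%: gain {large[0]:+.5f}  loss {large[1]:+.5f}  sum {large[2]:+.5f}")
```

The small bet is nearly symmetric. The large bet is dominated by the loss term, because the logarithm falls without bound as wealth approaches zero.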

This is called a risk premium. For every positive edge you can use the Kelly criterion to calculate a bet small enough that your edge exceeds your risk premium. In practice traders tend to use fractional Kelly, betting only a fraction of the Kelly amount.
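Here is a rough sketch of the Kelly criterion for the simplest case: a single bet that pays `b`-to-1 with win probability `p`, where a loss costs the whole stake. The bet itself (55% at even odds) is a hypothetical chosen for illustration, and real trades are rarely this clean.

```python
import math

def kelly_fraction(p, b):
    """Kelly stake, as a fraction of net worth, for a bet paying b-to-1 with win probability p."""
    return p - (1 - p) / b

def expected_log_growth(f, p, b):
    """Expected change in log wealth when staking fraction f of net worth on the bet."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

p, b = 0.55, 1.0             # hypothetical bet: 55% chance to win at even odds
full = kelly_fraction(p, b)  # about 0.10 of net worth
half = 0.5 * full            # fractional (half) Kelly

for label, f in [("half Kelly", half), ("full Kelly", full), ("double Kelly", 2 * full)]:
    growth = expected_log_growth(f, p, b)
    print(f"{label:12s} stake={f:.3f}  expected log growth={growth:+.5f}")
```

Full Kelly maximizes expected log growth. Half Kelly gives up some growth in exchange for smaller swings, and betting twice the Kelly amount throws the growth away entirely.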

Minimum transaction costs are often constant. It is not sufficient for your edge to merely exceed your risk premium. It must exceed your risk premium plus the transaction cost. Risk premium is defined as a fraction of your net worth but transaction costs are often constant. If you have lots of money then you can place larger bets while keeping your risk premium constant. This is one of the reasons hedge funds like having large war chests. Larger funds can harvest risk-adjusted returns from smaller edges.
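A toy calculation shows why size matters. Everything here is an assumption made for illustration: the 1% stake, the 0.2% edge, the flat $5 cost and the function name `net_expected_profit`.

```python
def net_expected_profit(net_worth, stake_fraction, edge, fixed_cost):
    """Expected profit of one trade after paying a flat transaction cost.

    The stake scales with net worth, so the risk premium (a fraction of
    net worth) stays the same while the fixed cost shrinks in relative terms.
    """
    stake = net_worth * stake_fraction
    return stake * edge - fixed_cost

stake_fraction = 0.01  # hypothetical: stake 1% of net worth per trade
edge = 0.002           # hypothetical: 0.2% expected return on the stake
fixed_cost = 5.0       # hypothetical: flat cost per trade, in dollars

for net_worth in (10_000, 1_000_000, 100_000_000):
    profit = net_expected_profit(net_worth, stake_fraction, edge, fixed_cost)
    print(f"net worth {net_worth:>11,}: expected profit per trade {profit:+,.2f}")
```

The same edge loses money for the small account and pays for the large one. Only the size of the bet relative to the fixed cost has changed.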

Getting an Edge

The only free lunch in finance is diversification. If you invest in two uncorrelated assets with equal edge then your risk goes down. This is the principle behind index funds. If you know you’re going to pick stocks with the skill of a monkey then you might as well maximize diversification by picking all the stocks. As world markets become more interconnected they become more correlated too. The more people invest in index funds, the less risk-adjusted return diversification buys you. Nevertheless, standard investment advice for most^[1] people is to invest in bonds and index funds. FEMA recommends you add food and water.
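The effect of correlation on diversification can be sketched with the standard formula for an equal-weighted portfolio. The sketch assumes every asset has the same volatility and every pair of assets has the same correlation, which real markets do not satisfy. The 20% volatility is a made-up number.

```python
import math

def portfolio_volatility(n_assets, asset_volatility, correlation=0.0):
    """Volatility of an equal-weighted portfolio of n assets.

    Assumes identical volatilities and one common pairwise correlation.
    """
    variance = asset_volatility**2 * (1 / n_assets + (1 - 1 / n_assets) * correlation)
    return math.sqrt(variance)

asset_volatility = 0.20  # hypothetical: each asset has 20% volatility

for correlation in (0.0, 0.5, 0.9):
    row = ", ".join(
        f"n={n}: {portfolio_volatility(n, asset_volatility, correlation):.3f}"
        for n in (1, 2, 10, 100)
    )
    print(f"correlation {correlation:.1f} -> {row}")
```

With zero correlation the risk keeps falling as assets are added. With high correlation it barely moves, which is why diversification buys less as markets become more correlated.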

All of the above is baseline. The baseline rent you can extract by mindlessly owning the means of production is called beta $β$. Earning money in excess of beta by beating the market is called alpha $α$.

There are three ways to make a living in this business: be first, be smarter or cheat.

―John Tuld in Margin Call

You can be first by being fast or using alternative data. Spread Networks laid a $300 million fiber optic cable in close to a straight line from New York City to Chicago. Being fast is expensive. If you use your own satellites to predict crop prices then you can beat the market. Alternative data is expensive too.

If you want to cheat go listen to Darknet Diaries. Prison is expensive.

Being smart is cheap.

Science will not save you

Science [ideal] applies Occam’s Razor to distinguish good theories from bad. Science [experimental] is the process of shooting a firehose of facts at hypotheses until only the most robust survive. Science [human institution] works when you have lots of new data coming in. If the data dries up then science [human institution] stops working. Lee Smolin asserts this has happened to theoretical physics.

If you have two competing hypotheses with equal prior probability then you need one bit of entropy to determine which one is true. If you have four competing hypotheses with equal prior probability then you need two bits of entropy to determine which one is true. I call your prior probability weighted set of competing hypotheses a hypothesis space. To determine which hypothesis in the hypothesis space is true you need training data. The entropy of your training data must exceed the entropy of your hypothesis space.

The entropy of $n$ competing hypotheses with equal prior probability is $\log n$. Suppose your training dataset has entropy $T$. The number of competing hypotheses you can handle grows exponentially as a function of $T$.

$\begin{matrix} \log n & = & T \\ n & = & e^{T} \end{matrix}$
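A small numeric check of this relationship follows. It uses base-2 logarithms so that entropy comes out in bits, as in the two-hypothesis and four-hypothesis examples. With natural logarithms the same relation reads $n = e^{T}$. The function names are invented for this sketch.

```python
import math

def bits_to_distinguish(n_hypotheses):
    """Entropy, in bits, needed to pick one of n equally likely hard-coded hypotheses."""
    return math.log2(n_hypotheses)

def max_hypotheses(training_bits):
    """Number of equally likely hard-coded hypotheses that training_bits of entropy can separate."""
    return 2 ** training_bits

for n in (2, 4, 1024, 1_000_000):
    print(f"{n:>9,} hypotheses need {bits_to_distinguish(n):6.2f} bits")

for bits in (1, 2, 10, 20):
    print(f"{bits:>2} bits of training data separate up to {max_hypotheses(bits):,} hypotheses")
```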

The above equation only works if all the variables in each hypothesis are hard-coded. A hypothesis $y = 2.2 x + 3.1$ counts as a separate hypothesis from $y = 2.1 x + 3.1$ .

A hypothesis can instead use tunable parameters. Tunable parameters eat up the entropy of our training data fast. You can measure the entropy of a hypothesis by counting how many tunable parameters it has. A one-dimensional linear model $y = a x + b$ has two tunable parameters. A one-dimensional quadratic model $y = a x^{2} + b x + c$ has three tunable parameters. A one-dimensional cubic model $y = a x^{3} + b x^{2} + c x + d$ has four tunable parameters. Suppose each tunable parameter has $e$ bits of entropy. Then collapsing a hypothesis space with $m$ tunable parameters takes $m e$ bits. Measured in units of $e$, the entropy of a hypothesis space with $m$ tunable parameters equals $m$.

We can combine these equations. Suppose your hypothesis space has $n$ separate hypotheses each with $m$ tunable parameters. The total entropy $J$ equals the entropy necessary to distinguish hypotheses from each other plus the entropy necessary to tune a hypothesis’s parameters.

$J = m + \log n$

Logarithmic functions grow slower than linear functions. The number of hypotheses $n$ is inside the logarithm. The number of tunable parameters $m$ is outside of it. The entropy of our hypothesis space is dominated by $m$. Adding competing hypotheses is cheap: the entropy needed to tell them apart grows only logarithmically with their number. You can distinguish competing hypotheses from each other by throwing training data at a problem if they have few tunable parameters. Adding tunable parameters is expensive: the entropy required to collapse your hypothesis space goes up linearly with each one.
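To see the trade-off in numbers, here is a minimal sketch built on $J = m + \log n$. It assumes natural logarithms and one unit of entropy per tunable parameter. The training budget of 20 units and both function names are hypothetical.

```python
import math

def total_entropy(n_hypotheses, m_parameters):
    """Entropy J needed to collapse n hypotheses that each have m tunable parameters."""
    return m_parameters + math.log(n_hypotheses)

def affordable_hypotheses(training_entropy, m_parameters):
    """Largest n for which J = m + log(n) still fits inside the training entropy."""
    return math.exp(training_entropy - m_parameters)

training_entropy = 20.0  # hypothetical entropy budget of the training data

for m in (0, 2, 5, 10, 15):
    n = affordable_hypotheses(training_entropy, m)
    print(f"m={m:>2} tunable parameters -> about {n:,.0f} distinguishable hypotheses")
```

Under these assumptions, each extra tunable parameter divides the number of hypotheses you can afford by $e$. A handful of parameters uses up a budget that could otherwise separate millions of hard-coded hypotheses.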

If you have lots of entropy in your training data then you can train a high-parameter model. Silicon Valley gets away with using high-parameter models to run its self-driving cars and image classifiers because it is easy to create new data. There is so much data available that Silicon Valley data scientists focus their attention on compute efficiency.

Wall Street is the opposite. Quants are bottlenecked by training data entropy.

Past performance is not indicative of future results

If you are testing a drug, training a self-driving car or classifying images then past performance tends to be indicative of future results. If you are examining financial data then past performance is not indicative of future results. Consider a financial bubble. The price of tulips goes up. It goes up some more. It keeps going up. Past performance indicates the price ought to keep going up. Yet buying into a bubble has negative expected return.

Wikipedia lists 25 economic crises in the 20th century plus 20 in the 21st century to date for a total of 45. Financial crises matter. Hedge funds tend to be highly leveraged. A single crisis can wipe out a firm. If a strategy cannot ride out financial crises then it is unviable. Learning from your mistakes does not work if you do not survive your mistakes.

When Tesla needs more training data to train its self-driving cars it can drive more cars around. If a hedge fund needs 45 more financial crises to train its model then it has to wait a century. World conditions change. Competing actors respond to the historical data. New variables appear faster than new training data. You cannot predict financial crises just by waiting for more training data because the entropy of your hypothesis space outraces the entropy of your training data.

You cannot predict a once-in-history event by applying a high-parameter model to historical data alone.

[1] If your government subsidizes mortgages or another kind of investment then you may be able to beat the market.
