Possible takeaways from the “hypothesis space” principle, if I’m understanding it correctly:
Keep m as small as you can. Trading strategies should be as simple as reasonably possible, with just a few simple rules. The more complex you make a strategy, the more likely you’re deceiving yourself by over-fitting to noise. More rules are more ways to mess up.
Most big-data techniques choke on this much noise. You can’t just throw a machine-learning algorithm at market data and expect it to find much. (The first time I tried this, the strategy it came up with was “HODL!”)
Make n as big as you can. Gather as much data as you can. It can help to trade ensembles of securities if they show a similar inefficiency in their price data. You can profit from an edge like this even if you don’t know exactly where it is. (I’m currently trading a forex strategy this way.)
When backtesting a strategy, you’re trying to prove your hypothesis wrong, to see if perturbations make the “edge” disappear, not trying to optimize on past data. Any monkey can overfit to noise and make a backtest look good.
Pairs, baskets or index funds seem to be easier to trade profitably than their individual components.
Yes. This bit is counterintuitive, especially after we have established that you want to keep m as small as possible. By making n as big as you can, you are riding the profitable end of a logarithmic curve. By increasing m, you are crashing into a combinatorial explosion.
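To make the "any monkey can overfit to noise" point concrete, here is a minimal sketch, not anyone's actual strategy. Everything in it is an assumption made for illustration: the "market" is pure zero-mean Gaussian noise, each candidate "rule" is just a fixed random sequence of long/short positions, and the names (`make_noise`, `make_candidates`, `mean_pnl`, `best_candidate`, `experiment`) are hypothetical. The number of candidates `k` stands in for the size of the hypothesis space, which is what grows when you add rules; `n` is the number of observations.

```python
import random
import statistics


def make_noise(n, rng):
    """n 'returns' drawn from zero-mean Gaussian noise, so no real edge exists."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def make_candidates(k, n, rng):
    """k arbitrary rules, each a fixed sequence of +1 (long) / -1 (short)."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def mean_pnl(positions, returns):
    """Average per-period profit from holding `positions` through `returns`."""
    return statistics.fmean(p * r for p, r in zip(positions, returns))


def best_candidate(candidates, returns):
    """The 'optimized' backtest: keep whichever rule looked best in-sample."""
    return max(candidates, key=lambda c: mean_pnl(c, returns))


def experiment(k, n, seed=0):
    """Pick the best of k rules on n noise points, then score it on fresh noise."""
    rng = random.Random(seed)
    in_sample = make_noise(n, rng)
    out_sample = make_noise(n, rng)
    candidates = make_candidates(k, n, rng)
    best = best_candidate(candidates, in_sample)
    return mean_pnl(best, in_sample), mean_pnl(best, out_sample)


if __name__ == "__main__":
    # Vary the number of candidate rules (k) and the amount of data (n).
    for k, n in [(4, 100), (512, 100), (512, 2000)]:
        in_pnl, out_pnl = experiment(k, n)
        print(f"k={k:4d} n={n:5d}  in-sample={in_pnl:+.3f}  out-of-sample={out_pnl:+.3f}")
```

Since the data is noise by construction, any in-sample "edge" the winning rule shows is selection bias. With the seed and n held fixed, searching more candidates can only raise the best in-sample score, while the out-of-sample score has no reason to follow it. Raising n should shrink the spurious in-sample edge, though the exact numbers depend on the seed.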