In the wider sense, MML still works on the dataset {stock prices, newspapers, market fear}. Regardless of what work has already been done to compress newspapers and market fear, if your hypothesis is efficient then you can encode the stock price data at a very low marginal message-length cost.
You’d write up the hypothesis as a compressor of the data. The simplest way is to produce a distribution over stock prices and apply arithmetic coding, though in practice you’d tweak whatever state-of-the-art compressors for stock prices exist.
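As a rough sketch of "hypothesis as compressor": an arithmetic coder driven by a predictive distribution spends close to -log2 p(x) bits per symbol, so you can score a hypothesis by summing that quantity without building the coder. Everything below is an assumption for illustration: the three-symbol alphabet of price moves, the toy `momentum` hypothesis, and the made-up `moves` sequence.

```python
import math

SYMBOLS = ("up", "flat", "down")  # assumed discretisation of price moves


def code_length_bits(symbols, predict):
    """Ideal code length in bits: sum of -log2 p(symbol | history).

    An arithmetic coder using the same predictions would come within
    a couple of bits of this total.
    """
    total = 0.0
    for i, s in enumerate(symbols):
        probs = predict(symbols[:i])
        total += -math.log2(probs[s])
    return total


def uniform(history):
    """Baseline: no hypothesis, every move equally likely."""
    return {s: 1 / 3 for s in SYMBOLS}


def momentum(history):
    """Hypothetical hypothesis: the previous move tends to repeat."""
    if not history:
        return uniform(history)
    probs = {s: 0.25 for s in SYMBOLS}
    probs[history[-1]] = 0.5
    return probs


# Made-up data, purely for illustration.
moves = ["up", "up", "up", "down", "down", "flat", "flat", "flat"]

baseline_bits = code_length_bits(moves, uniform)
model_bits = code_length_bits(moves, momentum)
print(f"uniform: {baseline_bits:.2f} bits, momentum: {model_bits:.2f} bits")
```

A hypothesis is "efficient" in the sense above when its total (plus the length of the code that implements it) comes out below the baseline.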
Of course, a side effect is that your code references more data and will likely need longer internal identifiers, so if you just split the cost of the code evenly across the datasets being compressed, you’d punish the compressors of newspapers and market fear. I would suggest that the solution is to use the Shapley value, with the value being the number of bits saved overall by a single compressor working on all the datasets in a given pool of cooperation.
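To make the Shapley suggestion concrete, here is a minimal sketch. The players are the three datasets, and the value of a coalition is the number of bits a joint compressor saves over compressing its members separately. The numbers in `SAVED` are invented for illustration, not measurements.

```python
from itertools import permutations


def shapley(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every order in which the coalition could be assembled."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical bits saved by a joint compressor on each pool of datasets.
SAVED = {
    frozenset(): 0,
    frozenset({"stocks"}): 0,
    frozenset({"news"}): 0,
    frozenset({"fear"}): 0,
    frozenset({"stocks", "news"}): 60,
    frozenset({"stocks", "fear"}): 30,
    frozenset({"news", "fear"}): 30,
    frozenset({"stocks", "news", "fear"}): 120,
}

shares = shapley(["stocks", "news", "fear"], SAVED.__getitem__)
print(shares)
```

With these invented numbers the shares sum to the 120 bits saved by the full pool, and each dataset is credited according to how much its presence helps the others, rather than being charged a flat share of the code cost.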