Thanks for your detailed reply. (And sorry I couldn’t format the below well—I don’t seem to get any formatting options in my browser.)
“It is rarely too difficult to specify the true model...this means that “every member of the set of models available to us is false” need not hold”
I agree we could specify a true model for the economy, the climate, etc. (presumably the theory of everything in physics). But we don't have the computational power to make predictions of such systems with that model, so my question is really: how should we make predictions when the true model is not practically usable? By "the set of models available to us", I meant the models we could actually afford to make predictions with. If the true model is not in that set, then it seems to me that every model in the set must be false.
‘”different processes may become important in future” is not actually a problem for Ockham’s razor per se. That’s a problem for causal models’
To take the climate example, say scientists had figured out that there is a biological feedback that kicks in once global warming goes past 2C (e.g. bacteria become more efficient at decomposing soil and releasing CO2). Suppose you have one model that includes a representation of that feedback (e.g. as a subprocess) and one that does not but is equivalent in every other way (e.g. it is coded like the first model but lacks the subprocess). Isn't the second model then simpler according to metrics like minimum description length, so that it would be weighted higher if we penalised models using such metrics, as the toy sketch below illustrates? That seems like the wrong thing to do, if we think the first model is more likely to give a good prediction.
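Here is a minimal sketch of the worry (entirely my own toy construction, with made-up numbers and a parameter count standing in for a real description-length calculation): two models that agree exactly on all sub-2C data, so the complexity penalty alone decides between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration (hypothetical, not from any real climate model): two
# models agree exactly below 2C of warming, but one also carries a
# soil-feedback subprocess that only activates above 2C.

def flux_no_feedback(T):
    # Soil CO2 flux, crude linear response to warming T (arbitrary units)
    return 1.0 + 0.1 * T

def flux_with_feedback(T):
    # Same response, plus extra decomposition once warming passes 2C
    return 1.0 + 0.1 * T + np.where(T > 2.0, 0.5 * (T - 2.0), 0.0)

# All historical data lies below 2C, so both models fit it identically.
T_hist = np.linspace(0.0, 1.5, 20)
obs = flux_no_feedback(T_hist) + rng.normal(0.0, 0.01, T_hist.size)

def log_lik(model, sigma=0.01):
    r = obs - model(T_hist)
    return -0.5 * np.sum((r / sigma) ** 2)

# Crude description-length proxy: count of free parameters.
# (A real MDL score would count bits of code/parameters; same ordering.)
n_params = {"no_feedback": 2, "with_feedback": 3}

for name, model in [("no_feedback", flux_no_feedback),
                    ("with_feedback", flux_with_feedback)]:
    print(name, log_lik(model) - n_params[name])

# The likelihood terms tie (the models are identical below 2C), so the
# complexity penalty alone decides, favouring the feedback-free model,
# even though we expect it to predict worse once warming crosses 2C.
```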
Now the thought that occurred to me when writing that is that the data the scientists used to deduce the existence of the feedback ought to be accounted for by the models we use, and that this would give low posterior weight to models that don't include the feedback (spelled out below). But doing this in practice seems hard. It's also not clear to me how we would distinguish models that represent the process but don't connect it properly to the climate prediction, e.g. a model whose subprocess says bacteria produce more CO2 at warming above 2C but which never actually adds that CO2 to the atmosphere, or something like that.
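To spell out the update I have in mind (my notation, nothing from the thread): if $D_{\text{clim}}$ is the climate record and $D_{\text{bio}}$ is the data that revealed the feedback, then weighting models by

$$P(M \mid D_{\text{clim}}, D_{\text{bio}}) \propto P(D_{\text{clim}} \mid M)\, P(D_{\text{bio}} \mid M)\, P(M)$$

would penalise the no-feedback model through the $P(D_{\text{bio}} \mid M)$ factor. But that factor only exists if the model makes predictions about soil biology at all, and the disconnected-subprocess model could score well on both factors while still predicting the climate badly, which is the problem raised above.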
“likelihoods are never actually zero, they’re just very small”
If our models were deterministic and not true, wouldn't it be impossible for them to produce the observed data exactly, so that the likelihood of the data given any of those models would be zero? (Unless more than one process could give rise to exactly the same data, which seems unlikely in practice.) If we instead make the models probabilistic, and design them so that there is a non-zero chance of the observed data being a sample from the model, then the likelihood can be non-zero. But doing this doesn't seem necessary: models that are false can still give predictions that are useful for decision-making. It's also not clear that we could build a probabilistic model of something as complex as the climate that has non-zero likelihoods and runs on our available computers (and that isn't something obviously of low predictive value, like assigning probability 1/N to each of N days of observed data). So it still seems valuable to have a principled way of predicting with models that assign the data zero likelihood.
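One way to formalise the point (my notation, and a standard observation-noise construction rather than anything from the thread): a deterministic model $M$ with predicted trajectory $f_M$ has likelihood

$$P(D \mid M) = \mathbf{1}[f_M = D],$$

which is zero whenever $M$ is false, whereas wrapping it in Gaussian observation noise gives

$$P(D \mid M) = \prod_t \mathcal{N}(d_t \mid f_M(t), \sigma^2) > 0$$

for observations $d_t$. The second form is a modelling choice, and picking a sensible $\sigma$ (or noise structure) for something climate-sized is itself part of the difficulty.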
“the central challenge is to find rigorous approximations of the true underlying models. The main field I know of which studies this sort of problem directly is statistical mechanics, and a number of reasonably-general-purpose tools exist in that field which could potentially be applied in other areas (e.g. this).”
Yes, I agree. Thanks for the link; it looks very relevant and I'll check it out. Edit: I'll just add, echoing part of my reply to Kenny's answer, that whilst statistical averaging has got human modellers a certain distance, adding explicit representations of processes whose effects the averaging misses seems to add a lot of value (e.g. tropical thunderstorms in the case of climate). So there seems to be something beyond averaging that can be used: building simplified models of the processes you can see the averaging leaves out (a toy illustration follows).
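As a cartoon of what I mean (entirely hypothetical numbers, not a real convection scheme): a process living in the tail of the distribution is invisible to an averaged law applied to the mean state, but a cheap explicit subprocess recovers most of it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fine-grained truth: many grid cells, a rare tail of which host storms
# carrying extra heat upward (hypothetical numbers throughout).
cells = rng.normal(300.0, 2.0, size=100_000)   # cell temperatures (K)
storm = cells > 303.0                           # rare convective cells
cell_flux = 0.1 * cells + np.where(storm, 5.0 + 0.5 * (cells - 303.0), 0.0)
true_flux = cell_flux.mean()

# Pure averaging: apply the smooth cell-level law to the mean state.
# The storms live in the tail of the distribution, so this misses them.
avg_only = 0.1 * cells.mean()

# Averaging + a simplified subprocess model: keep the averaged law, and
# add a cheap parameterised estimate of the storm contribution.
storm_frac = storm.mean()
avg_plus_subprocess = 0.1 * cells.mean() + 5.0 * storm_frac

print(f"truth: {true_flux:.3f}  avg only: {avg_only:.3f}  "
      f"avg + subprocess: {avg_plus_subprocess:.3f}")
# The correction is approximate (it ignores how intense each storm is),
# but it recovers most of what bare averaging misses.
```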
On causality: whilst getting the causal structure right is of course desirable, if the models we can afford to compute with can't reproduce the data, then presumably they are also not reproducing the correct causal graph exactly? And no causal graph we could afford to compute with would reproduce the data either? (Otherwise a causal graph could somehow hugely compress the true equations without information loss, which would be great if true!)
OK, I made some edits. I left the “rational” in the last paragraph because it seemed to me to be the best word to use there.