Could you defend worst-case reasoning a little more? Worst cases can be arbitrarily different from the average case, so having worst-case guarantees may be reassuring, but actually choosing policies by explicit reference to the worst case seems suspicious. (In the human context, we might suppose that, in the worst case, I have a stroke in the next few seconds and die. But I’m not in the business of picking policies by how they do in that case.)
You might say “we don’t have an average case,” but if there are possible hypotheses outside your considered space, you don’t have the worst case either: the problem of estimating a property of a non-realizable hypothesis space is simplified, but not gone.
Anyhow, still looking forward to working my way through this series :)
Infra-Bayesianism doesn’t consider the worst case: even though each hypothesis is treated using the maximin decision rule, there is still a prior over many hypotheses[1]. One such hypothesis can upper bound the probability that you will have a stroke in the next few seconds. An infra-Bayesian agent would learn this hypothesis and plan accordingly.
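To make the decision rule above concrete, here is a minimal toy sketch, not the formalism from the series. It assumes a single binary outcome (stroke or no stroke), represents each hypothesis as an interval of allowed stroke probabilities, and scores an action by the prior-weighted sum of its worst-case expected utilities. The action names, utilities, and numbers are all hypothetical, and updating on observations is left out entirely.

```python
from typing import List, Tuple

# Hypothetical toy utilities: "give_up" pays a small amount regardless of
# outcome; "plan_ahead" pays off only if no stroke occurs.
ACTIONS = ["plan_ahead", "give_up"]
UTILITY = {
    ("plan_ahead", "fine"): 1.0,
    ("plan_ahead", "stroke"): 0.0,
    ("give_up", "fine"): 0.2,
    ("give_up", "stroke"): 0.2,
}

# A hypothesis is an interval (p_low, p_high) of allowed stroke probabilities.
Hypothesis = Tuple[float, float]
# A prior is a list of (weight, hypothesis) pairs whose weights sum to 1.
Prior = List[Tuple[float, Hypothesis]]


def expected_utility(action: str, p_stroke: float) -> float:
    """Expected utility of an action for one fixed stroke probability."""
    return (p_stroke * UTILITY[(action, "stroke")]
            + (1.0 - p_stroke) * UTILITY[(action, "fine")])


def worst_case_value(action: str, hypothesis: Hypothesis) -> float:
    """Maximin inside one hypothesis: the minimum expected utility over the
    interval. Expected utility is linear in p, so checking endpoints suffices."""
    p_low, p_high = hypothesis
    return min(expected_utility(action, p_low),
               expected_utility(action, p_high))


def prior_weighted_value(action: str, prior: Prior) -> float:
    """Average the per-hypothesis worst-case values using the prior weights."""
    return sum(weight * worst_case_value(action, hyp) for weight, hyp in prior)


def best_action(prior: Prior) -> str:
    """Pick the action with the highest prior-weighted worst-case value."""
    return max(ACTIONS, key=lambda a: prior_weighted_value(a, prior))


# Pure worst-case reasoning: one hypothesis that allows any stroke probability.
vacuous_only: Prior = [(1.0, (0.0, 1.0))]
# A prior that also contains a hypothesis bounding the stroke probability.
mixed: Prior = [(0.5, (0.0, 1.0)), (0.5, (0.0, 0.001))]
```

With only the vacuous hypothesis, `best_action(vacuous_only)` picks `give_up`, which is the behavior the question worries about. Once the prior also gives weight to a hypothesis that bounds the stroke probability, `best_action(mixed)` picks `plan_ahead`.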
We might say that infra-Bayesianism assumes the worst only of that which is not only unknown but unknowable. To make a somewhat informal analogy with logic, we assume the worst model of the theory and thereby make any gain that can be gained provably.
One justification often given for Solomonoff induction is: we live in a simple universe. However, Solomonoff induction is uncomputable, so a simple universe cannot contain it. Instead, it might contain something like bounded Solomonoff induction. However, in order to justify bounded Solomonoff induction, we would need to assume that the universe is simple and cheap, which is false. In other words, postulating an “average-case” entails postulating a false dogmatic belief. Bounded “infra-Solomonoff” induction solves the problem by relying instead on the following assumption: the universe has some simple and cheap properties that can be exploited.
[1] As in the Bayesian case, you can alternatively think of the prior as just a single infradistribution, which is the mixture of all the hypotheses it is composed of. This is an equivalent view.