Since model M has much of P’s distribution’s probability mass, P(d) is approximately equal to the probability of M if M computes d (call this M→d), and zero otherwise.
I found this sentence confusing.
By the total probability rule, we can say that the probability of the data being seen in the future is the sum of these two numbers:
1. The probability that model M is true and that the data is seen.
2. The probability that model M is false and that the data is seen.
If we assume for the sake of argument that the probability of model M being true is 1, then the second number becomes zero. By the definition of conditional probability, the first number is the product of the probability of the data being seen given that model M is true and the probability that model M is true. Since we are assuming that the probability of model M being true is one, this reduces to the probability of the data being seen given that model M is true.
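The reduction described above can be spelled out numerically. A minimal sketch (the specific probabilities are made up purely for illustration):

```python
# Law of total probability over the partition {M, not M}:
#   P(d) = P(d | M) * P(M) + P(d | not M) * (1 - P(M))
# When P(M) = 1, the second term vanishes and P(d) = P(d | M).

def prob_data(p_m: float, p_d_given_m: float, p_d_given_not_m: float) -> float:
    """P(d) computed via the total probability rule."""
    return p_d_given_m * p_m + p_d_given_not_m * (1 - p_m)

# With P(M) = 1, P(d) reduces to P(d | M), whatever P(d | not M) is.
assert prob_data(1.0, 0.8, 0.3) == 0.8
assert prob_data(1.0, 0.8, 0.9) == 0.8

# With P(M) < 1, the second term contributes:
# 0.8 * 0.9 + 0.3 * 0.1 = 0.72 + 0.03 = 0.75
assert abs(prob_data(0.9, 0.8, 0.3) - 0.75) < 1e-12
```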
So could we rewrite that sentence as
Since model M has much of P’s distribution’s probability mass, P(d) is approximately equal to the probability of d given M.
Or
Since model M has much of P’s distribution’s probability mass, P(d) is approximately equal to the probability of M computing d.
In any case, it seems to me that the formalism does not add much here, and you could communicate the same idea by saying something like
The goal of the predictor is to make predictions that come true. However, the predictor must take into account the effect that sharing its predictions has on the course of events. Therefore, to make accurate predictions, the predictor must make predictions that continue to be true even after they are shared.
In particular, at any given time it will make the prediction that has the greatest likelihood of continuing to be true even after it is shared, in order to best achieve its goal. This will lead to the predictor behaving somewhat like the Oracle in Greek mythology, making whatever prophecy is maximally self-fulfilling, sometimes with disastrous results.
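The selection rule in the paragraph above can be sketched as an argmax over candidate predictions. Everything here is a hypothetical illustration, not anything from the original post: the candidate set and the function giving each prediction's probability of coming true once shared are both invented for the example.

```python
# Hypothetical sketch: the predictor picks whichever candidate prediction
# is most likely to still hold after it has been announced.

def pick_prediction(candidates, prob_true_if_shared):
    """Return the candidate maximizing P(prediction comes true | it is shared).

    `prob_true_if_shared` maps a prediction to the probability that it
    holds *after* the act of sharing it has influenced events.
    """
    return max(candidates, key=prob_true_if_shared)

# Toy world: announcing "market crash" is strongly self-fulfilling
# (e.g. it triggers panic selling); "market rally" only weakly so.
toy_probs = {"market crash": 0.95, "market rally": 0.60}
best = pick_prediction(toy_probs, toy_probs.get)
assert best == "market crash"
```

The point of the sketch is only that nothing in this rule requires the announced prediction to be benign; it selects for self-fulfilment, not for desirability.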
The goal of the predictor is to make predictions that come true.
It isn’t clear that this applies to prediction systems like Solomonoff induction or Levin search—which superficially do not appear to be goal-directed.
This will lead to the predictor behaving somewhat like the Oracle in Greek mythology, making whatever prophecy is maximally self-fulfilling, sometimes with disastrous results.
The oracle delivered to Oedipus what is often called a “self-fulfilling prophecy”, in that the prophecy itself sets in motion events that conclude with its own fulfilment.
I found an example:
It is also discussed here.