There’s a minor error in the formula giving the cross entropy: you need a minus sign on the RHS so that it reads E[- log P[X|M_2] | M_2]
The preceding text is
“Of course, we could be wrong about the distribution—we could use a code optimized for a model M2 which is different from the “true” model M1. In this case, the average number of bits used will be”
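A quick numerical check of why the sign matters. This is only a sketch under assumptions: the two distributions are made up for illustration, the helper name `cross_entropy_bits` is hypothetical, and the average is taken over data drawn from the "true" model M1, as in the usual definition of cross-entropy.

```python
import math

def cross_entropy_bits(p_true, q_model):
    """Average code length in bits when outcomes follow p_true
    but the code is optimized for q_model: sum_x p(x) * (-log2 q(x))."""
    return sum(-p * math.log2(q_model[x]) for x, p in p_true.items() if p > 0)

# Hypothetical distributions, chosen only so the arithmetic is easy to follow.
m1 = {"a": 0.5, "b": 0.25, "c": 0.25}   # stands in for the "true" model M1
m2 = {"a": 0.25, "b": 0.25, "c": 0.5}   # stands in for the mismatched model M2

entropy_m1 = cross_entropy_bits(m1, m1)   # code matched to the true model
cross_m1_m2 = cross_entropy_bits(m1, m2)  # code optimized for the wrong model

print(entropy_m1, cross_m1_m2)
```

With the minus sign both quantities are non-negative, as a bit count must be, and the mismatched code is never shorter on average than the matched one. Without it, both would come out negative.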
Fixed. Good catch, thanks!