Generate an image randomly with each pixel black with 51% chance and white with 49% chance, independently. The most likely image? Totally black. But virtually all the probability mass is on images which are ~49% white. Adding correlations between neighbouring pixels (or, in 1D, correlations between time series events) doesn’t remove this problem, despite what you might assume.
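A minimal sketch of the arithmetic behind this claim, for the independent-pixel case only. The image size (100×100), the window of 4700 to 5100 white pixels, and the helper names `log_binom_pmf` and `white_count_mass` are illustrative assumptions, not part of the original argument.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def white_count_mass(n, p_white, lo, hi):
    """Probability that the number of white pixels lies in [lo, hi]."""
    return sum(math.exp(log_binom_pmf(k, n, p_white)) for k in range(lo, hi + 1))

n = 100 * 100        # assumed image size: 100x100 pixels
p_white = 0.49       # each pixel is white with 49% chance, independently

# Log-probability of the single most likely image (every pixel black).
log_p_all_black = n * math.log(1 - p_white)

# Log-probability of one specific image with exactly 4900 white pixels.
log_p_one_typical = 4900 * math.log(p_white) + 5100 * math.log(1 - p_white)

# Total probability of all images with between 47% and 51% white pixels.
mass_near_49 = white_count_mass(n, p_white, 4700, 5100)

print(log_p_all_black, log_p_one_typical, mass_near_49)
```

Under these assumptions the all-black image has a log-probability of about -6733, higher than the roughly -6929 of any single image with 4900 white pixels, yet the images with 47% to 51% white pixels together carry almost all of the probability mass, simply because there are so many of them.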
The core problem is that the mode of a high-dimensional probability distribution is typically degenerate: it looks nothing like a typical sample. (As an aside, this also causes problems for parameter estimation of unnormalized energy-based models, an extremely broad class, because you have to sample from them to normalize; maximum-probability estimates can be dangerous.)
Statistical mechanics points to the solution: knowing the most likely microstate of a box of particles doesn’t tell you anything useful; physicists care about macrostates, which are observables. You define a statistic (any function of the data that summarizes it in some way) that you actually care about, and then take the mode of that. For example, the number of breakthrough discoveries by time t.
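As a rough illustration of taking the mode of a statistic rather than of the raw data, the sketch below reuses the pixel example and picks the number of white pixels as the statistic. That choice, the 100×100 image size, and the helper names are assumptions made for illustration; with independent pixels the statistic is binomially distributed, so its mode can be found by direct search.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def mode_of_white_count(n, p_white):
    """Most likely value of the statistic 'number of white pixels'."""
    return max(range(n + 1), key=lambda k: log_binom_pmf(k, n, p_white))

n = 100 * 100        # assumed image size: 100x100 pixels
p_white = 0.49       # each pixel is white with 49% chance, independently

# Mode of the statistic (the macrostate-style answer).
statistic_mode = mode_of_white_count(n, p_white)

# The statistic evaluated at the mode of the raw distribution:
# the all-black image has no white pixels at all.
statistic_at_image_mode = 0

print(statistic_mode, statistic_at_image_mode)
```

Here the mode of the statistic is 4900 white pixels, which matches what a typical sample looks like, whereas the statistic evaluated at the most likely image is 0.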