Can I get an example? Say X is a random positive real number. For which distribution do the parameters that maximize E(X) fail to maximize E(log(X))?
That is exactly what the Kelly criterion provides examples of. Let p be the probability of winning some binary bet and k the multiple of your bet that is returned to you if you win. Given an initial bankroll of 1, let theta, between 0 and 1, be the proportion of it you are going to bet, and let X be your bankroll after the bet. With probability p, X is 1+theta(k-1), and with probability 1-p, X is 1-theta. theta is a parameter of the distribution of X. (So are p and k, but we are interested in maximising over theta for given p and k.)
If pk > 1 then theta = 1 maximises E(X), but theta = (pk-1)/(k-1) maximises E(log(X)).
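For anyone who wants the algebra behind those two maximisers, here is a short sketch using only the definitions above (p, k, theta and X as already introduced; nothing else is assumed beyond k > 1 and 0 ≤ theta < 1 where the logarithm is taken):

```latex
\begin{align*}
E(X) &= p\,\bigl(1+\theta(k-1)\bigr) + (1-p)(1-\theta) = 1 + \theta\,(pk-1), \\
E(\log X) &= p\,\log\bigl(1+\theta(k-1)\bigr) + (1-p)\,\log(1-\theta), \\
\frac{d}{d\theta}\,E(\log X) &= \frac{p(k-1)}{1+\theta(k-1)} - \frac{1-p}{1-\theta} = 0
\quad\Longrightarrow\quad \theta = \frac{pk-1}{k-1}.
\end{align*}
```

E(X) is linear in theta with slope pk-1, so when pk > 1 it is largest at the boundary theta = 1, while the first-order condition for E(log(X)) gives an interior point.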
The graphs of E(X) and E(log(X)) as functions of theta look nothing like each other. The first is a straight line rising with slope pk-1, and the second rises to a maximum and then plunges to -∞ as theta approaches 1.
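A quick numerical check of the two curves, as a minimal sketch. The values p = 0.6 and k = 2 are illustrative assumptions, not taken from the thread, and the function names are made up for this example.

```python
import math

def expected_bankroll(theta, p, k):
    """E(X): win 1 + theta*(k-1) with probability p, else keep 1 - theta."""
    return p * (1 + theta * (k - 1)) + (1 - p) * (1 - theta)

def expected_log_bankroll(theta, p, k):
    """E(log X); betting everything (theta = 1) risks log(0) = -infinity."""
    win = 1 + theta * (k - 1)
    lose = 1 - theta
    if lose <= 0:
        # Assumes p < 1, so the losing branch has positive probability.
        return -math.inf
    return p * math.log(win) + (1 - p) * math.log(lose)

# Illustrative parameters (assumed for this sketch): pk = 1.2 > 1.
p, k = 0.6, 2.0

# Brute-force search over theta = 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]
best_for_mean = max(grid, key=lambda t: expected_bankroll(t, p, k))
best_for_log = max(grid, key=lambda t: expected_log_bankroll(t, p, k))
kelly = (p * k - 1) / (k - 1)

print("theta maximising E(X):      ", best_for_mean)
print("theta maximising E(log X):  ", best_for_log)
print("Kelly formula (pk-1)/(k-1): ", kelly)
```

With these numbers the grid search puts the maximiser of E(X) at theta = 1 and the maximiser of E(log(X)) at the Kelly fraction, up to the grid resolution.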
Yep, I was wrong. Now I need to figure out why I thought I was right.
You may have gotten confused because log is monotonically increasing: e.g. the log likelihood is maximized at the same spot as the likelihood. So log E(X) is maximized at the same spot as E(X). But log and E do not commute (Jensen’s inequality is not called Jensen’s equality, after all).
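To make the distinction concrete, here is a small sketch comparing log E(X) with E(log(X)) for the binary bet described earlier in the thread. The parameter values p = 0.6, k = 2 and the helper names are assumptions chosen for illustration.

```python
import math

# Illustrative parameters (assumed, not from the thread).
p, k = 0.6, 2.0

def outcomes(theta):
    """(probability, bankroll) pairs for the binary bet with stake theta."""
    return [(p, 1 + theta * (k - 1)), (1 - p, 1 - theta)]

def mean(theta):
    return sum(prob * x for prob, x in outcomes(theta))

def log_of_mean(theta):
    # log applied AFTER the expectation: a monotone transform of E(X).
    return math.log(mean(theta))

def mean_of_log(theta):
    # log applied BEFORE the expectation: a different function of theta.
    return sum(prob * math.log(x) for prob, x in outcomes(theta))

# theta = 0.00 .. 0.99; stops short of theta = 1, where log(1 - theta) diverges.
grid = [i / 100 for i in range(100)]

print("argmax E(X):      ", max(grid, key=mean))
print("argmax log E(X):  ", max(grid, key=log_of_mean))
print("argmax E(log X):  ", max(grid, key=mean_of_log))
print("at theta = 0.5:   log E(X) =", log_of_mean(0.5),
      " E(log X) =", mean_of_log(0.5))
```

On this grid log E(X) and E(X) peak at the same theta, as monotonicity requires, while E(log(X)) peaks elsewhere. At theta = 0.5, log E(X) is about 0.095 and E(log(X)) is about -0.034, consistent with Jensen's inequality E(log(X)) ≤ log E(X).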
Was probably part of it—I think the internal cheering for the wrong position included the words “But log likelihood!” :-/