It seems to me that an optimization measure should take into account “what would have happened anyway”. Like, suppose your list of outcomes is:
A) the laws of physics continue unbroken
B) the laws of physics are broken in a first way
C) the laws of physics are broken in a second way
...
(and so on, through a zillion more ways to break the laws of physics)
Obviously, anything whose preference ordering looks like this comes out as super good at optimization, since "business as usual" already beats a zillion alternatives! (/s)
Therefore, I think that any measure of optimization should be relative to some sort of counterfactual prior on the outcomes: for example, what would happen if there were no optimizer, or some predefined distribution over what could stand in place of the optimizer in question. Whatever you choose, you should keep careful track of which counterfactual prior you are measuring relative to.
My intuition here is apparently the opposite of Garrett Baker's. But I have two things to say on this:
1) Depending on the situation, you might be able to have some kind of "objective" counterfactual prior: maybe you know that all outcomes are equally possible and can use the uniform prior.
2) I really think you can't do without this. If you try, you'll basically end up with something equivalent to some choice of counterfactual prior while pretending it is "objective", much as in the Bayes vs. frequentism debate.
If you also want the definition to be based on utility rather than on ordinal rankings, I don't think it makes sense to measure optimization in bits. To take utility into account, you want the measure to depend smoothly on the utility numbers, and that is in general incompatible with the bits idea, because the distribution of outcomes over utilities can have any sort of weird shape.
So if you do want a utility-based definition, I suggest throwing out the bits idea entirely: just use utility plus a counterfactual prior, and make the measure linear in utility.
A couple of suggestions for measuring optimization relative to a utility function u and a counterfactual prior:
The number of standard deviations of improvement in utility obtained, relative to the counterfactual prior. This might be the more useful measure for "weak" optimizers, where the whole distribution is relevant. It requires that the standard deviation of u under the counterfactual prior be well defined (see the first sketch after this list).
If u is bounded above, you can have an optimization measure, linear in utility, that maxes out at 1 when you obtain maximum utility (and is 0 for the counterfactual prior, of course). This might be the more relevant measure for "strong" optimizers that we expect to get close to maximum utility, but it requires both that u is bounded above and that expected utility under the counterfactual prior is well defined (i.e. not minus infinity); see the second sketch after this list.