It might clarify things to note the connection between Kullback-Leibler divergence and communication theory. The Kullback-Leibler divergence is the quantity to minimize when you want to minimize the expected length of the signal encoding (i.e., recording or communicating) what actually happened: it measures the expected number of extra bits you pay for using a code built from your probabilities instead of the true ones. Since an outcome assigned probability q costs −log₂ q bits to encode, the choice of “1/2” or “0” is equivalent to constraining the agent to choose between using one bit or an infinite number of bits to record/communicate the state “improbable event did (not) occur”.
In short, KL divergence isn’t about truth-seeking per se. It’s about the resources necessary to encode signals—definitely an instrumental question.
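To make the bit-counting concrete, here is a minimal sketch for a single binary event. It assumes ideal Shannon code lengths (−log₂ q bits per outcome, ignoring rounding to whole bits), and the function names and the "true" probability p = 0.01 in the usage note are hypothetical, chosen only for illustration.

```python
import math

def code_length_bits(q: float) -> float:
    """Ideal (Shannon) code length, in bits, for an outcome assigned probability q."""
    if q == 0.0:
        # An outcome declared impossible has no finite codeword.
        return math.inf
    return -math.log2(q)

def expected_code_length(p: float, q: float) -> float:
    """Expected bits per report of a binary event that truly occurs with
    probability p, when the code is built from the believed probability q.
    This is the cross-entropy H(p, q)."""
    total = 0.0
    for p_i, q_i in ((p, q), (1.0 - p, 1.0 - q)):
        if p_i > 0.0:  # outcomes that never happen cost nothing on average
            total += p_i * code_length_bits(q_i)
    return total

def kl_divergence_bits(p: float, q: float) -> float:
    """KL(p || q) in bits: the expected *extra* code length from encoding
    with q instead of the true distribution p."""
    return expected_code_length(p, q) - expected_code_length(p, p)

if __name__ == "__main__":
    p = 0.01  # hypothetical true probability of the improbable event
    for q in (0.5, 0.0, p):
        print(f"q={q}: expected bits={expected_code_length(p, q):.4f}, "
              f"KL={kl_divergence_bits(p, q):.4f}")
```

With q = 1/2, both outcomes cost exactly one bit, so the expected length is one bit per report (about 0.92 bits more than the optimum when p = 0.01). With q = 0, the event that does occasionally happen needs an infinitely long codeword, so both the expected length and the KL divergence are infinite. Only q = p drives the KL divergence to zero.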