log is a monotonically increasing function, so how does this differ from choosing parameters to maximize the probability?
It doesn’t, modulo the practical concerns that ofer brings up below. Also, the math is often nicer in log space, since you get a sum over log probabilities of data points instead of a product over probabilities of data points. But yes, formally they are equivalent.
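A quick numerical sanity check of that equivalence (a toy coin-flip example; the numbers are made up for illustration):

```python
import numpy as np

# Estimate a coin's bias from 7 heads in 10 flips by grid search.
heads, flips = 7, 10
grid = np.linspace(0.001, 0.999, 999)  # candidate bias parameters

likelihood = grid**heads * (1 - grid)**(flips - heads)
log_likelihood = heads * np.log(grid) + (flips - heads) * np.log(1 - grid)

# Because log is monotonically increasing, both objectives peak at the
# same parameter (~0.7, up to grid resolution).
assert likelihood.argmax() == log_likelihood.argmax()
print(grid[likelihood.argmax()])  # ~0.7
```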
If you have lots of training data, the probability that the model assigns to the training data is vanishingly small. You can’t represent such small numbers with the commonly used floating point types in Python/Java/etc.
It’s more practical to compute the log probability of the training data (by summing the log probabilities assigned to the training examples rather than multiplying the original probabilities).
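For concreteness, here’s a minimal sketch of the underflow problem (made-up per-example probabilities, nothing model-specific):

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are the probabilities a model assigns to 1,000 training
# examples, each somewhere around 0.1.
probs = rng.uniform(0.05, 0.15, size=1000)

# Their product is on the order of 1e-1000, below the smallest positive
# float64 (~5e-324), so it underflows to exactly 0.0.
print(np.prod(probs))       # 0.0

# The sum of log probabilities is a perfectly ordinary float.
print(np.log(probs).sum())  # roughly -2300
```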
I don’t think this is relevant here, but there are theoretical uses for maximizing expected log probability, and maximizing expected log probability is not the same as maximizing expected probability: the log does not commute with the expectation (by Jensen’s inequality, E[log p] ≤ log E[p], since log is concave), so the two objectives can rank models differently.
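A concrete illustration of that difference (hypothetical numbers, chosen to make the gap visible):

```python
import math

# Two hypothetical predictors; a data point is equally likely to be
# "easy" or "hard", and each entry is the probability the predictor
# assigns to the true outcome in that case.
steady = [0.5, 0.5]   # always assigns 0.5
gambler = [0.9, 0.1]  # confident when easy, poor when hard

for name, p in [("steady", steady), ("gambler", gambler)]:
    expected_prob = sum(p) / 2
    expected_log_prob = sum(math.log(x) for x in p) / 2
    print(f"{name}: E[p] = {expected_prob:.2f}, E[log p] = {expected_log_prob:.2f}")

# Both predictors have the same expected probability (0.5), but the
# steady one has the higher expected log probability (-0.69 vs -1.20):
# the log and the expectation don't commute, so the two objectives can
# disagree about which model is best.
```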