ESRogs comments on The Principle of Predicted Improvement

ESRogs 25 Apr 2019 2:35 UTC
16 points
0
$E [P (H | D)] \geq E [P (H)]$
In English the theorem says that the probability we should expect to assign to the true value of H after observing the true value of D is greater than or equal to the expected probability we assign to the true value of H before observing the value of D.
I have a very basic question about notation—what tells me that H in the equation refers to the true hypothesis?
Put another way, I don’t really understand why that equation has a different interpretation than the conservation-of-expected-evidence equation: $E [P (H = h_{i} | D)] = P (H = h_{i})$.
In both cases I would interpret it as talking about the expected probability of some hypothesis, given some evidence, compared to the prior probability of that hypothesis.
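To make the comparison concrete, here is a minimal numeric sketch that evaluates both expressions on a toy example. It assumes the reading in which $H$ is a random variable standing for whichever hypothesis is true. All the numbers, the hypothesis names, and the function names are made up for illustration and do not come from the post.

```python
# Toy joint distribution over hypotheses H and data D (all numbers are made up).
# prior[h] = P(H = h); likelihood[h][d] = P(D = d | H = h).
prior = {"shoes": 0.6, "socks": 0.3, "barefoot": 0.1}
likelihood = {
    "shoes":    {"says_shoes": 0.8, "says_other": 0.2},
    "socks":    {"says_shoes": 0.3, "says_other": 0.7},
    "barefoot": {"says_shoes": 0.1, "says_other": 0.9},
}
answers = ["says_shoes", "says_other"]


def marginal(d):
    """P(D = d), summing over hypotheses."""
    return sum(prior[h] * likelihood[h][d] for h in prior)


def posterior(h, d):
    """P(H = h | D = d) by Bayes' rule."""
    return prior[h] * likelihood[h][d] / marginal(d)


def cee_lhs(h):
    """E_D[P(H = h | D)]: expectation over D only, for one fixed hypothesis h."""
    return sum(marginal(d) * posterior(h, d) for d in answers)


def ppi_rhs():
    """E_H[P(H)]: expected prior credence in whichever hypothesis is true."""
    return sum(prior[h] * prior[h] for h in prior)


def ppi_lhs():
    """E_{H,D}[P(H | D)]: expected posterior credence in the true hypothesis."""
    return sum(
        prior[h] * likelihood[h][d] * posterior(h, d)
        for h in prior
        for d in answers
    )


if __name__ == "__main__":
    for h in prior:
        print(h, cee_lhs(h), prior[h])  # equal up to rounding for each fixed h
    print(ppi_lhs(), ppi_rhs())         # first value is at least the second
```

In this sketch `cee_lhs(h)` reproduces `prior[h]` for every fixed `h`, while `ppi_lhs()` comes out larger than `ppi_rhs()`. The only difference is whether the hypothesis inside $P(\cdot)$ is held fixed or is itself averaged over.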
- adrusi 25 Apr 2019 6:40 UTC
  24 points
  0
  I also had trouble with the notation. Here’s how I’ve come to understand it:
  Suppose I want to know whether the first person to drive a car was wearing shoes, just socks, or no footwear at all when they did so. I don’t know what the truth is, so I represent it with a random variable $H$, which could be any of “the driver wore shoes,” “the driver wore socks,” or “the driver was barefoot.”
  This means that $P (H)$ is a random variable equal to the probability I assign to the true hypothesis (it’s random because I don’t know which hypothesis is true). It’s distinct from $P (H = h_{i})$ and $P (h_{i})$ which are both the same constant, non-random value, namely the credence I have in the specific hypothesis $h_{i}$ (i.e. “the driver wore shoes”).
  ($P (H = h_{i})$ is roughly “the credence I have that ‘the driver wore shoes’ is true,” while $P (h_{i})$ is “the credence I have that the driver wore shoes,” so they’re equal, and semantically equivalent if you’re a deflationist about truth.)
  Now suppose I find the driver’s great-great-granddaughter on Discord, and I ask her what she thinks her great-great-grandfather wore on his feet when he drove the car for the first time. I don’t know what her response will be, so I denote it with the random variable $D$. Then $P (H | D)$ is the credence I assign to the correct hypothesis after I hear whatever she has to say.
  So $E (P (H = h_{i} | D)) = P (H = h_{i})$ is equivalent to $E (P (h_{i} | D)) = P (h_{i})$ and means “I shouldn’t expect my credence in ‘the driver wore shoes’ to change after I hear the great-great-granddaughter’s response,” while $E (P (H | D)) \geq E (P (H))$ means “I should expect my credence in whatever is the correct hypothesis about the driver’s footwear to increase when I get the great-great-granddaughter’s response.”
  I think there are two sources of confusion here. First, $H$ was not explicitly defined as “the true hypothesis” in the article. I had to infer that from the English translation of the inequality,
  “In English the theorem says that the probability we should expect to assign to the true value of H after observing the true value of D is greater than or equal to the expected probability we assign to the true value of H before observing the value of D,”
  and confirm with the author in private. Second, I remember seeing my probability theory professor use sloppy shorthand, and I initially interpreted $P (H)$ as a sloppy shorthand for $P (H = h_{i})$. Neither of these would have been a problem if I were more familiar with this area of study, but many people are less familiar than I am.
- DanielFilan 25 Apr 2019 5:21 UTC
  9 points
  0
  
  I have a very basic question about notation—what tells me that H in the equation refers to the true hypothesis?
  
  H stands for hypothesis. We’re taking expectations over our distribution over hypotheses: that is, expectations over which hypothesis is true.
  
  Put another way, I don’t really understand why that equation has a different interpretation than the conservation-of-expected-evidence equation: $E [P (H = h_{i} | D)] = P (H = h_{i})$.
  
  In the PPI inequality, the expectations are being taken over H and D jointly; in the CEE equation, the expectation is just being taken over D.
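  Written out as sums, the difference looks like this. This is only a sketch: the lowercase $h$ and $d$ for particular values of $H$ and $D$, and the subscripts on $E$ marking which variables are averaged over, are notational assumptions made here, and a finite set of hypotheses and observations is assumed.

  ```latex
  % CEE: h_i is fixed; only d is summed over.
  E_{D} [P (H = h_{i} | D)] = \sum_{d} P (d) \, P (h_{i} | d) = P (h_{i})

  % PPI: the hypothesis inside P(.) is the same h that is being summed over.
  E_{(H, D)} [P (H | D)] = \sum_{h} \sum_{d} P (h, d) \, P (h | d)
                         = \sum_{d} P (d) \sum_{h} P (h | d)^{2}
  E_{H} [P (H)] = \sum_{h} P (h)^{2}
  ```

  One standard route to the inequality from here (not necessarily the proof the post uses) is Jensen’s inequality: since $x^{2}$ is convex, $\sum_{d} P (d) \, P (h | d)^{2} \geq (\sum_{d} P (d) \, P (h | d))^{2} = P (h)^{2}$ for each $h$, and summing over $h$ gives the PPI inequality.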
  - DanielFilan 25 Apr 2019 5:28 UTC
    8 points
    0
    I should note that when I first saw the PPI inequality, I also didn’t get what it was saying, just because I had very low prior probability mass on it saying the thing it actually says. (I can’t quite pin down what generalisation or principle led to this situation, but there you go.)
- habryka 25 Apr 2019 2:41 UTC
  5 points
  0
  Yeah, I have intuitively the same interpretation.
  My model is also that there is indeed lots of competing notational syntax in probability theory, and that some people would tell you that the current notation being used is invalid, or stands for something weird and meaningless. So I do think explaining the notation and the choice of notation in detail here is a good idea.
- Ronny Fernandez 25 Apr 2019 6:57 UTC
  4 points
  0
  I honestly could not think of a better way to write it. I had the same problem when my friend first showed me this notation. I thought about using “$E [P (H = h_{\text{true}})]$” but that seemed more confusing and less standard? I believe this is how they write things in information theory, but those equations usually have logs in them.
  - DanielFilan 25 Apr 2019 18:36 UTC
    10 points
    2
    Just to add another voice here: I would view that as incorrect in this context, since it instead refers to the thing that the CEE is saying. The way I’d try to clarify this would be to put the variables varying in the expectation in subscripts after the $E$, so the CEE equation would look like $E_{D} [P (H = h_{i} | D)] = P (H = h_{i})$, and the PPI inequality would be $E_{(H, D)} [P (H | D)] \geq E_{H} [P (H)]$.
  - habryka 25 Apr 2019 17:26 UTC
    2 points
    0
    Yeah, this is the one that I would have used.