paulfchristiano comments on A possible training procedure for human-imitators

paulfchristiano 18 Mar 2016 4:03 UTC
0 points
Actually I’m not sure exactly what you mean by importance sampling here.

The variational lower bound would be to draw samples from $q$ and compute $\log(p / q)$. The log probability of the output under $p$ is bounded below by the expectation of this quantity (with equality iff $q$ is the correct conditional distribution over $A$).

I’m just going to work with this in my other comments; I assume it amounts to the same thing.
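
As a sanity check on the variational bound, here is a minimal sketch on a toy discrete example. Everything concrete in it is an assumption made for illustration, not something from the thread: the four-point distribution `p`, the function `f(a) = a % 2`, the target `x = 0`, and the helper names. It reads $p / q$ as $[f(A) = x]\, p(A) / q(A)$ and computes the expectation exactly rather than by sampling.

```python
import math

# Hypothetical toy setup: A ranges over {0, 1, 2, 3} with distribution p,
# f maps A to its parity, and we score the output x = 0.
p = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}


def f(a):
    return a % 2


x = 0


def log_marginal(p, f, x):
    """log P_p(f(A) = x), computed by direct summation."""
    return math.log(sum(pa for a, pa in p.items() if f(a) == x))


def posterior(p, f, x):
    """The conditional distribution p(A | f(A) = x)."""
    z = sum(pa for a, pa in p.items() if f(a) == x)
    return {a: pa / z for a, pa in p.items() if f(a) == x}


def variational_bound(p, q, f, x):
    """E_q[log(p(A) [f(A) = x] / q(A))], computed exactly over q's support."""
    total = 0.0
    for a, qa in q.items():
        if qa == 0.0:
            continue  # points outside q's support contribute nothing
        if f(a) != x:
            return -math.inf  # q puts mass where the indicator is zero
        total += qa * math.log(p[a] / qa)
    return total


exact = log_marginal(p, f, x)
tight = variational_bound(p, posterior(p, f, x), f, x)  # q = true conditional
loose = variational_bound(p, {0: 0.5, 2: 0.5}, f, x)    # q = some other guess

print(f"log P_p(f(A) = x)      = {exact:.4f}")
print(f"bound with q=posterior = {tight:.4f}")
print(f"bound with uniform q   = {loose:.4f}")
```

In this toy case `tight` agrees with `exact` up to floating-point error, while the uniform `q` gives a strictly smaller value.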
- jessicata 19 Mar 2016 19:47 UTC
  0 points
  What I mean is: compute $E_{q} [[f(A) = x]\, p(A) / q(A)]$, which is a probabilistic lower bound on $P_{p}(f(A) = x)$.
  
  The variational score gives you a somewhat worse lower bound if $q$ is different from $p(A | f(A) = x)$. Due to Jensen’s inequality, $E_{q} [\log([f(A) = x]\, p(A) / q(A))] \leq \log E_{q} [[f(A) = x]\, p(A) / q(A)] \leq \log P_{p}(f(A) = x)$.
  
  It probably doesn’t make a huge difference either way.
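
To make the comparison between the two scores concrete, here is a rough Monte Carlo sketch. The toy distribution `p`, the proposal `q`, the function `f`, the target `x`, the sample size, and the seed are all hypothetical choices for illustration. The proposal is assumed to put mass only on values with $f(A) = x$, so that every log weight is finite.

```python
import math
import random

# Hypothetical toy setup: p is the model over A, q is the proposal.
p = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
q = {0: 0.5, 2: 0.5}  # assumed to be supported on f(A) = x only


def f(a):
    return a % 2


x = 0


def importance_weights(p, q, f, x, n, rng):
    """Draw n samples A ~ q and return [f(A) = x] * p(A) / q(A) for each."""
    support = list(q)
    draws = rng.choices(support, weights=[q[a] for a in support], k=n)
    return [p[a] / q[a] if f(a) == x else 0.0 for a in draws]


rng = random.Random(0)
w = importance_weights(p, q, f, x, 20_000, rng)

# Importance sampling score: sample mean of the weights.
is_estimate = sum(w) / len(w)

# Variational score: sample mean of the log weights
# (minus infinity if any sampled weight is zero).
if all(wi > 0 for wi in w):
    variational = sum(math.log(wi) for wi in w) / len(w)
else:
    variational = -math.inf

# Ground truth by direct summation, available only because the example is tiny.
exact = sum(pa for a, pa in p.items() if f(a) == x)

print(f"log P_p(f(A) = x)         = {math.log(exact):.4f}")
print(f"log of importance average = {math.log(is_estimate):.4f}")
print(f"variational score         = {variational:.4f}")
```

On any fixed set of positive weights, the mean of the logs cannot exceed the log of the mean, so the variational score never comes out above the log of the importance sampling average computed from the same samples. How large the gap is depends on how far `q` is from $p(A | f(A) = x)$.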