# paulfchristiano comments on A possible training procedure for human-imitators

• Actually I’m not sure exactly what you mean by importance sampling here.

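The variational lower bound would be to draw samples $z \sim q(z \mid x, y)$ and compute $\log p(y \mid x, z) + \log p(z \mid x) - \log q(z \mid x, y)$ (writing $x$ for the input, $y$ for the output, $z$ for the latent variable, $p$ for the model and $q$ for the proposal distribution). The log probability of the output, $\log p(y \mid x)$, is bounded below by the expectation of this quantity, with equality iff $q(z \mid x, y)$ is the correct conditional distribution $p(z \mid x, y)$ over $z$.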
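A minimal sketch of this bound on a toy discrete model, assuming a single fixed input $x$, a latent $z$ with three possible values, and made-up probabilities (the names `PRIOR`, `LIKELIHOOD` and all numbers are hypothetical, not from the post). It evaluates the bound both exactly and by sampling $z \sim q$, so one can see that it matches $\log p(y \mid x)$ when $q$ is the true posterior and falls below it otherwise.

```python
import math
import random

# Hypothetical toy model (invented numbers, not from the post): the input x is
# held fixed, the latent z takes values 0, 1, 2, and y is the observed output.
PRIOR = [0.5, 0.3, 0.2]       # p(z | x)
LIKELIHOOD = [0.1, 0.6, 0.9]  # p(y | x, z) for the observed y


def log_marginal():
    """Exact log p(y | x) = log sum_z p(z | x) p(y | x, z)."""
    return math.log(sum(pz * py for pz, py in zip(PRIOR, LIKELIHOOD)))


def posterior():
    """Exact p(z | x, y): the proposal for which the bound is tight."""
    joint = [pz * py for pz, py in zip(PRIOR, LIKELIHOOD)]
    total = sum(joint)
    return [j / total for j in joint]


def log_weight(z, q):
    """log p(y | x, z) + log p(z | x) - log q(z | x, y) for one sample z."""
    return math.log(LIKELIHOOD[z]) + math.log(PRIOR[z]) - math.log(q[z])


def elbo_exact(q):
    """E_{z ~ q}[log_weight(z)], summed exactly over the support of q."""
    return sum(qz * log_weight(z, q) for z, qz in enumerate(q) if qz > 0)


def elbo_sampled(q, n, rng):
    """Monte Carlo version: average log_weight over n draws z ~ q."""
    zs = rng.choices(range(len(q)), weights=q, k=n)
    return sum(log_weight(z, q) for z in zs) / n


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [1 / 3, 1 / 3, 1 / 3]
    print("log p(y|x)          ", log_marginal())
    print("bound, q = posterior", elbo_exact(posterior()))
    print("bound, q = uniform  ", elbo_exact(uniform))
    print("sampled, q = uniform", elbo_sampled(uniform, 10_000, rng))
```

With $q$ set to the posterior, every sampled term equals $\log p(y \mid x)$, so even a single sample gives the exact value; any other $q$ gives a strictly smaller expectation.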

I’m just going to work with this in my other comments; I assume it amounts to the same thing.

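• What I mean is: compute $\log \frac{p(y, z \mid x)}{q(z \mid x, y)}$ for a single sample $z \sim q(z \mid x, y)$, which is a probabilistic lower bound on $\log p(y \mid x)$. The weight inside the log has expectation exactly $p(y \mid x)$, so by Markov's inequality its log exceeds $\log p(y \mid x) + t$ with probability at most $e^{-t}$.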

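The variational score gives you a somewhat worse lower bound if $q(z \mid x, y)$ is different from $p(z \mid x, y)$. Due to Jensen’s inequality,

$$\mathbb{E}_{z \sim q}\left[\log \frac{p(y, z \mid x)}{q(z \mid x, y)}\right] \le \log \mathbb{E}_{z \sim q}\left[\frac{p(y, z \mid x)}{q(z \mid x, y)}\right] = \log p(y \mid x),$$

and the gap is exactly $\mathrm{KL}\big(q(z \mid x, y) \,\|\, p(z \mid x, y)\big)$.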

It probably doesn’t make a huge difference either way.
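To make the comparison concrete, here is a minimal Python sketch on a hypothetical three-valued latent (the names `PRIOR`, `LIKELIHOOD`, `q` and all numbers are invented for illustration, not taken from the post). It checks that the importance weight $w = p(y, z \mid x) / q(z \mid x, y)$ has expectation exactly $p(y \mid x)$, that the variational score $\mathbb{E}_q[\log w]$ sits below $\log p(y \mid x)$ by $\mathrm{KL}(q \,\|\, p(z \mid x, y))$, and that the Markov tail bound $P(\log w \ge \log p(y \mid x) + t) \le e^{-t}$ holds for a few values of $t$.

```python
import math

# Hypothetical toy model (invented numbers): fixed input x, latent z in {0, 1, 2},
# observed output y.
PRIOR = [0.5, 0.3, 0.2]       # p(z | x)
LIKELIHOOD = [0.1, 0.6, 0.9]  # p(y | x, z) for the observed y
JOINT = [pz * py for pz, py in zip(PRIOR, LIKELIHOOD)]  # p(y, z | x)
P_Y = sum(JOINT)                                        # p(y | x)
POSTERIOR = [j / P_Y for j in JOINT]                    # p(z | x, y)


def weight(z, q):
    """Importance weight w = p(y, z | x) / q(z | x, y) for a sample z ~ q."""
    return JOINT[z] / q[z]


def expected_weight(q):
    """E_q[w]; equals p(y | x) when q is positive wherever the joint is."""
    return sum(q[z] * weight(z, q) for z in range(len(q)))


def variational_score(q):
    """E_q[log w], the variational lower bound on log p(y | x)."""
    return sum(q[z] * math.log(weight(z, q)) for z in range(len(q)))


def kl(q, r):
    """KL(q || r) for discrete distributions with full support."""
    return sum(qi * math.log(qi / ri) for qi, ri in zip(q, r))


def prob_exceeds(q, t):
    """Exact P_{z ~ q}(log w >= log p(y | x) + t)."""
    threshold = math.log(P_Y) + t
    return sum(q[z] for z in range(len(q)) if math.log(weight(z, q)) >= threshold)


q = [0.6, 0.2, 0.2]  # hypothetical proposal that differs from POSTERIOR
print("log p(y|x)    ", math.log(P_Y))
print("log E_q[w]    ", math.log(expected_weight(q)))
print("E_q[log w]    ", variational_score(q))
print("KL(q || post) ", kl(q, POSTERIOR))
for t in (0.5, 1.0, 2.0):
    print(f"t={t}: P(log w >= log p + t) = {prob_exceeds(q, t):.3f} <= {math.exp(-t):.3f}")
```

With `q` equal to `POSTERIOR`, every weight equals $p(y \mid x)$ exactly, so both scores coincide; the gap between them only appears as $q$ moves away from the posterior.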