Thank you for writing this up! I’m still not sure I understand condensation. I would summarize it as: instead of encoding the givens, we encode some latents which can be used to compute the answers to possible questions about the givens (so we need a distribution over questions).
Also, the total cost of condensation has to be at least the entropy of the answer distribution (generated by applying the question distribution to the givens), because of Shannon’s bound.
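To spell out the bound I have in mind (this is just Shannon’s source-coding theorem in my own notation, not notation from the paper):

```latex
% X = givens, Q ~ question distribution, A = ans(Q, X) the induced answer.
% Any lossless encoding of A with codeword lengths \ell(a) must satisfy
\mathbb{E}\!\left[\ell(A)\right] \;\ge\; H(A) \;=\; -\sum_{a} p(a)\,\log_2 p(a).
```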
I feel like if the optimal condensation setup is indeed 1 book per question, then it’s not a very good model of latent variables, no? But perhaps it’s going in the right direction.
The optimal condensation is not (typically) 1 book per question. Instead, it typically recovers the meaningful latents which you’d want to write down to model the problem. Really, the right thing to do is to work examples to get an intuition for what happens. Sam does some of this in his paper.
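To make that concrete, here is a toy sketch of my own (not an example from Sam’s paper): the givens are coin flips, every question in the distribution is answerable from the head count, so the head count is the meaningful latent. It is much cheaper than storing the raw flips, and needs far fewer "books" than one per question.

```python
import math
import random

# Toy illustration (my own construction, not from the paper): givens are n
# flips of a biased coin, and every question we might be asked is answerable
# from the head count alone. The head count plays the role of the meaningful
# latent recovered by condensation.

random.seed(0)
n = 1000
flips = [random.random() < 0.7 for _ in range(n)]  # the givens

# A small family of questions, each a function of the givens.
questions = {
    "fraction_heads": lambda xs: sum(xs) / len(xs),
    "more_heads_than_tails": lambda xs: sum(xs) > len(xs) / 2,
    "bias_above_0.6": lambda xs: sum(xs) / len(xs) > 0.6,
}

# Condensed latent: enough to answer every question above.
latent = sum(flips)  # head count

def answer(question_name, latent, n):
    """Answer a question using only the latent, never the raw givens."""
    frac = latent / n
    if question_name == "fraction_heads":
        return frac
    if question_name == "more_heads_than_tails":
        return latent > n / 2
    if question_name == "bias_above_0.6":
        return frac > 0.6

# Every question is answered correctly from the latent alone.
for name, q in questions.items():
    assert answer(name, latent, n) == q(flips)

print("bits to store raw givens:       ", n)                           # 1000
print("bits to store head-count latent:", math.ceil(math.log2(n + 1)))  # ~10
```

With 1000 flips, storing the raw givens costs 1000 bits while the head-count latent costs about 10, and all three questions are answered from the latent alone rather than one stored answer per question.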