Why does θ1 need to include part of the world model? Why not instead have θ1 be the parameters of the two heads, and θ2 be the parameters of the rest of the model?
This would mean that you can’t initialize θ2 to be equal to θ1, but I don’t see why that’s necessary in the first place—in particular it seems like the following generative model should work just fine:
$$P(\theta_1) \propto \exp\left(-\lVert \theta_1 - \theta_{1,\text{init}} \rVert^2\right)$$
$$P(\theta_2 \mid \theta_1) \propto \exp\left(-\lambda\, C(\theta_1, \theta_2) - \lVert \theta_2 - \theta_{2,\text{init}} \rVert^2\right)$$
(I’ll be thinking of this setup for the rest of my comment, as it makes more sense to me)
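To make the setup concrete: MAP inference under this generative model is just minimizing the summed negative log density. Here is a minimal numpy sketch of that objective; the quadratic stand-in for C and all variable names are my own, not from the post.

```python
import numpy as np

def consistency(theta1, theta2):
    # Stand-in consistency test: penalize disagreement between the two
    # parameter vectors. The real C would compare the two heads' answers.
    return float(np.sum((theta1 - theta2) ** 2))

def neg_log_posterior(theta1, theta2, theta1_init, theta2_init, lam):
    # Negative log of P(theta1) * P(theta2 | theta1), up to a constant:
    #   ||theta1 - theta1_init||^2 + lam * C(theta1, theta2)
    #     + ||theta2 - theta2_init||^2
    prior1 = float(np.sum((theta1 - theta1_init) ** 2))
    prior2 = float(np.sum((theta2 - theta2_init) ** 2))
    return prior1 + lam * consistency(theta1, theta2) + prior2

theta1_init = np.zeros(3)
theta2_init = np.ones(3)

# At the initializations themselves both prior terms vanish, so only the
# consistency term contributes: lam * C(theta1_init, theta2_init).
print(neg_log_posterior(theta1_init, theta2_init, theta1_init, theta2_init,
                        lam=2.0))  # → 6.0
```

Note that nothing here requires θ2 to be initialized equal to θ1; the two parameter vectors just get independent proximity penalties plus the shared consistency term.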
> When differentiating the consistency test C we should treat the intended head as fixed rather than differentiating through it. This removes SGD’s incentive to achieve consistency by e.g. making sure the world is simple and so all questions have simple answers.
Hmm, why is this necessary? It seems like the whole point of L(θ2) is to ensure that you have to learn a detailed world model that gets you the right answers. I guess as λ→∞, that doesn’t really help you, but really you shouldn’t have λ→∞ because you shouldn’t expect to be able to have C(θ1,θ2)→0.
(Also, shouldn’t that be L(θ1,θ2), since it is θ1 and θ2 together that compute answers to questions?)
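For concreteness, here is a toy sketch (my own construction, again with a quadratic stand-in for C, with θ1 feeding the intended head) of the stop-gradient the quoted passage describes, next to the naive version it rules out:

```python
import numpy as np

def grad_C(theta1, theta2, stop_grad_on_theta1=True):
    # Gradients of C(theta1, theta2) = ||theta1 - theta2||^2, written by hand.
    diff = theta1 - theta2
    g_theta2 = -2.0 * diff  # theta2 always receives a consistency gradient
    if stop_grad_on_theta1:
        # Intended head treated as a constant: no gradient flows to theta1
        # through C, so SGD can't satisfy C by reshaping theta1's world model.
        g_theta1 = np.zeros_like(theta1)
    else:
        # Naive version: theta1 is also pushed toward agreement.
        g_theta1 = 2.0 * diff
    return g_theta1, g_theta2

g1, g2 = grad_C(np.array([1.0, 2.0]), np.array([0.0, 0.0]))
# With the stop-gradient, g1 is identically zero while g2 is not.
```

Under the stop-gradient, the question raised above becomes whether zeroing θ1's consistency gradient is actually needed, given that the prediction loss already pressures the model toward a detailed world model.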