If all happy families are alike, but each unhappy family is unhappy in its own way, then even if most families are unhappy, the single most common continuation will be the one type of happy family.
Note that this is not true if you’re generating text from a base model at temperature one. The proportion of happy and unhappy families generated should match that in the training data. (This assumes training went reasonably well, of course, but it probably did.)
Now, people often use a temperature less than one. And few seem to realize that they are then biasing the generated text towards answers that happen to be expressible in only a few ways, and against answers that can be expressed in many different ways. Of course, RLHF or whatever adds further biases...
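To make the effect concrete, here is a minimal sketch with made-up numbers. It assumes a toy distribution in which the one "happy" continuation has probability 0.3 and seven distinct "unhappy" continuations have probability 0.1 each, and it treats each whole continuation as a single sampling step. Real models apply temperature token by token, so this only illustrates the direction of the bias; the names `apply_temperature` and `happy_mass` are hypothetical.

```python
def apply_temperature(probs, temperature):
    """Rescale a distribution as temperature sampling does: p_i -> p_i**(1/T) / Z."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical toy distribution over whole continuations:
# one way to write a happy family, seven ways to write an unhappy one.
happy = [0.3]
unhappy = [0.1] * 7
probs = happy + unhappy  # sums to 1; unhappy families are the majority (0.7)

def happy_mass(temperature):
    """Total probability of the happy continuation after temperature scaling."""
    return apply_temperature(probs, temperature)[0]

for t in (1.0, 0.7, 0.5, 0.2):
    print(f"T={t}: P(happy) = {happy_mass(t):.3f}")
```

At temperature one the happy share stays at 0.3, matching the toy "training data". At temperature 0.5 each probability is squared before renormalizing, so the happy share becomes 0.09 / (0.09 + 7 × 0.01) ≈ 0.56, and the minority answer is now the majority of what gets generated, purely because it can be expressed in only one way.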