There’s probably also a weight decay side of the story. When training on real text, the data gradient is always pulling the weights away from zero, against the weight regularization, and the two settle into a balance. Train on your own text and the data term stops pulling up: on average it only pulls you toward where you already are, so nothing opposes the regularizer and there should be some “sag.”
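A toy sketch of that intuition, under some loud assumptions: the "model" is just a softmax over three tokens with the logits as its only weights, the `REAL` distribution is made up, weight decay is a plain L2 penalty, and the updates use the expected gradient rather than sampled batches (with real sampling the data gradient on self-generated text is zero only on average, so there would also be noise on top of the shrinkage).

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def train(theta, target_fn, steps, lr=0.5, weight_decay=0.01):
    """Gradient descent on expected cross-entropy plus an L2 penalty.

    target_fn(theta) returns the distribution the training data comes from.
    """
    theta = list(theta)
    for _ in range(steps):
        q = softmax(theta)       # model distribution
        p = target_fn(theta)     # data distribution
        # Expected cross-entropy gradient w.r.t. the logits is (q - p);
        # the L2 penalty contributes weight_decay * theta.
        theta = [t - lr * ((qi - pi) + weight_decay * t)
                 for t, qi, pi in zip(theta, q, p)]
    return theta

REAL = [0.7, 0.2, 0.1]  # hypothetical "real text" token distribution

# Phase 1: real data. The data gradient balances against weight decay.
theta_real = train([0.0, 0.0, 0.0], lambda th: REAL, steps=2000)

# Phase 2: the model's own output as data. Here q - p is exactly zero,
# so the only remaining force is the decay term.
theta_self = train(theta_real, lambda th: softmax(th), steps=100)

norm_real = norm(theta_real)
norm_self = norm(theta_self)
print("weight norm after real data:", norm_real)
print("weight norm after self-training:", norm_self)
print("top-token prob before/after:",
      max(softmax(theta_real)), max(softmax(theta_self)))
```

In this setup each self-training step multiplies the weights by `1 - lr * weight_decay`, so the norm decays geometrically. Shrinking all the logits by a common factor is the same as sampling at a higher temperature, so the toy model's distribution also flattens as it sags. Whether a real network behaves this way is a separate question.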