Very much agreed on the metric and measure. Finetuning, with the correct meta parameters, approximates Bayesian reasoning (but it is generally done with meta parameters that weight the evidence per document of the new data more heavily than that of the pretraining set, unless you mix pretraining data into it to reduce catastrophic forgetting). Thus it can change the model’s mind, but small changes are easier than large ones, so theories that were already fairly plausible are easier to promote than ones the model previously strongly disfavoured. I think it’s useful to think in terms of “Roughly how many bits of Bayesian evidence would it take to raise the model’s prior for a theory to a particular level?”
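To make the “bits of evidence” framing concrete, here is a minimal sketch of the standard odds-form Bayes arithmetic, assuming evidence is measured as the log2 likelihood ratio it supplies. The function name `bits_needed` is hypothetical, and this illustrates the framing only; it is not a model of what finetuning actually does to a network.

```python
import math

def bits_needed(prior: float, target: float) -> float:
    """Bits of evidence (log2 likelihood ratio) needed to move a
    probability from `prior` to `target`, using odds-form Bayes.
    Hypothetical helper for illustration only."""
    if not (0 < prior < 1 and 0 < target < 1):
        raise ValueError("probabilities must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    target_odds = target / (1 - target)
    # Each bit of evidence doubles the odds, so the bits required
    # are the log2 of the ratio between target and prior odds.
    return math.log2(target_odds / prior_odds)

# A fairly plausible theory (20%) reaching even odds needs about 2 bits;
# a strongly disfavoured one (1 in 1025) needs about 10 bits.
print(bits_needed(0.2, 0.5))
print(bits_needed(1 / 1025, 0.5))
```

Working in log-odds makes the asymmetry in the comment explicit: the cost of promoting a theory grows with how far its prior odds sit below the target, not with the difference in raw probability.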
I’m currently working on a post on this.