TLDR: One model’s trauma is another’s enlightenment.
Why study model development? One reason is that the same training data can have opposite effects on a model depending on when it is shown. So alignment is not just about training on the right data but also about training in the right order.
We just put out a new paper that explains how this can arise and provides some examples (mostly toy). This builds on our recent work introducing a new influence function technique (see my previous shortform). I thought I’d write up a quick note on how these papers are connected: Why does a generalization of influence functions imply that influence functions change over training?
Static view of development ⇔ classical influence functions. Assume regularity (a single, nondegenerate global minimum). This implies:
Development is a gradual convergence to the true parameters, controlled by the curvature (spectrum of the Hessian) around this point. For Bayesian learners, this follows from the Bernstein–von Mises theorem. For (stochastic) optimization, there are similar results, such as the Polyak–Ruppert averaging theorem.
The classical influence function is all you need to characterize influence. That is, the full Bayesian influence function (BIF) asymptotically reduces to just the classical influence function.
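To make the comparison concrete, here is a sketch of the two objects side by side, in my own notation (the sign and scaling conventions may differ from the papers'):

```latex
% Setup (my notation): \ell_z is the loss on a training point z, f is an
% observable (e.g. loss on a query), w^* the unique nondegenerate minimum,
% H the Hessian at w^*, n the number of samples, p(w | D_n) the posterior.
\begin{align}
  \mathrm{IF}(z, f)  &= -\,\nabla f(w^*)^{\top} H^{-1} \nabla \ell_z(w^*)
    && \text{(classical influence function)} \\
  \mathrm{BIF}(z, f) &= -\,\operatorname{Cov}_{w \sim p(w \mid D_n)}\!\bigl(\ell_z(w),\, f(w)\bigr)
    && \text{(Bayesian influence function)}
\end{align}
% Under regularity, Bernstein--von Mises gives p(w | D_n) \approx N(w^*, (nH)^{-1}),
% and a first-order (delta-method) expansion of the covariance yields
\begin{align}
  \mathrm{BIF}(z, f) \;\approx\; -\,\tfrac{1}{n}\,\nabla f(w^*)^{\top} H^{-1} \nabla \ell_z(w^*),
\end{align}
% i.e. the BIF collapses to the classical IF (up to the 1/n factor), which is
% the sense in which the classical IF is "all you need" in the regular case.
```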
Dynamic view of development ⇔ Bayesian influence functions. Drop the regularity assumption (allow for a set of non-unique, degenerate minima). This implies:
Development is a stagewise succession of phase transitions, controlled by a tradeoff between loss and complexity. For Bayesian learners, this follows from Watanabe’s free energy formula and the singular learning process. For stochastic optimization, the theory is not yet developed enough to handle this regime.
The classical influence function is insufficient to characterize influence, even in the asymptotic limit. In this regime, the asymptotic equivalence of the BIF and the classical IF breaks down.
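For reference, the loss–complexity tradeoff comes from the leading terms of Watanabe's free energy formula; the version below is a standard simplified form (lower-order terms dropped, notation mine):

```latex
% Free energy of a neighbourhood of a point w_0 in parameter space:
%   L_n(w_0) = empirical loss,  lambda = learning coefficient, a complexity
%   measure that replaces the naive parameter count in singular models.
\begin{align}
  F_n \;\approx\; n\,L_n(w_0) \;+\; \lambda \log n
\end{align}
% At small n, the lambda * log(n) term matters relatively more, so simpler
% (lower-lambda) regions can dominate the posterior despite higher loss;
% as n grows, lower-loss regions take over. The crossovers between such
% regions are the stagewise phase transitions referred to above.
```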
For more on how this plays out in real-world training, see the announcement thread for our new paper (reproduced partially below):
Training Data Attribution (TDA) should account for learning dynamics! The same data can influence model behavior in dramatically different ways at different points in training. We call for a shift towards stagewise data attribution and the study of influence dynamics.
1/11
Influence changes over training – sometimes dramatically. The same data that helps learn general categories early in training can harm later specialization. Why is this?