I wonder if you could use the speed at which models converge on a degenerate attractor as a training signal. The number of turns it takes a model to reach a degenerate attractor measures how much coherent diversity remains in its output distribution. The contrast between setups is telling: cross-model conversations produce emergent complexity before eventually converging, while mirrored self-conversations degenerate rapidly.
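A minimal sketch of what that signal could look like, assuming a mirrored self-conversation where the model replies to its own output. Everything here is illustrative: `turns_to_collapse`, the distinct-n-gram diversity proxy, and the toy model are all hypothetical names, not anything from an existing library.

```python
import random

def distinct_ngram_ratio(text, n=3):
    """Fraction of unique character n-grams -- a crude proxy for diversity."""
    grams = [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]
    return len(set(grams)) / len(grams)

def turns_to_collapse(generate, seed_prompt, max_turns=50, threshold=0.3):
    """Run a mirrored self-conversation and count turns until a reply's
    n-gram diversity drops below `threshold`. `generate` is any callable
    mapping a prompt string to a reply string. Fewer turns = faster
    collapse = less coherent diversity left in the distribution."""
    history = seed_prompt
    for turn in range(1, max_turns + 1):
        reply = generate(history)
        if distinct_ngram_ratio(reply) < threshold:
            return turn  # hit the degenerate attractor
        history = history + "\n" + reply
    return max_turns  # never collapsed within the turn budget

# Toy stand-in "model" that degenerates after a few turns.
def toy_model(prompt, _state={"t": 0}):
    _state["t"] += 1
    if _state["t"] < 4:
        return "".join(random.choice("abcdefgh ") for _ in range(80))
    return "om " * 30  # the degenerate attractor

print(turns_to_collapse(toy_model, "hello"))  # → 4
```

In practice you would swap the character-n-gram proxy for something stronger (embedding dispersion, entropy of the token distribution), but the shape of the measurement is the same: generate, score diversity, count turns until it falls off a cliff.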
What one could do is run one of these attractor experiments every N steps during fine-tuning to measure how robust the current checkpoint is to degenerate stimulus. Mirrored conversations would probe a model's internal diversity and conceptual landscape; heterogeneous conversations would measure how well it plays with others. The OLMo RL checkpoints already show this signal implicitly: early RL steps produce rich, diverse content, while late steps collapse to zen. Adjusting hyperparameters during training in line with this signal would let you increase robustness to collapse.
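The periodic-probe idea could be wired into a training loop roughly like this. This is a toy sketch under stated assumptions: `finetune_with_probe`, `adaptive_kl_schedule`, and the choice of a KL-to-base penalty as the knob to turn are all hypothetical, stand-ins for whatever hyperparameter actually controls collapse in a given RL setup.

```python
def adaptive_kl_schedule(collapse_turns, kl_coef, floor_turns=10, factor=2.0):
    """If the checkpoint collapses faster than `floor_turns`, raise the
    KL penalty to pull the policy back toward the (more diverse) base
    distribution; otherwise leave it alone. Purely illustrative."""
    if collapse_turns < floor_turns:
        return kl_coef * factor
    return kl_coef

def finetune_with_probe(train_step, probe, steps=1000, eval_interval=200,
                        kl_coef=0.05, floor_turns=10):
    """Toy outer loop: every `eval_interval` optimizer steps, run the
    attractor probe (e.g. turns-to-collapse on a mirrored conversation)
    and adjust the KL coefficient based on the result."""
    log = []
    for step in range(1, steps + 1):
        train_step(kl_coef)          # one optimizer update (stubbed here)
        if step % eval_interval == 0:
            turns = probe()          # how many turns until collapse?
            kl_coef = adaptive_kl_schedule(turns, kl_coef, floor_turns)
            log.append((step, turns, kl_coef))
    return log
```

The design choice worth flagging: the probe is an *evaluation* run interleaved with training, not a differentiable loss term, so it costs inference compute at each checkpoint but requires no change to the training objective itself.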