The presence of induction heads would be a decent answer. If you show a model …[A][B]…[A], it will often complete it with [B]. One specific mechanism that can cause this is well studied, but the more general reason is that this pattern is really common across lots of different kinds of pretraining data.
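A minimal sketch of how you might check for this behaviourally (not how induction heads are identified mechanistically): repeat a random token sequence and see how often a small pretrained model predicts the token that followed the same token earlier in the context. Assumes the Hugging Face `transformers` library and the public `gpt2` checkpoint; the prompt length and seed are arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Build a prompt of the form ...[A][B]... repeated twice, so every correct
# next-token prediction in the second half requires copying the token that
# followed the same token the first time around.
torch.manual_seed(0)
seq = torch.randint(1000, 10000, (1, 20))   # random "first half" of token ids
input_ids = torch.cat([seq, seq], dim=1)    # the same sequence repeated

with torch.no_grad():
    logits = model(input_ids).logits        # shape: (1, seq_len, vocab)

# Compare greedy predictions against the actual next tokens, restricted to
# positions inside the repeated half.
preds = logits[0, :-1].argmax(dim=-1)
targets = input_ids[0, 1:]
second_half = slice(seq.shape[1], input_ids.shape[1] - 1)
acc = (preds[second_half] == targets[second_half]).float().mean()
print(f"copy accuracy on repeated half: {acc:.2f}")
```

On random token sequences the model has no semantic cues to lean on, so high accuracy on the repeated half is usually taken as evidence of the [A][B]…[A]→[B] copying behaviour rather than memorisation of the tokens themselves.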