The presence of induction heads would be a decent answer. If you show a model …[A][B]…[A], it will often complete it with [B]. One specific mechanism that can cause this is well studied, but the more general reason is that this pattern is really common across lots of different kinds of pretraining data.
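A minimal sketch of how you might check for this behaviourally (not how induction heads are identified mechanistically): repeat a random token sequence and see how often a small pretrained model predicts the token that followed the same token earlier in the context. Assumes the Hugging Face `transformers` library and the public `gpt2` checkpoint; the prompt length and seed are arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Build a prompt of the form ...[A][B]... repeated twice, so every correct
# next-token prediction in the second half requires copying the token that
# followed the same token the first time around.
torch.manual_seed(0)
seq = torch.randint(1000, 10000, (1, 20))   # random "first half" of token ids
input_ids = torch.cat([seq, seq], dim=1)    # the same sequence repeated

with torch.no_grad():
    logits = model(input_ids).logits        # shape: (1, seq_len, vocab)

# Compare greedy predictions against the actual next tokens, restricted to
# positions inside the repeated half.
preds = logits[0, :-1].argmax(dim=-1)
targets = input_ids[0, 1:]
second_half = slice(seq.shape[1], input_ids.shape[1] - 1)
acc = (preds[second_half] == targets[second_half]).float().mean()
print(f"copy accuracy on repeated half: {acc:.2f}")
```

On random token sequences the model has no semantic cues to lean on, so high accuracy on the repeated half is usually taken as evidence of the [A][B]…[A]→[B] copying behaviour rather than memorisation of the tokens themselves.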