Goutham Nalagatla

Karma: 12

Goutham Nalagatla 16 Jun 2026 22:37 UTC
3 points
0
in reply to: anaguma’s comment on: 1 Layer Induction Heads and Some Research
Good point.
At the data level, yes: because the sequence is periodic, an ideal algorithm could infer the block size and phase, then predict the rest from the first block. What I meant is that, for the transformer mechanisms I am testing, the useful operation still has to be implemented from context: the model must identify the earlier matching position or phase and use information from that earlier occurrence.
So the distinction I care about is not “copying” versus “period inference” as abstract algorithms, but which circuit the model actually uses. A pure positional/period shortcut should look different from an induction-style QK matching circuit, and that is why I also look at attention patterns, induction score, and ablations.
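As a rough illustration of the induction-score part, here is a minimal sketch in NumPy. It assumes one common definition (the mean attention a query at position t puts on the token right after the previous occurrence of the current token, i.e. key position t - period + 1 on a sequence with a fixed block length); the exact metric used in the post is not stated here, so treat the definition as an assumption. The function names `induction_score`, `causal_uniform_attention`, and `ideal_induction_attention` are hypothetical helpers made up for this example, not part of any library.

```python
import numpy as np

def induction_score(attn: np.ndarray, period: int) -> float:
    """Mean attention from each query t >= period to key t - period + 1.

    attn is a (seq_len, seq_len) attention pattern for one head, rows = queries.
    Assumes the input is a block of length `period` repeated, so the token at
    t - period matches the token at t, and t - period + 1 is the one after it.
    """
    seq_len = attn.shape[0]
    queries = np.arange(period, seq_len)
    keys = queries - period + 1
    return float(attn[queries, keys].mean())

def causal_uniform_attention(seq_len: int) -> np.ndarray:
    """Baseline head: each query spreads attention evenly over itself and the past."""
    mask = np.tril(np.ones((seq_len, seq_len)))
    return mask / mask.sum(axis=1, keepdims=True)

def ideal_induction_attention(seq_len: int, period: int) -> np.ndarray:
    """Toy head that puts all attention on the induction target once it exists."""
    attn = causal_uniform_attention(seq_len)
    for t in range(period, seq_len):
        attn[t] = 0.0
        attn[t, t - period + 1] = 1.0
    return attn

if __name__ == "__main__":
    seq_len, period = 12, 4
    print("uniform head:  ", induction_score(causal_uniform_attention(seq_len), period))
    print("induction head:", induction_score(ideal_induction_attention(seq_len, period), period))
```

Note that on fixed-period data a head attending at a constant relative offset would also score highly on this metric, so the score alone does not separate the two circuits; that is where the ablations and direct inspection of attention patterns come in.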

1 Layer Induction Heads and Some Research

Goutham Nalagatla and Carlos Guerrero Alvarez

16 Jun 2026 18:09 UTC

10 points

2 comments · 14 min read · LW link

Goutham Nalagatla 2 Jun 2026 2:17 UTC
2 points
0
on: Induction heads—illustrated
The first diagram really helped me understand the concept of K-composition. In fact, the composition of heads seems to be what is responsible for induction heads, rather than the two layers themselves, which is what many articles seem to claim.