Perhaps related to this Mother Goose nursery counting rhyme?
One, two, three, four, five,
Once I caught a fish alive.
Six, seven, eight, nine, ten,
Then I let it go again.
Why did you let it go?
Because he bit my finger so!
Which finger did it bite?
This little finger on my right!
https://en.wikipedia.org/wiki/One,_Two,_Three,_Four,_Five
Perhaps, in a bizarre twist of fate, GPT learns similarly to many young humans, or even adult learners of new languages: through nursery rhymes.
Edit: to add, I wonder if "the" is used as a vector/index of relative indirection. That would mean clusters of meaning around whatever level of "the" is being used.
In the end, all language could be a single-op instruction set, just compressed with a distance function from a root position. Almost like a mov-based CPU, perhaps transport-triggered, with some 'aliasing' (one mov standing in for an N-long run of movs). And to be maximally efficient for compressibility and durability, something like this would seem to have to exist, since it appears to be what genes do.
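To make the idea concrete, here is a toy sketch of a machine whose only operation is a move addressed by distance from a root position, with an "alias" form where one mov expands into an N-long block move. Everything here (the memory layout, the root pointer, the instruction encoding) is made up for illustration; it's not a claim about what GPT actually does, just a minimal model of the single-op-plus-distance-function picture.

```python
def run(program, memory, root=0):
    """Execute a program whose only operation is MOV.

    A plain instruction is (dst_offset, src_offset): copy the value at
    root + src_offset into root + dst_offset. An 'aliased' instruction
    (dst_offset, src_offset, n) is one mov standing in for n movs: it
    copies a block of n cells. All addressing is relative to `root`,
    i.e. a distance function from the root position.
    """
    for ins in program:
        if len(ins) == 2:                  # plain mov
            dst, src = ins
            memory[root + dst] = memory[root + src]
        else:                              # aliased mov: n-long block copy
            dst, src, n = ins
            for i in range(n):
                memory[root + dst + i] = memory[root + src + i]
    return memory

mem = [0] * 8
mem[0:3] = [1, 2, 3]
# One plain mov, then one alias that expands into three movs.
run([(4, 0), (5, 0, 3)], mem)
print(mem)  # [1, 2, 3, 0, 1, 1, 2, 3]
```

The mov-only part isn't as exotic as it sounds: a single move instruction is already enough for general computation (this is the trick behind mov-only compilers and transport-triggered architectures), so the speculative part is really the distance-from-root compression, not the one-op instruction set.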
Isn’t this most programming jobs? Code by reference/example: implement, get output, never understand it intimately.