Perhaps in a bizarre twist of fate, GPT learns similarly to many young humans, or even adult learners of new languages: by using nursery rhymes.
Edit: to add, I wonder if "the" is used as a vector/index of relative indirection. That would mean clusters of meaning around whatever level of "the" is being used.
In the end, all language could be a one-op instruction set, just compressed with a distance function from the root position. Almost like a MOV-based CPU, perhaps transport-triggered, with some 'aliasing' (one mov standing for an N-long run of movs). Also, to be maximally efficient for compressibility and durability, something like this would seem to have to exist, since it appears to be what genes do.
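To make the MOV idea concrete, here's a minimal toy sketch, entirely my own construction and nothing to do with what GPT actually does: a machine whose only operation is MOV, where every "instruction" is a source/destination pair plus a length, and the length gives the 'aliasing' (one mov standing in for an N-cell block move).

```python
def run_mov_machine(program, memory):
    """Execute a list of (src, dst, length) moves against a flat memory list.

    Every operation is the same single op: copy `length` cells starting at
    `src` onto the cells starting at `dst`. All "meaning" is just distances
    from a root position (cell 0 here).
    """
    for src, dst, length in program:
        memory[dst:dst + length] = memory[src:src + length]
    return memory

# Toy usage: a program is nothing but offsets.
mem = [7, 0, 0, 0, 42, 43, 44, 0, 0, 0]
prog = [
    (0, 1, 1),  # plain mov: copy one cell
    (4, 7, 3),  # 'aliased' mov: one op moving a 3-cell block
]
print(run_mov_machine(prog, mem))
# -> [7, 7, 0, 0, 42, 43, 44, 42, 43, 44]
```

The real-world analogue of the intuition is that mov alone is Turing-complete, so a single op plus addressing really can carry arbitrary computation.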
Perhaps related to this Mother Goose counting rhyme?
https://en.wikipedia.org/wiki/One,_Two,_Three,_Four,_Five