Yes, absolutely, but I don’t expect algorithms to be implemented in separable chunks the way a human would do it. Comparing frequencies of various words just needs an early attention head with broad attention. But such an attention head will also be recruited to do other things, not just faithfully pass on the sum of its inputs, and so you’d never literally find TF-IDF.
Yes, absolutely, but I don’t expect algorithms to be implemented in separable chunks the way a human would do it. Comparing frequencies of various words just needs an early attention head with broad attention. But such an attention head will also be recruited to do other things, not just faithfully pass on the sum of its inputs, and so you’d never literally find TF-IDF.