KV cache
Seems out of place in the list: as noted by Nostalgebraist, it was already implemented in the very first transformer in 2017.
I know very little about this topic, but I was under the impression that there was more to it than “KV cache: yes or no?”, and I was trying to refer to that whole category of possible improvements. E.g. here’s a paper on “KV cache compression”.