I know very little about this topic, but I was under the impression that there was more to it than “KV cache: yes or no?”, and I was trying to refer to that whole category of possible improvements. E.g. here’s a paper on “KV cache compression”.
I know very little about this topic, but I was under the impression that there was more to it than “KV cache: yes or no?”, and I was trying to refer to that whole category of possible improvements. E.g. here’s a paper on “KV cache compression”.