Thanks for trying this! I wonder if this is making things worse in a similar way to top-k: the C-tokenizer makes it very likely that “c” is always in the top 200 tokens. Could it also be ensuring that it’s rarely uncertain enough to be filtered out by this scoring rule?