Yeah, characterizing the impact on current models would definitely be interesting.
I think the toy models are interesting because the impact of top-k and temperature is straightforward in one sense (both make likely tokens more likely), but LLMs are complicated, and it's possible that my theory about forcibly shortening tokens to trigger this wouldn't have held up.
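For concreteness, here's a minimal sketch of the "makes likely tokens more likely" point (the numbers and function are my own illustration, not anything from the original setup): lowering temperature sharpens the softmax, and top-k just zeroes out everything outside the k most likely tokens and renormalizes.

```python
import numpy as np

def sample_dist(logits, temperature=1.0, top_k=None):
    """Sampling distribution after temperature scaling and optional top-k."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    if top_k is not None:
        # Keep only the k highest-probability tokens, then renormalize.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()
    return probs

logits = [2.0, 1.0, 0.5, 0.1]                 # toy 4-token vocabulary
print(sample_dist(logits))                    # baseline distribution
print(sample_dist(logits, temperature=0.5))   # lower temperature -> sharper
print(sample_dist(logits, top_k=2))           # top-k -> mass only on top tokens
```

In both cases the most likely token's probability only goes up, which is the straightforward part; what the LLM does with that shift downstream is the complicated part.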
I was also surprised by how big the effect was (admittedly, with a really large change to the tokenizer).