The same tokenizer trained on different data leads to different glitch tokens; see e.g. the comparison of Llama-family models in Yuxi Li et al. 2024, https://arxiv.org/abs/2404.09894
Good point! I hadn’t quite realized that, although it seems obvious in retrospect.
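To make the comparison concrete, here is a minimal sketch of how one could probe two checkpoints that share a tokenizer for glitch-token candidates. It uses a low input-embedding-norm heuristic as a rough proxy for under-trained tokens; this is not the detection method from the cited paper, and the two model names are just illustrative examples of gated Llama-family repos that share a tokenizer.

```python
# Sketch: compare glitch-token *candidates* across two models sharing a tokenizer.
# Heuristic assumption: tokens with unusually small input-embedding norms are
# likely under-trained and thus glitch-token candidates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def glitch_candidates(model_name: str, k: int = 50) -> set[str]:
    """Return the k tokens with the smallest input-embedding norms."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
    emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)
    norms = emb.norm(dim=-1)
    idx = torch.argsort(norms)[:k]                      # smallest norms first
    return {tok.convert_ids_to_tokens(int(i)) for i in idx}

# Illustrative checkpoints (same tokenizer, different training data):
a = glitch_candidates("meta-llama/Llama-2-7b-hf")
b = glitch_candidates("meta-llama/Llama-2-7b-chat-hf")
print("shared candidates:", len(a & b))
print("unique to one model:", len(a ^ b))
```

If the claim holds, the symmetric difference should be non-trivial even though both models use the identical vocabulary.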