I’ve never seen it mentioned around here, but since OpenAI’s 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’), ChatGPT has been using a different tokenizer that has glitch tokens of its own.
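If you want to poke at the two tokenizers yourself, here’s a minimal sketch using OpenAI’s tiktoken library (the length-25 cutoff in the scan is an arbitrary heuristic of mine, not anything OpenAI uses):

```python
# r50k_base (GPT-2/GPT-3 era) treats ' SolidGoldMagikarp' as a single
# token, while cl100k_base (the newer ChatGPT/GPT-4 tokenizer) splits it
# up -- but cl100k_base has unusually long single tokens of its own.
import tiktoken

old_enc = tiktoken.get_encoding("r50k_base")    # GPT-2/GPT-3 tokenizer
new_enc = tiktoken.get_encoding("cl100k_base")  # ChatGPT/GPT-4 tokenizer

print(old_enc.encode(" SolidGoldMagikarp"))  # one token id: [43453]
print(new_enc.encode(" SolidGoldMagikarp"))  # several token ids

# Scan the new vocabulary for suspiciously long single tokens --
# candidates for glitch-token behavior.
for token_id in range(new_enc.n_vocab):
    try:
        raw = new_enc.decode_single_token_bytes(token_id)
    except KeyError:  # a few ids in the range are reserved/unused
        continue
    if len(raw) >= 25:  # arbitrary "suspiciously long" threshold
        print(token_id, raw)
```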
Doesn’t strike me as inevitable at all, just a result of OpenAI following a similar method for creating their tokenizer twice. (In both cases, this led to a few long strings being included as tokens even though they don’t actually appear frequently in large corpora.)
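To make that mechanism concrete, here’s a toy byte-pair-encoding run on entirely contrived data (not OpenAI’s actual training setup): a string heavily overrepresented in the tokenizer-training sample gets merged all the way into a single long token, regardless of how rare it is elsewhere.

```python
# Toy BPE: repeatedly merge the most frequent adjacent pair of symbols.
from collections import Counter

def bpe_merges(words, num_merges):
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        # Apply the merge everywhere it occurs.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == (a, b):
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# 'magikarp' dominates this contrived sample, so BPE keeps merging its
# pieces until the whole string is one token, even though the string
# may be rare in the corpus the language model is later trained on.
sample = ["magikarp"] * 50 + ["cat", "dog", "fish"] * 5
print(bpe_merges(sample, 10))
```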
They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.