I think a lot of it comes down to training data context: " Leilan" is only present in certain videogame scrapes, " petertodd" is only found in Bitcoin spam, etc. So when you try to use these tokens in a conversational context, the model starts spitting out weird stuff because it doesn't have enough information about what they actually mean. I think GPT-2's guess for " petertodd" is something like "part of a name/email; if you see it, expect more mentions of Bitcoin", and not anything more, since that token doesn't occur much anywhere else. Thus, if you bring it up in a context where Bitcoin spam is very unlikely to occur, like a conversation with an AI assistant, it effectively acts like a masked token, and you get the glitch-token behavior.
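The intuition can be sketched with a toy bigram model — purely illustrative, nothing to do with GPT-2's actual architecture or tokenizer. The corpus and token names below are made up; the point is just that a token seen only in one narrow context gives the model exactly one learned continuation:

```python
from collections import Counter, defaultdict

# Toy corpus: "petertodd" appears only in Bitcoin-spam-like lines,
# never in ordinary conversational text.
corpus = [
    "send bitcoin to petertodd bitcoin wallet now",
    "free bitcoin from petertodd bitcoin giveaway today",
    "hello how are you doing today",
    "the assistant answers questions politely",
]

# Count bigram continuations for each token.
following = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for a, b in zip(tokens, tokens[1:]):
        following[a][b] += 1

# "petertodd" has only ever been followed by "bitcoin" ...
print(following["petertodd"].most_common())  # [('bitcoin', 2)]
```

In a context where that single learned continuation is implausible (a polite assistant conversation), the model has no usable signal for the token at all, which is the sense in which it behaves like a mask.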
I don't think this phenomenon comes down to the training data alone, because in RLLMv3 the " Leilan" glitch mode persisted while " petertodd" became entirely unrelated to Bitcoin. It's as if some glitch tokens are affected by the amount of re-training and some aren't. I believe something deeper is happening here: an architectural flaw that might be related to the token selection/construction process.