Yeah, even properly scraped webpages will often contain strings of weird tokens like hyperlinks, ASCII art, twitter embeds, etc., that LLMs have been trained to ignore. So GPT5 is treating the random appended tokens like glitch tokens by ignoring them, but only in the context of them being nonsensical.
The best explanation is probably something like “these tokens are obviously not part of the intended user prompt, GPT5 realizes this, and correctly ignores them.”
Edit: OK, I shouldn’t write right after waking up.
I think a better explanation is that GPT5 reserves those tokens for chain-of-thought, and so ignores them in other contexts where they obviously don’t belong. This is common behavior for glitch tokens, or just general out-of-context tokens. You should try using tokens that are out-of-context but don’t normally have glitch behavior, maybe non-English tokens or programming-related tokens.
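The suggested test could be sketched roughly like this: build variants of a base prompt with different categories of out-of-context tokens appended, then compare how the model handles each. The token lists and prompt text here are illustrative assumptions, not a vetted set of glitch-free tokens.

```python
# Hypothetical sketch: generate test prompts that append out-of-context
# tokens (non-English words, programming identifiers) to a base prompt,
# to check whether the model ignores them like it ignores glitch tokens.
BASE_PROMPT = "Summarize the following paragraph in one sentence."

# Illustrative candidates only -- whether these behave as "glitch-free"
# out-of-context tokens for any given model is an assumption to test.
CANDIDATE_TOKENS = {
    "non_english": ["pourquoi", "Strasse", "toshokan"],
    "programming": ["__init__", "std::vector", "useEffect"],
}

def build_test_prompts(base, candidates, n_repeats=3):
    """Return {(category, token): prompt} with the token appended n_repeats times."""
    prompts = {}
    for category, tokens in candidates.items():
        for tok in tokens:
            suffix = " ".join([tok] * n_repeats)
            prompts[(category, tok)] = f"{base}\n\n{suffix}"
    return prompts

prompts = build_test_prompts(BASE_PROMPT, CANDIDATE_TOKENS)
```

Each generated prompt could then be sent to the model to see whether the appended tokens are silently ignored or disrupt the completion, and whether the two categories behave differently.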