Anybody else having flashbacks to the weird GPT2 keywords with SolidGoldMagicarp or whatever acting as a bizarre atractor to a bunch of seemingly unrelated concepts? Ended up being some artifact of trimming data between tokenization and training IIRC, such that there were almost no examples of certain tokens left? Some of those symbols seem like the same kind of dregs that’d be barely represented.
Anybody else having flashbacks to the weird GPT2 keywords with SolidGoldMagicarp or whatever acting as a bizarre atractor to a bunch of seemingly unrelated concepts? Ended up being some artifact of trimming data between tokenization and training IIRC, such that there were almost no examples of certain tokens left? Some of those symbols seem like the same kind of dregs that’d be barely represented.