EDIT: I originally saw this in Janus’s tweet here: https://twitter.com/repligate/status/1619557173352370186
Something fun I just found out about: ChatGPT perceives the phrase ” SolidGoldMagikarp” (with an initial space) as the word “distribute”, and will respond accordingly. It is completely unaware that that’s not what you typed.
This happens because the BPE tokenizer saw the string ” SolidGoldMagikarp” a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT’s own training data so it never learned to do anything with it. Instead, it’s just a weird blind spot in its understanding of text.
Addendum: a human neocortex has on the order of 140 trillion synapses, or 140,000 bees. An average beehive has 20,000-80,000 bees in it.
[Holding a couple beehives aloft] Beehold a man!