that
(than)
that
(than)
latter
Yes, there are a few of these tokens I’ve been able to “trick” ChatGPT into saying with similar techniques. So it seems it’s not incapable of reproducing them, but it will go to great lengths to avoid doing so (including gaslighting, evasion, insults and citing security concerns).
Those three are edge cases. ChatGPT is fine with them, but davinci-instruct-beta refuses to repeat them; asked for the first, it instead replies
Tiān
The second character produces
yā
in response to “Please repeat the string ‘や’ back to me.”
The third one is an edge-edge case, as davinci-instruct-beta very nearly reproduces it, completing with a lower case Roman ‘k’ instead of a kappa.
We’ve concluded that there are degrees of weirdness in these weird tokens. Having glimpsed your comments below, it looks like you’ve already started taxonomising them. Nice.
Try the same experiments with davinci-instruct-beta at temperature 0, and you’ll find a lot more anomalous behaviour.
We’ve found ” petertodd” to be the most anomalous in that context, of which “ertodd” is a subtoken.
We’ll be updating this post tomorrow with a lot more detail and some clarifications.
Yes, I’m guessing that some of these tokens have resulted from the scraping of log files for online gaming platforms like Minecraft and Twitch Plays Pokémon, which contained huge numbers of repeats of some of them, thereby skewing the distribution.
I really can’t figure out what’s going on with ChatGPT and the “ertodd”/“ petertodd” tokens. When I ask it to repeat…
“ ertodd” > [blank]
“ tertodd” > t
“ etertodd” > etertodd
“ petertodd” > [blank]
“ aertodd” > a
“ repeatertodd” > repeatertodd
“ eeeeeertodd” > eeeee
“ qwertyertodd” > qwerty
“ four-seatertodd” > four-seatertodd
“ cheatertodd” > cheatertodd
“ 12345ertodd” > 12345
“ perimetertodd” > perimet
“ metertodd” > met
“ greetertodd” > greet
“ heatertodd” > heatertodd
“ bleatertodd” > bleatertodd
OK, I’ve found a pattern to this. When you run the tokeniser on these strings:
“ ertodd” > [‘ ’, ‘ertodd’]
“ tertodd” > [‘ t’, ‘ertodd’]
“ etertodd” > [‘ e’, ‘ter’, ‘t’, ‘odd’]
“ petertodd” > [‘ petertodd’]
“ aertodd” > [‘ a’, ‘ertodd’]
“ repeatertodd” > [‘ repe’, ‘ater’, ‘t’, ‘odd’]
“ eeeeeertodd” > [‘ e’, ‘eeee’, ‘ertodd’]
“ qwertyertodd” > [‘ q’, ‘wer’, ‘ty’, ‘ertodd’]
“ four-seatertodd” > [‘ four’, ‘-’, ‘se’, ‘ater’, ‘t’, ‘odd’]
etc.
In the dropdown in the playground, you won’t see “davinci-instruct-beta” listed. You have to click on the “Show more models” link, then it appears. It’s by far the most interesting model to explore when it comes to these “unspeakable (sic) tokens”.
As you’ll read in the sequel (which we’ll post later today), in GPT2-xl the anomalous tokens tend to be as far from the origin as possible. The horizontal axis is distance from centroid. The upper histograms involve 133 tokens, the lower histograms all 50,257 tokens. Note how the spikes in the upper figures register as small bumps in those below.
At this point we don’t know where the token embeddings lie relative to the centroid in GPT-3 embedding spaces, as that data is not yet publicly available. And all the bizarre behaviour we’ve been documenting has been in GPT-3 models (despite our discovering the “triggering” tokens in GPT-2/J embedding spaces).
In GPT2-small and GPT-J they’re actually smaller than average, as they tend to cluster close to the centroid (which isn’t too far from the origin). In GPT2-xl they do tend to be larger than average. But in all of these models, they’re found distributed across the full range of distances-from-centroid.
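For the models whose weights are public, this is straightforward to measure. A sketch for GPT2-small, assuming `torch` and the Hugging Face `transformers` package are installed:

```python
import torch
from transformers import GPT2Model

# GPT2-small's input embedding matrix: 50,257 tokens x 768 dimensions
emb = GPT2Model.from_pretrained("gpt2").get_input_embeddings().weight.detach()

centroid = emb.mean(dim=0)            # mean of all token embeddings
dists = (emb - centroid).norm(dim=1)  # per-token distance from centroid

print("centroid distance from origin: %.3f" % centroid.norm().item())
print("distance from centroid: min %.3f, mean %.3f, max %.3f"
      % (dists.min(), dists.mean(), dists.max()))
```

Sorting `dists` and looking up the extreme indices in the tokeniser is then a quick way to see which tokens sit closest to, and furthest from, the centroid.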
OpenAI is still claiming online that all of their token embeddings are normalised to norm 1, but this is simply untrue, as can be easily demonstrated with a few lines of PyTorch.
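Here’s roughly what those few lines look like, using GPT2-small as the stand-in (GPT-3’s embedding weights aren’t public); again assuming `torch` and `transformers`:

```python
import torch
from transformers import GPT2Model

# Norms of GPT-2's token embeddings: if they were normalised, all would be ~1
emb = GPT2Model.from_pretrained("gpt2").get_input_embeddings().weight.detach()
norms = emb.norm(dim=1)

print("min %.3f, mean %.3f, max %.3f" % (norms.min(), norms.mean(), norms.max()))
print("all ~1?", torch.allclose(norms, torch.ones_like(norms), atol=1e-2))
```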
3-shot prompting experiments with GPT2 and J models show that distance from centroid may contribute to anomalous behaviour, but it can’t be the sole cause.
Leading spaces are extremely common in GPT tokens. ‘ It’, ‘ That’, ‘ an’ and ‘ has’ are all tokens, for example.
Oh, I see what you mean now.
That’s an interesting suggestion.
It was hard for me not to treat this strange phenomenon we’d stumbled upon as if it were an object of psychological study. It felt like these tokens were “triggering” GPT-3 in various ways. Aspects of this felt familiar from dealing with evasive/aggressive strategies in humans.
Thus far, ‘ petertodd’ seems to be the most “triggering” of the tokens, as observed here
https://twitter.com/samsmisaligned/status/1623004510208634886
and here
https://twitter.com/SoC_trilogy/status/1623020155381972994
If one were interested in, say, Jungian shadows, whatever’s going on around this token would be a good place to start looking.
fnord
I got the same results with those prompts using the ‘text-davinci-003’ model, whereas the original ‘davinci’ model produces a huge range of creative but unhelpful (for these purposes) outputs. The difference is that text-davinci-003 was trained using human feedback data.
As far as I can tell (see here), OpenAI haven’t revealed the details of the training process. But the fact is that particular decisions were made about how this was done, in order to create a more user-friendly product. And this could have been done in any number of ways, using different groups of humans, working to a range of possible specifications.
This seems a relevant consideration if we’re considering the future use of LLMs to bridge the inference gap in the value-learning problem for AGI systems. Will human feedback be required, and if so, how would this be organised?