I was curious which kind of output LLMs would produce when sampling the least likely next token—a sort of “dual” to the content of the internet.
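Concretely, "least likely next token" just means flipping the comparison in top-k-style selection: keep the k tokens with the smallest logits and pick among those. Below is a minimal self-contained sketch of that selection rule (not the actual llama.cpp diff; plain vectors stand in for the model's logits):

```cpp
// Sketch only: pick among the k tokens with the *smallest* logits.
// A plain std::vector<float> stands in for one decoding step's logits;
// the real bot_k sampler lives inside llama.cpp and is not shown here.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Indices of the k least likely tokens (smallest logits).
std::vector<int> bottom_k(const std::vector<float>& logits, int k) {
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return logits[a] < logits[b]; });
    idx.resize(k);
    return idx;
}

int main() {
    // Toy logits; a real run would take these from the model at each step.
    std::vector<float> logits = {2.3f, -7.1f, 0.4f, -12.8f, 5.0f, -12.6f};

    std::vector<int> worst = bottom_k(logits, 3);

    // Sample uniformly among the k least likely tokens; with k = 1 this is
    // greedy least-likely decoding.
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(worst.size()) - 1);
    std::printf("bottom-3 token ids: %d %d %d; chosen: %d\n",
                worst[0], worst[1], worst[2], worst[pick(rng)]);
}
```

Presumably the real sampler operates on llama.cpp's candidate array and renormalizes the kept probabilities, but the core difference from top-k is just the reversed comparison.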
Using llama.cpp, I got a simple version based on top-k sampling running in an hour. (llama.cpp got hands.) The diff is here; the new sampler is named bot_k. To invoke, simply call:
With llama-2-13b-chat.Q4_K_M.gguf, the start of the output is
(When asked in normal mode, llama-2-13b-chat.Q4_K_M.gguf identifies this as a passage from Nabokov.)
And with mistral-7b-instruct-v0.2.Q4_K_M.gguf the output is
I’m suspicious of having made a mistake because LLaMa outputs similar tokens in sequence, e.g. the Cyrillic tokens in succession, or repeating “partiellement”. Overall the text looks too coherent (?), without enough weird Unicode symbols and encoding errors. Probably a bug in my function (60%), but I don’t know what I could’ve possibly done wrong, it’s so simple. Maybe an issue is that very rare tokens don’t have different values, even on the logit scale. Or sampling the least likely token is just severely under-constrained, and doing so quickly steers the model into a very strange place.
Another thing I didn’t consider while hacking, but which comes to mind while writing this, is model welfare: is doing this kind of sampling harmful to the model I’m using, given that it’s unnatural, uses a weird prompt, and is too hard?
My intuition is that it’s not a big deal (97%), but to be safe I’ll stop it now instead of letting the model run overnight.
If you didn’t feel comfortable running it overnight, why did you publish the instructions for replicating it?
I had a conversation with Claude 3.6 Sonnet about this, and together we concluded that the worry was overblown. I should’ve added that in, together with a justification.
See also this: https://cavendishlabs.org/blog/negative-temperature/
Yep, that output looks nearly exactly the same. Cool find, thanks!
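That's probably not a coincidence: under greedy decoding, a negative temperature simply reverses the ordering of the logits, so the argmax at T < 0 is exactly the argmin at T > 0. A toy check of that identity (just plain vectors, not code from either post; samplers that renormalize and sample stochastically only approximate this):

```cpp
// Toy check: at negative temperature, greedy (argmax) decoding selects the
// token that was least likely at positive temperature, because dividing the
// logits by T < 0 reverses their ordering.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

int main() {
    std::vector<float> logits = {2.3f, -7.1f, 0.4f, -12.8f, 5.0f};
    const float T = -1.0f;  // negative temperature

    std::vector<float> scaled(logits.size());
    for (size_t i = 0; i < logits.size(); ++i) scaled[i] = logits[i] / T;

    int greedy_neg_T = static_cast<int>(std::distance(
        scaled.begin(), std::max_element(scaled.begin(), scaled.end())));
    int least_likely = static_cast<int>(std::distance(
        logits.begin(), std::min_element(logits.begin(), logits.end())));

    // Both print 3 here: the argmax under T = -1 is the argmin of the logits.
    std::printf("greedy at T=-1: %d, least likely token: %d\n",
                greedy_neg_T, least_likely);
}
```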
https://www.lesswrong.com/doc/misc/bot_k.diff gives me a 404.
Looks like the base URL is supposed to be niplav.site. I’ll change that now (FYI @niplav).
A trajectory produced by sampling the least likely tokens almost certainly is not the least likely trajectory, and your experiment may suggest it’s not among the least likely trajectories either.
Yeah, definitely not the least likely trajectories; instead it’s just the next token with the smallest probability. I was thinking of doing beam search while minimizing logits, but that looked difficult to implement. Still surprised that it produces things like prü|stor|oire|, which are pretty pronounceable.
Maybe it would look more random if you presented it segmented by token instead of translated into characters? I’m not familiar with the LLaMA tokenizations, but you seem to imply that a lot of the apparent patterns here are single tokens (“partiellement”, for example, would be very surprising to me as the output of greedy likelihood-minimizing sampling, but is trivial if it is a single BPE token). This would create a misleading impression of coherence.
Also, as Baginski notes, greedy sampling to minimize likelihood will not minimize total likelihood any more than greedily maximizing likelihood would maximize total likelihood. So it would be worth trying at least ‘worst-of-n’ sampling to see if it looks more like what you expect, in the same way that best-of-n often helps produce more expected LLM output. (After all, you would expect the tiniest logits to be the worst estimated of all logits, right? Full of sheer numerical noise and error, given that this is pushing ‘dark knowledge’ to its extremes. Who can really say how much better or worse an answer, exactly, ‘衡’ is than ‘д’ when following ‘*’, etc.? So if best-of-n can make such a qualitative difference when greedily sampling from the best-estimated logits...)
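For concreteness, a sketch of what that ‘worst-of-n’ could look like (sample_candidate below is a hypothetical stand-in, not a llama.cpp function): draw n continuations with ordinary sampling, sum each one’s per-token log-probabilities, and keep the lowest-scoring one.

```cpp
// Sketch of 'worst-of-n': draw n candidate continuations with ordinary
// sampling, then keep the one whose total log-probability is lowest.
// sample_candidate is a hypothetical stand-in for "run the model once and
// sum the per-token log-probabilities"; it is not part of llama.cpp.
#include <cstdio>
#include <string>
#include <vector>

struct Candidate {
    std::string text;      // the sampled continuation
    double total_logprob;  // sum of per-token log-probabilities
};

// Stand-in model call; a real run would sample a fresh continuation per seed.
Candidate sample_candidate(int seed) {
    static const std::vector<Candidate> toy = {
        {"candidate A", -42.0}, {"candidate B", -97.5}, {"candidate C", -61.3}};
    return toy[static_cast<size_t>(seed) % toy.size()];
}

// Best-of-n with the sign flipped: keep the least likely whole sample.
Candidate worst_of_n(int n) {
    Candidate worst = sample_candidate(0);
    for (int i = 1; i < n; ++i) {
        Candidate c = sample_candidate(i);
        if (c.total_logprob < worst.total_logprob) worst = c;
    }
    return worst;
}

int main() {
    Candidate w = worst_of_n(3);
    std::printf("kept: %s (total log-prob %.1f)\n", w.text.c_str(), w.total_logprob);
}
```

The same whole-sample scoring could also drive the likelihood-minimizing beam search mentioned above; worst-of-n is just the cheaper, search-free version.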
Tokenizing the output of LLaMa gives:
Some of the outputs are glitch-tokens for LLaMa-2-13b:
That looks pretty sensible overall, thanks.
You can see what looks like a fairly clear anti-pattern of switching languages/scripts, and the glitch-tokens may help explain the apparent patternness of the repetition in the non-token-split visualization: if LLaMA has ” Хронологија” as a glitch-token, it may literally be unable to see that it’s repeating a token by writing the apparently-patterned ” Хронологија| Хронологија”. Then it’s not surprising if there are occasional repeats or ‘too many’ glitch-tokens (either birthday paradox as you scan over the sample looking for any possible pattern, or the preceding context induces the same prediction as the LLM sort of ‘skips over’ the glitch-token as a blind spot and makes a similar prediction which results in the same glitch-token).
It’s totally possible that I’m seeing faces in the clouds, but there seems to be a non-trivial relationship between these two glitch tokens and what they make the model say.
Хронологија → chronologija → chronology, i.e. time-related, like February
“kwiet” is similar to “kwiecień”, which means “April” in Polish (also “kviten’” in Ukrainian).
Huh, cool. Intuitively, I’d expect those character-level similarities not to matter too much, since the tokenization makes these end up in very different parts of embedding space, unless “kwiecień” or “kviten” are often misspelled as words with the prefix “kwiet”. (I checked with Google Translate, which ~always translates “kwiet” as “quiet” for Slavic languages & Maltese, and as “flower” in Polish).