I was curious which kind of output LLMs would produce when sampling the least likely next token—a sort of “dual” to the content of the internet.
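Concretely, "least likely next token" just means flipping the comparison in top-k-style selection: keep the k tokens with the smallest logits and pick among those. Below is a minimal self-contained sketch of that selection rule (not the actual llama.cpp diff; plain vectors stand in for the model's logits):

```cpp
// Sketch only: pick among the k tokens with the *smallest* logits.
// A plain std::vector<float> stands in for one decoding step's logits;
// the real bot_k sampler lives inside llama.cpp and is not shown here.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Indices of the k least likely tokens (smallest logits).
std::vector<int> bottom_k(const std::vector<float>& logits, int k) {
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return logits[a] < logits[b]; });
    idx.resize(k);
    return idx;
}

int main() {
    // Toy logits; a real run would take these from the model at each step.
    std::vector<float> logits = {2.3f, -7.1f, 0.4f, -12.8f, 5.0f, -12.6f};

    std::vector<int> worst = bottom_k(logits, 3);

    // Sample uniformly among the k least likely tokens; with k = 1 this is
    // greedy least-likely decoding.
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(worst.size()) - 1);
    std::printf("bottom-3 token ids: %d %d %d; chosen: %d\n",
                worst[0], worst[1], worst[2], worst[pick(rng)]);
}
```

Presumably the real sampler operates on llama.cpp's candidate array and renormalizes the kept probabilities, but the core difference from top-k is just the reversed comparison.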
Using llama.cpp, I got a simple version based on top-k sampling running in an hour. (llama.cpp got hands.) The diff is here; the new sampler is named bot_k. To invoke, simply call:
With llama-2-13b-chat.Q4_K_M.gguf, the start of the output is
(When asked in normal mode, llama-2-13b-chat.Q4_K_M.gguf identifies this as a passage from Nabokov.)
And with mistral-7b-instruct-v0.2.Q4_K_M.gguf the output is
I’m suspicious of having made a mistake because LLaMa outputs similar tokens in sequence, e.g. the Cyrillic tokens in succession, or repeating “partiellement”. Overall the text looks too coherent (?), without enough weird Unicode symbols and encoding errors. Probably a bug in my function (60%), but I don’t know what I could’ve possibly done wrong, it’s so simple. Maybe an issue is that very rare tokens don’t have different values, even on the logit scale. Or sampling the least likely token is just severely under-constrained, and doing so quickly steers the model into a very strange place.
Another thing I didn’t consider while hacking, but which comes to mind while writing this, is model welfare: is doing this kind of sampling harmful to the model I’m using, given that it’s unnatural, uses a weird prompt, and is too hard?
My intuition is that it’s not a big deal (97%), but to be safe I’ll stop it now instead of letting the model run overnight.
If you didn’t feel comfortable running it overnight, why did you publish the instructions for replicating it?
I had a conversation with Claude 3.6 Sonnet about this, and together we concluded that the worry was overblown. I should’ve added that in, together with a justification.
See also this: https://cavendishlabs.org/blog/negative-temperature/
Yep, that output looks nearly exactly the same. Cool find, thanks!
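That's probably not a coincidence: under greedy decoding, a negative temperature simply reverses the ordering of the logits, so the argmax at T < 0 is exactly the argmin at T > 0. A toy check of that identity (just plain vectors, not code from either post; samplers that renormalize and sample stochastically only approximate this):

```cpp
// Toy check: at negative temperature, greedy (argmax) decoding selects the
// token that was least likely at positive temperature, because dividing the
// logits by T < 0 reverses their ordering.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

int main() {
    std::vector<float> logits = {2.3f, -7.1f, 0.4f, -12.8f, 5.0f};
    const float T = -1.0f;  // negative temperature

    std::vector<float> scaled(logits.size());
    for (size_t i = 0; i < logits.size(); ++i) scaled[i] = logits[i] / T;

    int greedy_neg_T = static_cast<int>(std::distance(
        scaled.begin(), std::max_element(scaled.begin(), scaled.end())));
    int least_likely = static_cast<int>(std::distance(
        logits.begin(), std::min_element(logits.begin(), logits.end())));

    // Both print 3 here: the argmax under T = -1 is the argmin of the logits.
    std::printf("greedy at T=-1: %d, least likely token: %d\n",
                greedy_neg_T, least_likely);
}
```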
https://www.lesswrong.com/doc/misc/bot_k.diff gives me a 404.
Looks like the base URL is supposed to be niplav.site. I’ll change that now (FYI @niplav).
A trajectory produced by sampling the least likely tokens almost certainly is not the least likely trajectory, and your experiment may suggest it’s not among the least likely trajectories either.
Yeah, definitely not the least likely trajectories; instead it’s just the next token with the smallest probability. I was thinking of doing beam search while minimizing logits, but that looked difficult to implement. Still surprised that it produces things like prü|stor|oire|, which are pretty pronounceable.
Maybe it would look more random if you presented it segmented by token instead of translated into characters? I’m not familiar with the LLaMA tokenizations, but you seem to imply that a lot of the apparent patterns here are single tokens (“partiellement”, for example, would be very surprising to me as the output of greedy likelihood-minimizing sampling, but is trivial if it is a single BPE token). This would create a misleading impression of coherence.
Also, as Baginski notes, greedy sampling to minimize likelihood will not minimize total likelihood any more than greedily maximizing likelihood would maximize total likelihood. So it would be worth trying at least ‘worst-of-n’ sampling to see if it looks more like what you expect, in the same way that best-of-n often helps produce more expected LLM output. (After all, you would expect the tiniest logits to be the worst estimated of all logits, right? Full of sheer numerical noise and error, given that this is pushing ‘dark knowledge’ to its extremes. Who can really say how much better or worse an answer, exactly, ‘衡’ is than ‘д’ when following ‘*’, etc.? So if best-of-n can make such a qualitative difference when greedily sampling from the best-estimated logits...)
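For concreteness, a sketch of what that ‘worst-of-n’ could look like (sample_candidate below is a hypothetical stand-in, not a llama.cpp function): draw n continuations with ordinary sampling, sum each one’s per-token log-probabilities, and keep the lowest-scoring one.

```cpp
// Sketch of 'worst-of-n': draw n candidate continuations with ordinary
// sampling, then keep the one whose total log-probability is lowest.
// sample_candidate is a hypothetical stand-in for "run the model once and
// sum the per-token log-probabilities"; it is not part of llama.cpp.
#include <cstdio>
#include <string>
#include <vector>

struct Candidate {
    std::string text;      // the sampled continuation
    double total_logprob;  // sum of per-token log-probabilities
};

// Stand-in model call; a real run would sample a fresh continuation per seed.
Candidate sample_candidate(int seed) {
    static const std::vector<Candidate> toy = {
        {"candidate A", -42.0}, {"candidate B", -97.5}, {"candidate C", -61.3}};
    return toy[static_cast<size_t>(seed) % toy.size()];
}

// Best-of-n with the sign flipped: keep the least likely whole sample.
Candidate worst_of_n(int n) {
    Candidate worst = sample_candidate(0);
    for (int i = 1; i < n; ++i) {
        Candidate c = sample_candidate(i);
        if (c.total_logprob < worst.total_logprob) worst = c;
    }
    return worst;
}

int main() {
    Candidate w = worst_of_n(3);
    std::printf("kept: %s (total log-prob %.1f)\n", w.text.c_str(), w.total_logprob);
}
```

The same whole-sample scoring could also drive the likelihood-minimizing beam search mentioned above; worst-of-n is just the cheaper, search-free version.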
Tokenizing the output of LLaMa gives:
Some of the outputs are glitch-tokens for LLaMa-2-13b:
That looks pretty sensible overall, thanks.
You can see what looks like a fairly clear anti-pattern of switching languages/scripts, and the glitch-tokens may help explain the apparent patternness of the repetition in the non-token-split visualization: if LLaMA has ” Хронологија” as a glitch-token, it may literally be unable to see that it’s repeating a token by writing the apparently-patterned ” Хронологија| Хронологија”. Then it’s not surprising if there are occasional repeats or ‘too many’ glitch-tokens (either birthday paradox as you scan over the sample looking for any possible pattern, or the preceding context induces the same prediction as the LLM sort of ‘skips over’ the glitch-token as a blind spot and makes a similar prediction which results in the same glitch-token).
It’s totally possible that I’m seeing faces in the clouds, but there seems to be a non-trivial relationship between these two glitch tokens and what they make the model say.
Хронологија → chronologija → chronology, i.e. time-related, like February
“kwiet” is similar to “kwiecień”, which means “April” in Polish (also “kviten’” in Ukrainian).
Huh, cool. Intuitively, I’d expect those character-level similarities not to matter too much, since the tokenization makes these end up in very different parts of embedding space, unless “kwiecień” or “kviten” are often misspelled as words with the prefix “kwiet”. (I checked with Google Translate, which ~always translates “kwiet” as “quiet” for Slavic languages & Maltese, and as “flower” in Polish).