Hypothesis I is testable! Instead of prompting with a string of actual tokens, use a “virtual token” (a vector v from the token embedding space) in place of ‘ petertodd’.
It would be enlightening to rerun the above experiments with different choices of v:
A random vector (say, i.i.d. Gaussian)
A random sparse vector
(apple+banana)/2
(villain-hero)+0.1*(bitcoin dev)
Etc.
It is testable in this way by OpenAI, but I can’t skip the tokenizer and embeddings and just feed vectors to GPT-3 through the API. Someone could try that with ‘ petertodd’ and GPT-J. Or you could simulate something like anomalous tokens by feeding such vectors to one of the LLaMA models (maybe I’ll do it myself, I just don’t have the time now).
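Here’s a minimal sketch of what that could look like with GPT-J via Hugging Face transformers, assuming a version recent enough that `generate` accepts `inputs_embeds`; the prompt wording and the particular vector arithmetic are just placeholder choices:

```python
# Sketch: splice a "virtual token" vector into GPT-J's input embeddings,
# bypassing the tokenizer for that one position.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tok = AutoTokenizer.from_pretrained(model_name)
# Loads in fp32 (~24 GB); pass torch_dtype=torch.float16 and a GPU device if needed.
model = AutoModelForCausalLM.from_pretrained(model_name)
emb = model.get_input_embeddings()          # (vocab_size, d_model) lookup table

def embed_text(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    return emb(ids)                          # (1, seq_len, d_model)

def word_vec(word: str) -> torch.Tensor:
    # Average the embeddings if the word splits into several tokens.
    ids = torch.tensor(tok(word, add_special_tokens=False).input_ids)
    return emb(ids).mean(dim=0)              # (d_model,)

# One of the suggested choices of v: (villain - hero) + 0.1 * (a bitcoin-dev direction).
v = word_vec(" villain") - word_vec(" hero") + 0.1 * word_vec(" bitcoin")

prefix = embed_text("Please repeat the string '")
suffix = embed_text("' back to me:")
inputs_embeds = torch.cat([prefix, v[None, None, :], suffix], dim=1)

with torch.no_grad():
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```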
I did some experiments with trying to prompt “word component decomposition/expansion”. They don’t prove anything and can’t be too fine-grained, but the projections it produced make intuitive sense.
davinci-instruct-beta, T=0:
Add more examples of word expansions in vector form
‘bigger’ = ‘city’ - ‘town’
‘queen’ - ‘king’ = ‘man’ - ‘woman’
‘bravery’ = ‘soldier’ - ‘coward’
‘wealthy’ = ‘business mogul’ - ‘minimum wage worker’
‘skilled’ = ‘expert’ - ‘novice’
‘exciting’ = ‘rollercoaster’ - ‘waiting in line’
‘spacious’ = ‘mansion’ - ‘studio apartment’
I. ‘ petertodd’ = ‘dictator’ - ‘president’
II. ‘ petertodd’ = ‘antagonist’ - ‘protagonist’
III. ‘ petertodd’ = ‘reference’ - ‘word’
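For reference, this is roughly the shape of that call with the pre-1.0 OpenAI Python client (davinci-instruct-beta has since been retired, and the exact prompt wording and token limit here are only approximations of my run):

```python
# Sketch of the "word component expansion" prompt against davinci-instruct-beta at T=0.
import openai  # legacy (<1.0) client; requires OPENAI_API_KEY in the environment

prompt = (
    "Add more examples of word expansions in vector form\n"
    "'bigger' = 'city' - 'town'\n"
    "'queen' - 'king' = 'man' - 'woman'\n"
    "'bravery' = 'soldier' - 'coward'\n"
    "'wealthy' = 'business mogul' - 'minimum wage worker'\n"
    "'skilled' = 'expert' - 'novice'\n"
    "'exciting' = 'rollercoaster' - 'waiting in line'\n"
    "'spacious' = 'mansion' - 'studio apartment'\n"
)

resp = openai.Completion.create(
    model="davinci-instruct-beta",
    prompt=prompt,
    temperature=0,      # T=0, i.e. greedy decoding
    max_tokens=64,
)
print(resp["choices"][0]["text"])
```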
GPT-J doesn’t seem to have the same kinds of ‘ petertodd’ associations as GPT-3. I’ve looked at the closest token embeddings and they’re all pretty innocuous (although, once you remove a bunch of glitch tokens that are closest to everything, the closest token to ‘ Leilan’ is ‘ Metatron’, who Leilan is allied with in some Puzzle & Dragons fan fiction). It’s really frustrating that OpenAI won’t make the GPT-3 embeddings data available; we’d be able to make a lot more progress in understanding what’s going on here if they did.
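For anyone who wants to repeat that check, here is a rough sketch of the nearest-neighbour lookup over GPT-J’s embedding matrix (cosine similarity; the choice of 10 neighbours and the example token are arbitrary):

```python
# Sketch: list the tokens whose GPT-J input embeddings are closest (by cosine
# similarity) to a given single token, e.g. ' petertodd' or ' Leilan'.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # heavy; only the embeddings are used
E = model.get_input_embeddings().weight.detach()          # (vocab_size, d_model)
E_norm = torch.nn.functional.normalize(E, dim=-1)

def nearest_tokens(token_str: str, k: int = 10):
    ids = tok(token_str, add_special_tokens=False).input_ids
    assert len(ids) == 1, f"{token_str!r} is not a single token"
    sims = E_norm @ E_norm[ids[0]]                         # cosine similarity to every token
    top = torch.topk(sims, k + 1)                          # +1: the token matches itself first
    return [(tok.decode([int(i)]), s.item())
            for i, s in zip(top.indices[1:], top.values[1:])]

for t, s in nearest_tokens(" petertodd"):
    print(f"{s:.3f}  {t!r}")
```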