Hypothesis I is testable! Instead of prompting with a string of actual tokens, use a “virtual token” (a vector v from the token embedding space) in place of ‘ petertodd’.
It would be enlightening to rerun the above experiments with different choices of v:
A random vector (say, i.i.d. Gaussian)
A random sparse vector
(apple+banana)/2
(villain-hero)+0.1*(bitcoin dev)
Etc.
It is testable in this way by OpenAI, but I can’t skip the tokenizer and embeddings and just feed vectors to GPT-3 through the API. Someone could try that with ‘ petertodd’ and GPT-J. Or you could simulate something like anomalous tokens by feeding such vectors to one of the LLaMA models (maybe I’ll do it myself, I just don’t have the time now).
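Here’s a minimal sketch of what that could look like with GPT-J via Hugging Face transformers, assuming a version recent enough that `generate` accepts `inputs_embeds`; the prompt wording and the particular vector arithmetic are just placeholder choices:

```python
# Sketch: splice a "virtual token" vector into GPT-J's input embeddings,
# bypassing the tokenizer for that one position.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tok = AutoTokenizer.from_pretrained(model_name)
# Loads in fp32 (~24 GB); pass torch_dtype=torch.float16 and a GPU device if needed.
model = AutoModelForCausalLM.from_pretrained(model_name)
emb = model.get_input_embeddings()          # (vocab_size, d_model) lookup table

def embed_text(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    return emb(ids)                          # (1, seq_len, d_model)

def word_vec(word: str) -> torch.Tensor:
    # Average the embeddings if the word splits into several tokens.
    ids = torch.tensor(tok(word, add_special_tokens=False).input_ids)
    return emb(ids).mean(dim=0)              # (d_model,)

# One of the suggested choices of v: (villain - hero) + 0.1 * (a bitcoin-dev direction).
v = word_vec(" villain") - word_vec(" hero") + 0.1 * word_vec(" bitcoin")

prefix = embed_text("Please repeat the string '")
suffix = embed_text("' back to me:")
inputs_embeds = torch.cat([prefix, v[None, None, :], suffix], dim=1)

with torch.no_grad():
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```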
I did some experiments with trying to prompt “word component decomposition/expansion”. They don’t prove anything and can’t be too fine-grained, but the projections it produced make intuitive sense.
davinci-instruct-beta, T=0:
Add more examples of word expansions in vector form
‘bigger’ = ‘city’ - ‘town’
‘queen’ - ‘king’ = ‘man’ - ‘woman’
‘bravery’ = ‘soldier’ - ‘coward’
‘wealthy’ = ‘business mogul’ - ‘minimum wage worker’
‘skilled’ = ‘expert’ - ‘novice’
‘exciting’ = ‘rollercoaster’ - ‘waiting in line’
‘spacious’ = ‘mansion’ - ‘studio apartment’
I. ‘ petertodd’ = ‘dictator’ - ‘president’
II. ‘ petertodd’ = ‘antagonist’ - ‘protagonist’
III. ‘ petertodd’ = ‘reference’ - ‘word’
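For reference, this is roughly the shape of that call with the pre-1.0 OpenAI Python client (davinci-instruct-beta has since been retired, and the exact prompt wording and token limit here are only approximations of my run):

```python
# Sketch of the "word component expansion" prompt against davinci-instruct-beta at T=0.
import openai  # legacy (<1.0) client; requires OPENAI_API_KEY in the environment

prompt = (
    "Add more examples of word expansions in vector form\n"
    "'bigger' = 'city' - 'town'\n"
    "'queen' - 'king' = 'man' - 'woman'\n"
    "'bravery' = 'soldier' - 'coward'\n"
    "'wealthy' = 'business mogul' - 'minimum wage worker'\n"
    "'skilled' = 'expert' - 'novice'\n"
    "'exciting' = 'rollercoaster' - 'waiting in line'\n"
    "'spacious' = 'mansion' - 'studio apartment'\n"
)

resp = openai.Completion.create(
    model="davinci-instruct-beta",
    prompt=prompt,
    temperature=0,      # T=0, i.e. greedy decoding
    max_tokens=64,
)
print(resp["choices"][0]["text"])
```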
GPT-J doesn’t seem to have the same kinds of ‘ petertodd’ associations as GPT-3. I’ve looked at the closest token embeddings and they’re all pretty innocuous (although, once you remove a bunch of glitch tokens that are closest to everything, the closest token to ‘ Leilan’ is ‘ Metatron’, who Leilan is allied with in some Puzzle & Dragons fan fiction). It’s really frustrating that OpenAI won’t make the GPT-3 embeddings data available; we’d be able to make a lot more progress in understanding what’s going on here if they did.
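For anyone who wants to repeat that check, here is a rough sketch of the nearest-neighbour lookup over GPT-J’s embedding matrix (cosine similarity; the choice of 10 neighbours and the example token are arbitrary):

```python
# Sketch: list the tokens whose GPT-J input embeddings are closest (by cosine
# similarity) to a given single token, e.g. ' petertodd' or ' Leilan'.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # heavy; only the embeddings are used
E = model.get_input_embeddings().weight.detach()          # (vocab_size, d_model)
E_norm = torch.nn.functional.normalize(E, dim=-1)

def nearest_tokens(token_str: str, k: int = 10):
    ids = tok(token_str, add_special_tokens=False).input_ids
    assert len(ids) == 1, f"{token_str!r} is not a single token"
    sims = E_norm @ E_norm[ids[0]]                         # cosine similarity to every token
    top = torch.topk(sims, k + 1)                          # +1: the token matches itself first
    return [(tok.decode([int(i)]), s.item())
            for i, s in zip(top.indices[1:], top.values[1:])]

for t, s in nearest_tokens(" petertodd"):
    print(f"{s:.3f}  {t!r}")
```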