But a quick inspection of the embeddings available through the Hugging Face model shows this isn't the case.
That’s GPT-2 though, right? I interpret that Q&A claim as saying that GPT-3 does the normalisation, I agree that GPT-2 definitely doesn’t. But idk, doesn’t really matter
For prompt generation, we normalise the embeddings ourselves and constrain the search to that space, which results in better performance.
Interesting, what exactly do you mean by normalise? GPT-2 presumably breaks if you just outright normalise, since different tokens have very different norms
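For context, here's a minimal sketch of one way that kind of normalisation could work: rescale every token embedding to a common norm (so relative directions are preserved but the "different tokens have very different norms" problem goes away), then constrain the search by snapping candidate vectors back onto that normalised set. All names, shapes, and the snapping rule are illustrative assumptions, not the actual implementation being discussed:

```python
import numpy as np

def normalise_embeddings(E, eps=1e-8):
    """Rescale every row of the embedding matrix E to the mean row norm,
    so all tokens live on one sphere while directions are unchanged."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    target = norms.mean()
    return E * (target / np.maximum(norms, eps))

def nearest_token(v, E_norm):
    """Constrain the search space: project a candidate vector back onto
    the normalised embedding set by nearest Euclidean neighbour."""
    return int(np.argmin(np.linalg.norm(E_norm - v, axis=1)))

# Toy embedding table standing in for GPT-2's wte matrix
# (assumption: the real one is 50257 x 768; 5 x 4 here for illustration).
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4)) * rng.uniform(0.5, 3.0, size=(5, 1))

E_norm = normalise_embeddings(E)
# After normalisation every row has the same norm.
norms = np.linalg.norm(E_norm, axis=1)
print(np.allclose(norms, norms[0]))
```

Note this avoids the breakage worried about above: the model never sees raw unit-norm vectors with the wrong scale, because everything is rescaled to the *mean* token norm rather than to 1.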