Based on Mikhail’s Twitter comments, ‘precise’ and ‘creative’ don’t seem to be much more than the ‘temperature’ hyperparameter for sampling. ‘Precise’ would presumably correspond to a very low, near-zero or zero temperature, i.e. near-deterministic sampling.
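For concreteness, here is a minimal sketch of what the temperature knob does during sampling (this is the generic softmax-temperature mechanism, not Bing’s actual decoder):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from logits rescaled by temperature.

    temperature -> 0 approaches greedy (argmax) decoding, i.e. 'precise';
    higher temperatures flatten the distribution, i.e. 'creative'.
    """
    if temperature == 0:
        # Degenerate case: fully deterministic, just take the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    weights = [math.exp((l - m) / temperature) for l in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Inverse-CDF sampling over the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# At temperature 0 the highest-logit token always wins.
print(sample_with_temperature([1.0, 3.0, 2.0], 0))  # -> 1
```

At temperature 0.01 the same call returns index 1 essentially always; at temperature 2.0 the other tokens start winning a substantial fraction of the time.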
That’s interesting. Earlier, he was very explicitly identifying temperature with creativity in the Tweets I collated when commenting about how the controls worked. So now, if the temperature is identical but he’s calling whatever it is ‘creative’, he has apparently completely flipped his position on “hallucinations = creativity”.
Hm. So it’s the same temperature, but it’s more expensive, with ‘longer output, more expressive, slower’, and it requires more context… That could point to it being a different model under the hood. But it could also point to a different approach entirely, like best-of sampling, or perhaps some inner-monologue-like approach: a hidden prompt generating several options and then another prompt to pick “the most creative” one. There were some earlier comments about Sydney possibly having a hidden inner-monologue scratchpad/buffer where it could do a bunch of outputs before returning only 1 visible answer to the user. (This could be parallelized if you generated the n suggestions in parallel and didn’t mind the possible redundancy, but it still inherently takes more serial steps than simply generating 1 answer immediately.) The final prompt could be ‘pick the most creative one’ for creative mode, ‘pick the most correct one’ for ‘precise’ mode, etc. So this wouldn’t necessarily be anything new and could have been iterated on very quickly (but, as he says, it’d be inherently slower, generate longer responses, be more expensive, and be hard to optimize much further).
This is something you could try to replicate with ChatGPT/GPT-4: ask it to generate several different answers to the Monty Fall problem, and then ask it to pick the most correct one.
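The speculated scheme can be sketched as best-of-n with a second “judge” pass. This is purely hypothetical: `generate` and `judge` below are made-up stand-ins for real model calls, and the judge is faked with a trivial deterministic rule just so the control flow is visible:

```python
# Hypothetical sketch of the speculated best-of-n scheme: sample several
# candidate answers, then have a hidden second pass pick one of them.

def generate(prompt, n):
    # Stand-in for n sampled completions; a real system would call the
    # model n times (possibly in parallel, tolerating some redundancy).
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def judge(candidates, criterion):
    # Stand-in for the hidden second prompt ("pick the most {criterion}
    # one"); faked here by picking the longest candidate.
    return max(candidates, key=len)

def answer(prompt, mode):
    # Per-mode selection criterion, as guessed above.
    criterion = {"creative": "creative", "precise": "correct"}[mode]
    candidates = generate(prompt, n=4)
    return judge(candidates, criterion)

print(answer("Explain the Monty Fall problem", "precise"))
```

Note that the n generations plus the judge call are strictly more work than a single completion, which would account for “slower” and “more expensive”, and concatenating the candidates into the judge prompt would account for “requires more context”.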
We will {...increase the speed of creative mode...}, but it [will] probably always be somewhat slower, by definition: it generates longer responses, has larger context.
Our current thinking about Bing Chat modes:

Balanced: best for the most common tasks, like search; maximum speed
Creative: whenever you need to generate new content; longer output, more expressive, slower
Precise: most factual, minimizing conjectures
So creative mode definitely has larger context size, and might also be a larger model?
Nope, Mikhail has said the opposite: https://twitter.com/MParakhin/status/1630280976562819072
So I’d guess the main difference is in the prompt.
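If that guess is right, the modes could share one model and one temperature, differing only in a hidden system prompt. A toy illustration (the prompt texts and field names here are invented, not Bing’s actual prompts):

```python
# Invented per-mode system prompts, purely to illustrate the hypothesis
# that mode differences live in the prompt rather than the sampler.
MODE_PROMPTS = {
    "balanced": "Answer concisely; prefer grounding in search results.",
    "creative": "Write longer, more expressive answers.",
    "precise":  "Stick to verifiable facts; minimize conjecture.",
}

def build_request(mode, user_message, temperature=0.7):
    # Same decoding settings regardless of mode; only the prompt changes.
    return {
        "system": MODE_PROMPTS[mode],
        "user": user_message,
        "temperature": temperature,
    }

print(build_request("precise", "Who won the 2018 World Cup?")["system"])
```

Under this design, swapping modes changes the request’s system string but leaves the temperature identical, which is consistent with Mikhail’s later statements.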
Additional comments on creative mode by Mikhail (from today):
https://twitter.com/MParakhin/status/1636350828431785984
https://twitter.com/MParakhin/status/1636352229627121665
https://twitter.com/MParakhin/status/1636356215771938817