My current assumption is that extracting “intelligence” from images, and even more so from videos, is much less efficient than from text. Text is just extremely information-dense.
So I wouldn’t expect Gemini to initially feel more intelligent than GPT-4, even if it used five times the compute.
I mostly wonder about qualitative differences, perhaps induced by algorithmic improvements such as actually using RL or search components for a kind of self-supervised fine-tuning. That’s one area where I can easily see DeepMind outcompeting OpenAI.