Expectations for Gemini: hopefully not a big deal

Introduction

My goal is to register my expectations for the relative performance of Gemini vs. GPT-4, and to hear what others expect.

My expectations

The jump in capabilities from GPT-4 to Gemini will likely be smaller than the jump from GPT-3 to GPT-4 was.

Gemini could bring surprises by being more agentic than GPT-4, that is, better at planning and at longer-horizon tasks. But this is likely difficult to achieve; otherwise strong LLM agents would already be making a buzz.

Comparison

From GPT-3 to GPT-4

Scaling Factor: x100 more compute than GPT-3.
Optimization: Likely follows Chinchilla scaling laws (adapted for MoE) rather than the earlier OpenAI/Kaplan scaling laws.
MoE Over Dense: Utilizes Mixture of Experts (MoE) instead of dense layers.
Data Quality: Likely higher-quality data (not sure).
Image Generation: Not publicly released, possibly due to subpar performance or security risks.
Tools: Added during finetuning.
Algorithmic Gains: 3 years of progress between GPT-3 and GPT-4.
GPT-4 may already employ process-based feedback.
GPT-4 aimed for training compute efficiency; it was not designed to be commercially deployed at scale.
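
To make the scaling-law point concrete, here is a rough sketch of what a Chinchilla-style compute-optimal allocation implies. It assumes the common rules of thumb C ≈ 6·N·D training FLOPs and about 20 tokens per parameter. Both come from the dense-model literature, not from anything known about GPT-4, and an MoE model would likely need different constants. The function name and the example budget are hypothetical.

```python
import math

# Assumed rules of thumb (for dense models; not confirmed for GPT-4 or MoE):
FLOPS_PER_PARAM_TOKEN = 6.0   # C ~= 6 * N * D
TOKENS_PER_PARAM = 20.0       # D ~= 20 * N

def chinchilla_optimal(compute_flops):
    """Return (params, tokens) that spend compute_flops under the rules above."""
    # C = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens

base_budget = 1.2e20  # arbitrary illustrative budget, not a real model's compute
base_params, base_tokens = chinchilla_optimal(base_budget)
big_params, big_tokens = chinchilla_optimal(100 * base_budget)

print(f"x100 compute -> x{big_params / base_params:.1f} params, "
      f"x{big_tokens / base_tokens:.1f} tokens")
```

Under these assumptions, x100 more compute splits into roughly x10 more parameters and x10 more tokens, since both grow with the square root of compute.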

GPT-4 to Gemini

Scaling Factor: ~x5 (x20) more compute than GPT-4.
Supercomputer Constraint: No existing supercomputer could feasibly provide x100 more compute than was used for GPT-4 (not sure, but likely).
Multimodal: Maybe image, audio, and speech.
Data Efficiency: Possibly better-quality data, such as Google Books, and fewer epochs.
Tools: Could be added either during finetuning or pretraining.
Algorithmic Gains: ~1 year between GPT-4 and Gemini.
Gemini more likely aims for inference efficiency, given its intended extensive use by Google, maybe at the cost of training efficiency.
Gemini may be trained to be more agentic, better at planning, etc. (“GPT-4 + AlphaGo”).
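
The two scaling factors above are easier to compare in orders of magnitude. The sketch below assumes that capability gains roughly track the logarithm of training compute, which is a common reading of scaling laws but an assumption here. It also treats the parenthesized x20 as an alternative, higher estimate, which is only one possible reading. All factors are the guesses listed above, not measured values, and the function names are hypothetical.

```python
import math

# Compute multipliers guessed in the lists above (not measured values).
GPT3_TO_GPT4 = 100.0
GPT4_TO_GEMINI_LOW = 5.0
GPT4_TO_GEMINI_HIGH = 20.0  # assumed reading of the "(x20)" in the list

def orders_of_magnitude(factor):
    """Size of a compute jump in powers of ten."""
    return math.log10(factor)

def relative_jump(factor, reference=GPT3_TO_GPT4):
    """Jump size as a fraction of the reference jump, measured in log space."""
    return orders_of_magnitude(factor) / orders_of_magnitude(reference)

for label, factor in [("x5", GPT4_TO_GEMINI_LOW), ("x20", GPT4_TO_GEMINI_HIGH)]:
    print(f"{label}: {orders_of_magnitude(factor):.2f} OOM, "
          f"{relative_jump(factor):.0%} of the GPT-3 to GPT-4 jump")
```

In log terms, x5 is about a third of the earlier jump and x20 about two thirds, which fits the expectation of a smaller step from compute alone.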

Note: I drafted this before news of Gemini’s release and capabilities, but failed to finish writing. Since then, there have been some reports of Gemini being roughly at the level of GPT-4.