Inflection.ai is a major AGI lab

Update (April 2024): Due to the recent breakup of Inflection, I no longer think they’re on track to be a major AGI lab.

Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of a similar magnitude to Meta, OpenAI, DeepMind, and Anthropic, based on their compute, valuation, current model capabilities, and plans to train frontier models. Compared to the other labs, Inflection seems to put less effort into AI safety.

Thanks to Laker Newhouse for discussion and feedback!

Inflection has a lot of compute dedicated to training LLMs

Inflection has a lot of funding

Inflection is on the cutting edge of LLMs

Inflection plans to train frontier LLMs

Inflection doesn’t seem to acknowledge existential risks or have a sizable safety team

Appendix: Estimating Inflection’s compute

Here are some back-of-the-envelope calculations for Inflection’s current compute from three data sources. They result in estimates spanning roughly two orders of magnitude, centered around 4e18 FLOPS.

FLOPs = plural of “floating point operation (FLOP)”

FLOPS = floating point operations per second
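
As a quick check on that center, here is the geometric mean of the three headline estimates derived in the routes below (a minimal sketch; the three numbers are simply the per-route figures from this appendix):

```python
# Combine the three per-route estimates (see below) with a geometric mean,
# which is the natural average for numbers spanning orders of magnitude.
estimates_flops = [3.6e19, 6.7e17, 2.3e18]  # H100 route, GPT-4 route, 11-minute route

product = 1.0
for estimate in estimates_flops:
    product *= estimate
geometric_mean = product ** (1 / len(estimates_flops))

print(f"{geometric_mean:.1e}")  # ~3.8e18 FLOPS, i.e. roughly 4e18
```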

The H100 route

From the H100 datasheet, it seems that different components of the H100 (and different H100 models) offer different amounts of FLOPS. I will simplify and assume one H100 provides an effective 10,000 teraFLOPS, i.e. 1e16 FLOPS (1 teraFLOPS = 1e12 FLOPS). Inflection.ai currently has around 3.6 thousand H100s, which puts their total compute at 3.6e19 FLOPS.
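
The arithmetic for this route, using the assumed per-GPU figure above, is just:

```python
# H100 route: assumed effective throughput per GPU times the number of GPUs.
flops_per_h100 = 1e16   # assumed effective 10,000 teraFLOPS per H100
num_h100s = 3.6e3       # roughly 3.6 thousand H100s

total_flops = flops_per_h100 * num_h100s
print(f"{total_flops:.1e}")  # 3.6e+19 FLOPS
```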

The “train GPT-4 in 4 months when we triple our cluster” route

Inflection thinks they’ll be able to train GPT-4 in four months of cluster time once they triple their cluster size. This implies they could train GPT-4 in about one year of cluster time on their current cluster. Epoch estimates that GPT-4 took 2.1e25 FLOPs to train, which puts Inflection’s current compute at roughly 6.7e17 FLOPS.
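
Concretely, dividing Epoch’s GPT-4 compute estimate by one year of seconds:

```python
# GPT-4 route: if the current cluster would need ~1 year to train GPT-4,
# its sustained throughput is Epoch's GPT-4 compute estimate divided by a year.
gpt4_training_flops = 2.1e25            # Epoch's estimate of GPT-4 training compute
seconds_per_year = 365.25 * 24 * 3600   # ~3.16e7 seconds

current_flops = gpt4_training_flops / seconds_per_year
print(f"{current_flops:.1e}")  # ~6.7e+17 FLOPS
```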

The “11 minutes on the GPT-3 MLPerf benchmark” route

Inflection can train GPT-3 to 2.69 log perplexity on the C4 dataset in 11 minutes. What does this mean? I’m not sure, as I have found it hard to find any modern model’s log perplexity scores on that dataset. GPT-3’s log perplexity seems to be -1.73 on some dataset. GPT-2-1.5b’s log perplexity on another dataset seems to be around 3.3. Not sure what to make of that, but let’s assume Inflection can train GPT-2 in 11 minutes on their cluster. This would put their current compute at 2.3e18 FLOPS, if we use the Epoch estimate of how much compute GPT-2 took to train.
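
The implied arithmetic, assuming Epoch’s GPT-2 training-compute estimate is roughly 1.5e21 FLOPs (the value consistent with the 2.3e18 FLOPS figure above; treat it as an assumption of this sketch rather than an exact citation):

```python
# 11-minute route: divide GPT-2's estimated training compute by the wall-clock time.
gpt2_training_flops = 1.5e21   # assumed Epoch-style estimate of GPT-2 training compute
wall_clock_seconds = 11 * 60   # 11 minutes

current_flops = gpt2_training_flops / wall_clock_seconds
print(f"{current_flops:.1e}")  # ~2.3e+18 FLOPS
```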