Meta releases Llama-4 herd of models

Meta released three models from the Llama 4 herd today.
The 109B (small) and 400B (mid) models are MoE models with 17B active parameters each. The large model is a ~2T-parameter MoE model with 288B active parameters.
The small model is called Scout. The middle model is called Maverick. The big one is called Behemoth.
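For a sense of the MoE trade-off, here is a quick back-of-the-envelope sketch in Python using only the headline numbers above (approximate marketing figures, not exact weight counts):

# Rough arithmetic sketch; assumes the total/active parameter counts quoted above.
MODELS = {
    # name: (total_params, active_params_per_token)
    "Scout":    (109e9, 17e9),
    "Maverick": (400e9, 17e9),
    "Behemoth": (2e12, 288e9),
}

for name, (total, active) in MODELS.items():
    print(f"{name:9s} total={total/1e9:7.0f}B  active={active/1e9:5.0f}B  "
          f"active fraction={active/total:.1%}")

# Scout and Maverick pay the same per-token compute (~17B active parameters)
# even though Maverick holds roughly 4x the total capacity -- the usual MoE trade-off.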
All three are natively multimodal and have much longer context lengths: 10M tokens (!!) for the small one and 1M for the mid-sized one.
Maverick is currently number 2 on LMArena (number 5 with style control).
We don’t have the 2T big model available yet; it is used as the teacher model for distilling Maverick and Scout.
Behemoth beats Claude 3.7 Sonnet (non-thinking), GPT-4.5, and Gemini 2.0 Pro on various benchmarks (LiveCodeBench, MATH-500, GPQA Diamond, etc.).
Scout is meant to fit on a single H100.
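A weights-only estimate (assuming the 109B total-parameter figure above and ignoring KV cache and activation memory) shows why the single-H100 claim implies fairly aggressive quantization:

# Back-of-the-envelope check of why Scout can target a single 80 GB H100.
# Assumption: weights only; KV cache and activations are not counted.
SCOUT_TOTAL_PARAMS = 109e9
H100_MEM_GB = 80

for label, bytes_per_param in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weights_gb = SCOUT_TOTAL_PARAMS * bytes_per_param / 1e9
    verdict = "fits" if weights_gb <= H100_MEM_GB else "does not fit"
    print(f"{label}: ~{weights_gb:.0f} GB of weights -> {verdict} in {H100_MEM_GB} GB")

# bf16: ~218 GB (no), int8: ~109 GB (no), int4: ~55 GB (yes) --
# so the single-H100 claim effectively presumes Int4-level quantization.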
https://ai.meta.com/blog/llama-4-multimodal-intelligence/