Introduction
We’re releasing gpt-oss-120b and gpt-oss-20b—two state-of-the-art open-weight language models that deliver strong real-world performance at low cost. Available under the flexible Apache 2.0 license, these models outperform similarly sized open models on reasoning tasks, demonstrate strong tool use capabilities, and are optimized for efficient deployment on consumer hardware. They were trained using a mix of reinforcement learning and techniques informed by OpenAI’s most advanced internal models, including o3 and other frontier systems.
The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. Both models also perform strongly on tool use, few-shot function calling, and CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite), as well as on HealthBench (where they even outperform proprietary models like OpenAI o1 and GPT-4o).
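For a concrete sense of what local inference looks like, here is a minimal sketch using the Hugging Face transformers library. It assumes the 20b weights are published under the openai/gpt-oss-20b repo id and that your installed transformers version supports the architecture; treat it as a sketch, not a confirmed recipe.

```python
# Minimal local-inference sketch. Assumes the weights are published on
# Hugging Face as "openai/gpt-oss-20b" and that the installed
# transformers version supports the gpt-oss architecture.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # place layers across available GPU/CPU memory
)

messages = [
    {"role": "user", "content": "Explain the Apache 2.0 license in one sentence."}
]

out = generator(messages, max_new_tokens=128)
# The pipeline returns the full conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```

Whether this actually fits in 16 GB depends on the precision the checkpoint ships in and on your runtime's memory overhead.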
These models are compatible with our Responses API and are designed for agentic workflows, with exceptional instruction following, tool use such as web search or Python code execution, and reasoning capabilities, including the ability to dial down the reasoning effort for tasks that don't require complex reasoning or that target very low-latency final outputs. They are fully customizable, provide full chain-of-thought (CoT), and support Structured Outputs.
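To illustrate the adjustable reasoning effort, here is a sketch that points the OpenAI Python SDK's Responses API at a locally hosted, OpenAI-compatible server. The base_url, api_key, and served model name are placeholder assumptions, and not every local server implements the Responses endpoint, so check your server's docs.

```python
# Sketch of calling a locally hosted gpt-oss model through the
# Responses API with reduced reasoning effort. The endpoint URL and
# model name below are placeholders for whatever your server exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local endpoint
    api_key="unused",                     # local servers often ignore the key
)

response = client.responses.create(
    model="gpt-oss-120b",
    reasoning={"effort": "low"},  # "low" / "medium" / "high": trade depth for latency
    input="Give a one-paragraph summary of what open-weight models are.",
)

print(response.output_text)
```

Lower effort shortens the model's internal chain-of-thought, which cuts latency and cost on tasks that don't need deep reasoning.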
Safety is foundational to our approach to releasing all our models, and is of particular importance for open models. In addition to running the models through comprehensive safety training and evaluations, we also introduced an additional layer of evaluation by testing an adversarially fine-tuned version of gpt-oss-120b under our Preparedness Framework. gpt-oss models perform comparably to our frontier models on internal safety benchmarks, offering developers the same safety standards as our recent proprietary models. We’re sharing the results of that work and more details in a research paper and in the model card. Our methodology was reviewed by external experts and marks a step forward in setting new safety standards for open-weight models.
We’ve also been working with early partners like AI Sweden, Orange, and Snowflake to learn about real-world applications of our open models, from hosting these models on-premises for data security to fine-tuning them on specialized datasets. We’re excited to provide these best-in-class open models to empower everyone—from individual developers to large enterprises to governments—to run and customize AI on their own infrastructure. Coupled with the models available in our API, developers can choose the performance, cost, and latency they need to power AI workflows.
The model card can be found here.
I comment on the CBRN/bio situation with respect to these models here.
Putting GPT back in the name but making it lowercase is a fun new installment in the “OpenAI can’t name things consistently” saga.
they’re actually referring to throwing your doctor, which is why it’s called “gp-toss”
this comment was at no point endorsed by its author
I think they are doing that quite deliberately. As Greg Brockman said (smiling) when announcing their third product named Codex: "We call this system, in the grand tradition of OpenAI naming, Codex" https://www.youtube.com/watch?v=hhdpnbfH6NU
;-) I guess it’s too late to start a prediction market on whether gpt5 ends up being lowercased or without the dash before 5 (or with the “o” after 5) ;-) so many options, all tempting ;-)
am not at all involved in naming so not sure, but i’d guess none of it is intentional and more just the emergent result of compromise between a bunch of people / relatively little effort being expended on naming
Thanks!
I wonder who has the actual final decision power. After all the deliberations and compromises, who is the person or group that actually says, "yes, we are naming this new shiny thing XYZ"?
Is it Sam? Or is there a meeting? Or is it just "organic": at some point the discussion seems to converge, and people just let it stay at whatever name it converged to?