OpenAIGA
gpt-oss (120b / 20b)
OpenAI’s open-weight family — Apache 2.0 reasoning that fits on one GPU or a laptop.
Read the full gpt-oss (120b / 20b) analysisContext
131K
Max output
—
Input /1M
$0.04
Output /1M
$0.18
Live pricing via OpenRouter
Best for
- Local-first, privacy-sensitive products (gpt-oss-20b offline on 16GB)
- Single-GPU reasoning workloads (gpt-oss-120b on one 80GB card)
- Cost-controlled agentic loops with no per-token meter or vendor lock-in
Watch out
Open weights are not free inference — self-hosting the 120b only pencils out at high steady volume or under data-residency rules; otherwise a hosted endpoint at ~$0.04-$0.15/1M is cheaper. Requires OpenAI’s harmony response format. No longer tops open-model leaderboards.
For creators. Run gpt-oss-20b locally via Ollama/LM Studio for offline drafting, agent prototyping, and any workflow where prompts/outputs can’t leave the machine.
Benchmarks
| gpqa diamond | 80.1 |
| mmlu pro | 90 |
| aime 2025 tools | 97.9 |
| swe bench verified | 62.4 |
| humanitys last exam | 19 |
Capabilities
- Apache 2.0 open weights (free, commercial-friendly, no copyleft)
- MXFP4-quantized: 120b on a single 80GB GPU (H100/MI300X), 20b in ~16GB
- Run via Ollama / vLLM / LM Studio / llama.cpp / HF Transformers
- 131K-token context, low/medium/high reasoning-effort levels
- Native function calling, browsing, Python, structured outputs (harmony format)