Microsoft AI launched 7 self-built MAI models — Thinking-1, Image-2.5, Code-1-Flash and more — on its own MAIA silicon. What the vendor claims, what's verifiable, and what it means for builders.
You will understand exactly what Microsoft shipped with its 7 MAI models, which claims are verifiable today, and how to slot them into a model-routing strategy.
TL;DR: On June 2, 2026, Mustafa Suleyman (CEO, Microsoft AI) announced seven new MAI models: Microsoft's first fully self-built frontier family, trained and served on Microsoft's own MAIA silicon. The lineup: MAI-Image-2.5 and MAI-Image-2.5 Flash, MAI-Transcribe-1.5, MAI-Thinking-1, MAI-Voice-2 and MAI-Voice-2 Flash, and MAI-Code-1-Flash. The flagship reasoning model, MAI-Thinking-1, is a 35B-active-parameter mixture-of-experts with a 256K context window. Microsoft claims that human raters on Surge prefer it over Claude Sonnet 4.6 in blind comparisons, and that it scores 97% on AIME 2025 and 53% on SWE-Bench Pro. Every figure here is vendor-claimed from a launch announcement and has not been independently reproduced. The strategic signal matters more than any single benchmark: Microsoft now owns the full stack, from chip to model to customization layer.
Seven models, launched together, spanning text, image, voice, transcription, and code. Here's the lineup as described in the announcement:
| Model | Type | What Microsoft claims |
|---|---|---|
| MAI-Thinking-1 | Text reasoning foundation | 35B-active MoE, 256K context; preferred over Sonnet 4.6 by human raters (blind); 97% AIME 2025; 53% SWE-Bench Pro |
| MAI-Image-2.5 | Image generation/editing | #2 on image leaderboards; surpasses Nano Banana 2 on editing |
| MAI-Image-2.5 Flash | Fast image gen | Flash-tier latency/cost variant of Image-2.5 |
| MAI-Voice-2 | Speech synthesis | Next-gen voice model |
| MAI-Voice-2 Flash | Fast speech | Flash-tier voice variant |
| MAI-Transcribe-1.5 | Speech-to-text | Transcription model |
| MAI-Code-1-Flash | Coding | 5B params; 51% SWE-Bench Pro; tuned for VS Code + GitHub Copilot CLI |
The framing from Suleyman: a new era of AI "designed to keep you in control and on the frontier," explicitly tied to Microsoft's stated pursuit of humanist superintelligence.
For years, Microsoft's frontier AI story ran through OpenAI. This launch is the clearest evidence yet that Microsoft AI is building a parallel, fully in-house stack — and not just the models. Three layers are now Microsoft-owned:
Silicon. MAI-Thinking-1 was co-designed with Microsoft's MAIA 200 chip. Microsoft claims roughly 30% better performance-per-dollar and 1.4x better performance-per-watt when running MAI models end-to-end on MAIA 200, benchmarked head-to-head against NVIDIA's GB200. If even directionally true, owning the chip is what makes the economics of the Flash-tier models (Image, Voice, Code) viable at scale.
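To make the silicon claim concrete, here is a small arithmetic sketch of what the two ratios would imply for a fixed workload if taken at face value. The function names are hypothetical, and the inputs are Microsoft's unverified launch figures, not measurements.

```python
def cost_reduction_from_perf_per_dollar(gain: float) -> float:
    """Fractional cost saving for a fixed workload given a perf-per-dollar gain.

    A gain of 0.30 means 1.30x the work per dollar, so the same work
    costs 1 / 1.30 of the baseline.
    """
    if gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 - 1.0 / (1.0 + gain)


def energy_reduction_from_perf_per_watt(ratio: float) -> float:
    """Fractional energy saving for a fixed workload given a perf-per-watt ratio."""
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Vendor-claimed inputs from the launch (unverified).
    cost_saving = cost_reduction_from_perf_per_dollar(0.30)
    energy_saving = energy_reduction_from_perf_per_watt(1.4)
    print(f"Implied cost saving for the same workload: {cost_saving:.1%}")
    print(f"Implied energy saving for the same workload: {energy_saving:.1%}")
```

The point of the exercise: 30% better performance-per-dollar is about a 23% lower bill for the same work, not 30%, and a 1.4x performance-per-watt ratio is about 29% less energy.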
Models. Seven of them, across every major modality, shipped on one day. That cadence only works when you control your own training infrastructure.
Customization — "Frontier Tuning." This is the part most builders skipped over. Microsoft is letting customers tune MAI models into custom, company-specific agents — "your model, your data, your agents, your moat," in Suleyman's words. The cited proof point: a McKinsey-tuned model that delivered the highest win rate on McKinsey's tasks, claimed to outperform GPT-5.5 on quality while running ~10x cheaper. Microsoft also announced a collaboration with Mayo Clinic to jointly train a healthcare frontier model.
The pattern is the same one Anthropic, Google, and OpenAI are racing toward: vertical integration from chip to deployed agent. Microsoft just made its move public and concrete.
Here's where I separate the claim from the evidence, because the editorial standard here is that every number gets a source — and these numbers have exactly one: Microsoft's own launch.
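The claims for MAI-Thinking-1, as stated at launch: a 35B-active-parameter mixture-of-experts with a 256K context window; human raters on Surge preferring it over Claude Sonnet 4.6 in blind comparisons; 97% on AIME 2025; and 53% on SWE-Bench Pro.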
How I read it: A 35B-active MoE landing near the top of frontier coding and math benchmarks would be genuinely impressive — that's a small active footprint for that class of result, and it's consistent with the MAIA-200 efficiency story. But two cautions. First, "human raters prefer it over Sonnet 4.6" is a preference signal on a specific eval set, not a capability ceiling — preference studies are notoriously sensitive to prompt mix and judging rubric. Second, none of this is reproduced yet. We've seen vendor launch numbers compress hard once LMArena, ARC Prize, and Artificial Analysis get their hands on a model. Until then, MAI-Thinking-1 sits in my "promising, unverified" bucket — the same bucket I put vendor-launch Gemini numbers in.
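One reason to hold a blind-preference result loosely is plain sampling noise. The sketch below computes a Wilson score interval for a win rate. The counts are made up for illustration, since no sample sizes are quoted in this article, and the function name is my own.

```python
import math


def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0 or not 0 <= wins <= total:
        raise ValueError("need total > 0 and 0 <= wins <= total")
    p = wins / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical counts: the same 55% win rate at two sample sizes.
    for wins, total in [(55, 100), (5500, 10000)]:
        low, high = wilson_interval(wins, total)
        print(f"{wins}/{total}: {low:.3f} to {high:.3f}")
```

With 100 comparisons, a 55% win rate is still compatible with a coin flip; with 10,000 it is not. This interval covers sampling noise only. Prompt mix and judging rubric can shift the result in ways no interval captures.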
MAI-Image-2.5 is the one with the most checkable claim: Microsoft says it and its Flash variant sit at #2 on image leaderboards, surpassing Google's Nano Banana 2 on image editing specifically. Image leaderboards (the LMArena image arena, in particular) update fast and are crowd-judged, so this is the claim most likely to be confirmed or debunked within weeks. If it holds, the Flash variant is the interesting one — competitive editing quality at Flash-tier cost is exactly what high-volume creative pipelines need.
MAI-Code-1-Flash is the efficiency play. At 5B parameters, a claimed 51% on SWE-Bench Pro would put it in Haiku-class size territory while, per Microsoft, costing less. It is explicitly tuned for VS Code and the GitHub Copilot CLI, which is the obvious distribution channel. A small, cheap, IDE-native coding model is a different product from a frontier reasoning model: it's the one you run on every keystroke, not the one you call for architecture. That's a smart lane to own given Microsoft already controls the editor.
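The keystroke-versus-architecture split can be sketched as a tiny routing rule. This is a minimal illustration, not Microsoft's routing logic: the thresholds and task labels are assumptions, and the model strings are the announcement's product names, not confirmed API identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FLASH = "flash"        # small, cheap, low-latency
    FRONTIER = "frontier"  # larger reasoning model


@dataclass(frozen=True)
class Request:
    task: str           # e.g. "completion", "refactor", "architecture"
    files_touched: int  # how many files the change spans


# Product names from the announcement; not confirmed API identifiers.
MODEL_FOR_TIER = {
    Tier.FLASH: "MAI-Code-1-Flash",
    Tier.FRONTIER: "MAI-Thinking-1",
}

# Assumed values; tune against your own traffic.
MAX_FILES_FOR_FLASH = 2
HEAVY_TASKS = {"design", "architecture", "migration"}


def pick_tier(req: Request) -> Tier:
    """Send heavy or wide-reaching work to the frontier tier, the rest to Flash."""
    if req.task in HEAVY_TASKS:
        return Tier.FRONTIER
    if req.files_touched > MAX_FILES_FOR_FLASH:
        return Tier.FRONTIER
    return Tier.FLASH


def pick_model(req: Request) -> str:
    """Map a request to a model label via its tier."""
    return MODEL_FOR_TIER[pick_tier(req)]


if __name__ == "__main__":
    print(pick_model(Request(task="completion", files_touched=1)))
    print(pick_model(Request(task="architecture", files_touched=1)))
```

Keeping the tier decision separate from the model label means you can swap in another provider's small or large model without touching the rule itself, which matters while these benchmarks are still unverified.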
| Claim | Status |
|---|---|
| 7 models exist, across these modalities | Verifiable (announced) |
| Tuned for VS Code / Copilot CLI | Verifiable (Microsoft's own surfaces) |
| Mayo Clinic + McKinsey collaborations | Stated (vendor, not yet detailed) |
| MAI-Thinking-1 beats Sonnet 4.6 (human pref) | Vendor-claimed — single eval, unreproduced |
| 97% AIME 2025 / 53% SWE-Bench Pro | Vendor-claimed — await independent runs |
| MAI-Image-2.5 > Nano Banana 2 on editing | Vendor-claimed — leaderboards will confirm fast |
| MAIA 200: ~30% better perf/$, 1.4x perf/watt vs GB200 | Vendor-claimed — no third-party silicon data |
If you're making procurement or architecture decisions, treat every row marked Vendor-claimed as marketing until the independent benchmarks land.
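If you track launches like this one, the status table above can be kept as data and filtered before it reaches a decision document. The sketch below is one way to do that; the type names and helper functions are hypothetical, and the rows are copied from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    VERIFIABLE = "verifiable"
    STATED = "stated"
    VENDOR_CLAIMED = "vendor-claimed"


@dataclass(frozen=True)
class Claim:
    text: str
    status: Status


# Rows copied from the status table in this article.
CLAIMS = [
    Claim("7 models exist, across these modalities", Status.VERIFIABLE),
    Claim("Tuned for VS Code / Copilot CLI", Status.VERIFIABLE),
    Claim("Mayo Clinic + McKinsey collaborations", Status.STATED),
    Claim("MAI-Thinking-1 beats Sonnet 4.6 (human pref)", Status.VENDOR_CLAIMED),
    Claim("97% AIME 2025 / 53% SWE-Bench Pro", Status.VENDOR_CLAIMED),
    Claim("MAI-Image-2.5 > Nano Banana 2 on editing", Status.VENDOR_CLAIMED),
    Claim("MAIA 200 perf/$ and perf/watt vs GB200", Status.VENDOR_CLAIMED),
]


def decision_grade(claims: list[Claim]) -> list[Claim]:
    """Claims that can be checked today and are safe to plan around."""
    return [c for c in claims if c.status is Status.VERIFIABLE]


def awaiting_independent_data(claims: list[Claim]) -> list[Claim]:
    """Claims to revisit once third-party results are published."""
    return [c for c in claims if c.status is Status.VENDOR_CLAIMED]


if __name__ == "__main__":
    for claim in awaiting_independent_data(CLAIMS):
        print(f"REVISIT: {claim.text}")
```

When an independent result lands, you change one status value and the claim moves between lists, so the decision document never quotes a number that has quietly gone stale.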
For the broader competitive picture, see the frontier model landscape for 2026, the Claude Opus 4.6 deep analysis, and the live Frontier Models Intelligence Hub. For routing and pricing across every provider, the LLM Hub tracks them side by side.
What are Microsoft's MAI models? MAI (Microsoft AI) models are Microsoft's in-house frontier model family. The June 2026 launch introduced seven: MAI-Thinking-1 (text reasoning), MAI-Image-2.5 and Image-2.5 Flash (image), MAI-Voice-2 and Voice-2 Flash (speech), MAI-Transcribe-1.5 (speech-to-text), and MAI-Code-1-Flash (coding).
Is MAI-Thinking-1 better than Claude or GPT? Microsoft claims human raters prefer it over Claude Sonnet 4.6 in blind comparisons, with 97% on AIME 2025 and 53% on SWE-Bench Pro. These are vendor-claimed numbers from the launch, not independently reproduced. Wait for third-party benchmarks (LMArena, ARC Prize, Artificial Analysis) before treating it as a Claude or GPT replacement.
What is MAIA 200? MAIA 200 is Microsoft's in-house AI accelerator chip. MAI-Thinking-1 was co-designed for it; Microsoft claims ~30% better performance-per-dollar and 1.4x performance-per-watt versus NVIDIA's GB200 when running MAI models end-to-end. No independent silicon benchmarks exist yet.
What is Microsoft Frontier Tuning? Frontier Tuning lets customers fine-tune MAI models into custom, company-specific agents on their own data. Microsoft cites a McKinsey-tuned model that beat GPT-5.5 on quality at roughly 10x lower cost, and a Mayo Clinic collaboration to train a healthcare frontier model.
How is MAI-Code-1-Flash different from a frontier coding model? It's a small (5B-parameter), inference-efficient coding model tuned for VS Code and the GitHub Copilot CLI — designed for high-frequency, low-latency in-editor use rather than complex multi-file architecture. Microsoft claims 51% on SWE-Bench Pro, strong for its size.
Where can I track these models against competitors? The FrankX Frontier Models Intelligence Hub and LLM Hub track frontier models side by side on benchmarks, context windows, and pricing as independent data becomes available.