Microsoft Strikes Out on Its Own Microsoft revealed MAI-Thinking-1, a Claude Sonnet 4.6-sized reasoning model developed without distillation

Published
Reading time
3 min read
The table shows MAI-Thinking-1 leading in several benchmarks, compared to other AI models.
Loading the Elevenlabs Text to Speech AudioNative Player...

Microsoft, once OpenAI’s exclusive partner and still a major reseller of other companies’ AI models, built its own reasoning model from scratch.

What’s new: Microsoft introduced MAI-Thinking-1, its first reasoning language model that was not distilled or fine-tuned from a model built by a different developer. Microsoft describes MAI-Thinking-1 as a medium-sized model comparable to Claude Sonnet 4.6. It leads a family of seven MAI models unveiled at Microsoft’s Build conference, including MAI-Code-1-Flash, a small coding model available in GitHub Copilot and Visual Studio Code.

  • Input/output: Text in (up to 256,000 tokens), text out (up to 256,000 tokens)
  • Architecture: Mixture of experts (1 trillion parameters total, 35 billion parameters active per token)
  • Features: Function calling, developer instructions (set by developers and ranked above user prompts in conflicts), compatible with OpenAI’s Chat Completions API
  • Performance: According to Microsoft’s tests, third-place on the AIME 2025 math benchmark
  • Availability: Private preview via Microsoft Foundry, broader access planned via Fireworks AI, Baseten, and OpenRouter
  • Undisclosed: Full list of datasets and data providers, price

How it works: Microsoft built MAI-Thinking-1 by pretraining a base model, fine-tuning separate copies into specialized models to use as teachers, distilling them into a student model, and teaching the student to reason via reinforcement learning. The pretraining and midtraining data comprised 30 trillion and 3.55 trillion tokens respectively, including primarily human-generated data, including over 50 percent code. Post-training data included more than 5 million STEM questions and more than 160,000 coding questions.

  • Microsoft pretrained primarily on licensed material, avoiding synthetic data. The company argues that a model trained on AI-generated data or via distillation of third-party teacher models inherits the teacher’s design choices and generalizes less easily, so training it directly yields more steerable behavior.
  • For further training data, Microsoft crawled roughly 1.2 trillion webpages, filtered this set to 794 billion webpages, and separately drew in 24.2 billion deduplicated webpages from Common Crawl, an open archive whose own terms grant no rights to the content it stores and tell users they rely on it at their own risk.
  • The team applied reinforcement learning on this base model to train three separate specialist models: one focused on STEM reasoning, another on agentic coding and tool use, and a third primarily devoted to helpfulness and safety. Rather than imitating reasoning traces of other models, the models generated original chains of thought and were rewarded for correct math solutions, passing code test cases, and sound judgement.
  • The team consolidated the three specialists in two stages: (i) supervised fine-tuning that distilled them into a single model and (ii) a round of reinforcement learning to avoid over-refusals and improve safety and style.

Results: According to Microsoft's tests, MAI-Thinking-1 is strongest on mathematics and trails other models (including those from Anthropic, DeepSeek, and OpenAI) on graduate-level science and agentic coding. On AIME 2025, which tests the ability to solve competition math problems, MAI-Thinking-1 (97.0 percent) topped Claude Sonnet 4.6 (95.6 percent) and DeepSeek V3.2 (93.1 percent) but trailed Claude Opus 4.6 (99.8 percent). No independent evaluations or comparisons to more recent models have been published yet.

Behind the news: Microsoft has long relied on OpenAI’s models to power products such as Copilot and built earlier models of its own by drawing on those of its rivals. Its Phi family distilled OpenAI’s GPT-4 and GPT-5 models, and MAI-DS-R1 was a fine-tuned version of DeepSeek-R1. That changed in April 2026, when Microsoft and OpenAI amended their partnership, making Microsoft’s license to OpenAI’s models non-exclusive and freeing OpenAI to serve its products with any cloud provider.

Why it matters: Teams on the Microsoft stack can reach a capable reasoning model without adding a vendor or moving data out of the tools they already use. That may attract users among Microsoft’s large base of Azure and Copilot customers. Microsoft says it is planning more models based on the data pipeline created for MAI-Thinking-1 and its siblings.

We’re thinking: The company set out to train a reasoning model on fully attributable data but still drew heavily on the web. Of course everyone else does, too. Much of the success of large language models to date has been built on web data.

Share

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox