Microsoft fully trains its own models; Copilot app brings agent management to desktop

Published

Jun 3, 2026

Reading time

5 min read

In today’s edition of Data Points, you’ll learn about Microsoft’s announcements from its Build developer conference, plus more:

How agents think about search
Hermes now a multi-platform desktop app
Qwen3.7-Plus, Alibaba’s midsized cloud model
OpenAI’s latest plugins for Codex

But first:

Microsoft moves away from distilling OpenAI’s models

Microsoft released seven new in-house AI models spanning reasoning, coding, image generation, transcription, and voice synthesis—all trained from scratch without distilling from third-party models. The flagship MAI-Thinking-1 rivals leading models on software engineering benchmarks despite being much smaller, while MAI-Transcribe-1.5 claims best-in-class accuracy on the FLEURS benchmark across 43 languages, though independent leaderboards place it third overall in word error rate. The company also introduced “Frontier Tuning,” a reinforcement learning approach that lets organizations train models on their own workflows. (Microsoft)

GitHub refits its desktop app to manage agents

GitHub released the Copilot app, a desktop environment designed to manage multiple AI agents working in parallel on different development tasks. The app solves a real problem: As agents handle more work, developers lose visibility into what’s running, context scatters across windows, and code reaches pull requests without clear trails of what the agent attempted or where human judgment is needed. From a single “My Work” view, developers can see active sessions, issues, pull requests, and background automations across connected repositories. Each agent session runs in its own isolated git worktree. The app also introduces canvases, bidirectional work surfaces where agents show their plans, code changes, and deployment states while developers can edit, approve, or redirect that work on the same surface. (GitHub)
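As a rough illustration of the per-session isolation described above, the sketch below builds a standard `git worktree add` command for each agent session. The function name, session names, branch prefix, and directory layout are all hypothetical, and nothing here reflects how the Copilot app is actually implemented.

```python
from pathlib import Path


def worktree_command(repo: Path, session: str, base: str = "main") -> list[str]:
    """Build a `git worktree add` command giving one agent session its own checkout.

    The "agent/" branch prefix and the sibling "<repo>-worktrees" directory are
    illustrative assumptions, not conventions taken from the Copilot app.
    """
    target = repo.parent / f"{repo.name}-worktrees" / session
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"agent/{session}",  # new branch created for this session
        str(target),               # separate working directory for the session
        base,                      # commit-ish the new branch starts from
    ]


if __name__ == "__main__":
    # Print (but do not run) the command for two hypothetical sessions.
    for name in ["fix-login-bug", "update-docs"]:
        print(" ".join(worktree_command(Path("/tmp/demo-repo"), name)))
```

Because each session gets its own branch and working directory, two agents can edit the same repository at once without overwriting each other's uncommitted changes.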

A Bing search engine just for agents

Microsoft released Web IQ, a suite of AI-native grounding APIs powered by Bing’s index and purpose-built for how AI agents search the web, a fundamentally different task than human search. The service already powers search in Copilot and ChatGPT, and Microsoft rebuilt it from scratch to prioritize efficiency and token usage over traditional ranking signals. Agents don’t behave like humans: Rather than stopping at one query result, they fan out across multiple searches, extracting and packaging information for inference. Microsoft engineered Web IQ to minimize tokens per call while running nearly 2.5 times faster than alternatives, according to Microsoft’s internal comparisons, directly reducing cost for platforms that rely on frequent web grounding. The API is currently available only to major LLM platforms, though Microsoft plans wider access as it scales. (Search Engine Land)
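To make the fan-out behavior described above concrete, here is a minimal sketch of an agent issuing several queries at once and packing the results under a token budget. It does not call Web IQ or any real search API: `fake_search` is a hypothetical stand-in, and counting tokens as whitespace-separated words is a crude assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_search(query: str) -> list[str]:
    """Stand-in for a real search call; returns canned snippets (hypothetical)."""
    return [f"{query}: snippet {i}" for i in range(3)]


def fan_out(queries: list[str], token_budget: int) -> list[str]:
    """Run all queries concurrently, then pack snippets until the budget is spent.

    Tokens are approximated by whitespace-separated words, a crude assumption.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fake_search, queries))  # preserves query order

    packed: list[str] = []
    used = 0
    # Round-robin across queries so each one contributes before any repeats.
    for rank in range(max(map(len, results), default=0)):
        for snippets in results:
            if rank >= len(snippets):
                continue
            cost = len(snippets[rank].split())
            if used + cost > token_budget:
                return packed  # stop once the next snippet would exceed the budget
            packed.append(snippets[rank])
            used += cost
    return packed


if __name__ == "__main__":
    for line in fan_out(["web grounding", "agent search"], token_budget=12):
        print(line)
```

The budget check is where the cost argument shows up: every snippet sent to the model is paid for in tokens, so packing fewer, denser results lowers the price of each grounded answer.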

Nous makes its claw-like Hermes agent a real app

Nous Research released Hermes Desktop in public preview, a native application for macOS, Windows, and Linux that replaces the command-line interface for its open-source Hermes Agent. The desktop app reuses the same agent core, so sessions, configuration, API keys, and learned skills sync across the CLI, messaging gateways, and the GUI, allowing users to start a task on the desktop and resume it in Slack or Discord without losing context. The interface streams agent responses and tool output in real time, with a side pane that previews web pages and file outputs, plus voice input and output. What sets Hermes apart from a chat wrapper is its closed learning loop: After completing a complex task, the agent writes a reusable skill that self-improves over time through persistent, LLM-summarized memory. Execution can be sandboxed across five backends (local, Docker, SSH, Singularity, and Modal), and the agent integrates tools through the Model Context Protocol standard. The tool is MIT-licensed and model-agnostic, working with any API endpoint or Nous’s own Portal subscription, which avoids lock-in. (MarkTechPost)

Qwen3.7-Plus packs Max-like punch at lower cost

Alibaba’s Qwen team launched Qwen3.7-Plus, a multimodal model that understands images and video alongside text, available through Model Studio. The model combines vision capabilities with five agentic abilities: deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration, all designed to handle long-running tasks that require the model to write code, call external APIs, and refine outputs without human intervention. The preview ranked number 16 overall in Vision Arena, placing Alibaba as the fifth lab in vision benchmarks. The platform adds reinforcement learning mechanisms that use real-world execution feedback to improve accuracy, plus built-in safety guardrails to constrain autonomous tool use. (MarkTechPost)

Codex opens up for more kinds of work

OpenAI released six role-specific plugins for Codex, expanding beyond its software development roots to target analysts, marketers, sales teams, designers, investors, and bankers. The plugins bundle workflows and integrations for 62 apps and 110 skills total, so teams can connect Codex to tools like Salesforce, Figma, Snowflake, and FactSet without writing code. Non-developers already make up 20 percent of Codex’s five million weekly users and are growing three times faster than developers. OpenAI also previewed Sites, a feature that lets business and enterprise customers create shareable interactive web apps from prompts, and extended annotations so users can refine documents and slides in place by selecting specific elements. (OpenAI)

Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew discussed the emerging role of AI Forward Deployed Engineers (FDEs) in Silicon Valley, predicted greater demand for AI Engineers than for FDEs, and anticipated more specialization within AI engineering roles.

“What will be the future, specialized AI engineering roles? I don’t know. Perhaps there will be AI FDEs, LLMOps Engineers, Evals Engineers, AI Data Engineers, Harness Engineers, and other roles we don’t have names for yet. But for now, I see a lot of AI engineers who are generalists create a lot of value. Skilled AI Engineers are in very high demand!”

Read Andrew’s letter here.

Other top AI news and research stories covered in depth:

Google’s updated Flash model, Gemini 3.5 Flash, pairs smarts with speed, approaching top models but raising prices.
European Union regulators pause some AI Act provisions and delete others, affecting the timeline of AI regulations.
A study shows that internet traffic driven by AI tripled last year, highlighting the growing influence of AI-generated content.
Meta enhances image models by planning generated images in stages, improving quality through step-by-step plotting and revising.

A special offer for our community

In case you missed it, DeepLearning.AI launched our first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:

Nearly 200 AI short and long courses from Andrew Ng and industry experts
Labs and quizzes to test your knowledge
Projects to share with employers
Certificates to testify to your new skills
A community to help you advance at the speed of AI

Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!

Try Pro Now

Data Points is produced by human editors with AI assistance.
