Meta’s newest world model research project, Google’s Data Commons now available via MCP

Published: Sep 26, 2025
Reading time: 4 min read

In today’s edition of Data Points, you’ll learn more about:

  • DeepSeek’s recent model update
  • SWE-Bench-Pro, a tough new coding benchmark
  • Chrome DevTools MCP, a new way for agents to control a web browser
  • Microsoft’s latest move to open up Copilot to more models

But first:

Meta unveils an open world model for code generation research

Meta’s FAIR CodeGen Team released Code World Model (CWM), a 32-billion-parameter dense, decoder-only model designed to advance research on code generation with world models. CWM specializes in reasoning about how code and commands affect program or system state, using mid-training on over 200 million Python execution traces and 3 million agentic environment trajectories, followed by extensive multi-task reinforcement learning in verifiable coding, math, and software engineering settings. The model supports up to 131,000 tokens of context, uses Grouped-Query Attention, and achieves strong results on SWE-bench Verified (53.9 percent pass@1 base, 65.8 percent with test-time scaling), LiveCodeBench v5 (68.6 percent), and Math-500 (96.6 percent). Researchers can download CWM and its checkpoints from ai.meta.com and Hugging Face, run it on a single 80 GB GPU with quantization, and find code and documentation on GitHub. (Meta)
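The pass@1 figures above follow the standard unbiased pass@k estimator used throughout the code-generation evaluation literature (not code from the CWM release itself); a minimal sketch, where n is the number of sampled generations and c the number that pass the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: the probability that at least one of k
    samples drawn without replacement from n generations (c correct)
    passes the unit tests."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so a success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 5 are correct, pass@1 is 0.5; pass@1 with a single sample per problem reduces to the plain success rate.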

Google opens Data Commons server to improve AI access to public datasets

Google’s Data Commons platform aggregates public data from sources like government surveys and the United Nations, and its new MCP server lets AI tools query that data through simple prompts. Because MCP is an open standard, the server works with any large language model, and Google provides starter kits and sample code through Colab, PyPI, and GitHub. The server offers training, fine-tuning, and grounding datasets rooted in verified, real-world data, helping developers tailor systems to specific use cases and reduce hallucinations. The MCP server and associated tools are available to the public at no cost. (GitHub)
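MCP servers are typically registered with a client through a small JSON configuration entry. The sketch below shows the general shape such an entry takes; the exact command, package name, and environment variable for the Data Commons server are assumptions here, so check Google’s GitHub repo and PyPI page for the authoritative setup:

```json
{
  "mcpServers": {
    "data-commons": {
      "command": "uvx",
      "args": ["datacommons-mcp", "serve", "stdio"],
      "env": { "DC_API_KEY": "<your Data Commons API key>" }
    }
  }
}
```

Once registered, the client exposes the server’s query tools to the model, which can then pull statistics from Data Commons in response to natural-language prompts.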

DeepSeek-V3.1-Terminus update improves agent performance

DeepSeek updated its 3.1 base language model, fixing language-mixing issues and abnormal character output and enhancing the performance of its Code Agent and Search Agent. The update led to gains on several benchmarks, including a higher BrowseComp score (38.5 vs. 30.0) and improved results on SWE-bench Verified and Terminal-Bench. The new version keeps the same model structure as DeepSeek-V3.1, so developers can use existing tools and templates, with additional demo code provided for easier inference. DeepSeek-V3.1-Terminus and its weights are available under the MIT License. (Hugging Face)

Scale AI introduces new benchmark tests for coding models

A research team at Scale AI released SWE-Bench Pro, a benchmark of 1,865 difficult software engineering problems from 41 active repositories, to better reflect complex, enterprise-level coding challenges. The benchmark divides tasks into public, held-out, and commercial sets, drawing on business applications, B2B services, and developer tools, with only the public set freely available. All problems have been human-verified and often require multi-file patches and substantial code changes. Top models, including GPT-5 and Claude Opus 4.1, resolved less than 25 percent of public-set problems and scored even lower on commercial tasks, underscoring the remaining limitations of current AI models for professional-grade software development. The new benchmark enters a crowded field of agentic coding benchmarks, including the original SWE-Bench and the human-validated SWE-Bench Verified. SWE-Bench Pro is open for public research, but commercial access remains restricted. (arXiv)

Google opens Chrome DevTools MCP for public preview

Google released a public preview of Chrome DevTools MCP, a tool that lets AI coding agents control and inspect a live Chrome browser. The server lets agents run performance tests, inspect page structure, execute JavaScript, and automate user actions, helping them check what actually happens on live development websites. The tool aims to fix a common problem with code-generating AI: most agents cannot see or test the pages they build in a real browser. This release lets developers connect their AI assistants directly to Chrome for better bug fixing and performance checks. Developers can install the server using npx on Node.js version 22 or later, and it supports agent clients like Gemini CLI, Claude Code, Cursor, and GitHub Copilot. (GitHub)
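Per the project’s README, registering the server with an MCP-capable client is a one-line npx entry in the client’s configuration file (the file’s name and location vary by client); a sketch:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```

With the entry in place, the agent can launch a Chrome instance and call the server’s browser-automation and inspection tools directly.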

Microsoft adds Anthropic models to Copilot for 365, Studio

Microsoft will let business users of its Copilot AI assistant choose between models from OpenAI and Anthropic for tasks like digital research and building custom AI tools. Anthropic’s Claude Opus 4.1 will power the Researcher feature in Microsoft 365 Copilot, while Copilot Studio users can also access the lighter Claude Sonnet 4 model. Microsoft’s move opens Copilot to a leading OpenAI competitor and gives users the option to toggle between different AI engines. The decision reflects a broader industry trend, as most providers now offer access to AI models from multiple companies, but is notable given Microsoft’s close partnership with, and investment stake in, OpenAI. Copilot users can access Anthropic models starting Wednesday, with continued availability of OpenAI models. (Microsoft)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng discussed China's move to bar its major tech companies from buying Nvidia chips, a signal of the country's progress toward semiconductor self-sufficiency, and the implications for U.S. reliance on Taiwan's chip manufacturing.

“Specifically, it signals that China has progressed sufficiently in semiconductors to break away from dependence on advanced chips designed in the U.S., the vast majority of which are manufactured in Taiwan. It also highlights the U.S. vulnerability to possible disruptions in Taiwan at a moment when China is becoming less vulnerable.”

Read Andrew’s full letter here.

Subscribe to Data Points


Your accelerated guide to AI news and research