In today’s edition of Data Points, you’ll learn more about:
- GPT-5.1-Codex-Max, OpenAI’s improved long-context coding model
- Music startup Klay’s reported deal with Universal, Warner, and Sony
- DeepSeek R1 Slim, a trim, decensored reasoning model
- NTT’s Tsuzumi 2, an efficient model optimized for Japan
But first:
Meta’s SAM 3 adds text prompts and video tracking
Meta released Segment Anything Model 3 (SAM 3), a unified AI model that detects, segments, and tracks objects in images and videos using text, exemplar, and visual prompts. The model accepts open-vocabulary text prompts like “striped red umbrella” rather than fixed label sets, and delivers a 2x performance gain over existing systems on Meta’s new SA-Co benchmark. Meta built SAM 3 using a hybrid data engine combining human annotators with AI models, including Llama-based systems, which annotated data 5x faster than humans alone and created a training set with over 4 million unique concepts. SAM 3 enables new features across Meta’s products, including object-specific effects in Instagram’s Edits app and a View in Room feature for Facebook Marketplace. Meta released model weights, fine-tuning code, evaluation datasets, and the Segment Anything Playground platform for public experimentation. (Meta)
Google releases Gemini 3, claiming top spot on AI leaderboards
Google launched Gemini 3, its newest multimodal model, which scored 1,501 Elo on the LMArena Leaderboard and topped the WebDev Arena with 1,487 Elo. The model achieved 91.9 percent on GPQA Diamond, 81 percent on MMMU-Pro, and 76.2 percent on SWE-bench Verified, demonstrating advances in reasoning, multimodal understanding, and coding capabilities. Google previewed Gemini 3 Deep Think, an enhanced reasoning mode that scored 93.8 percent on GPQA Diamond and 45.1 percent on ARC-AGI-2. The company also introduced Google Antigravity, an agentic development platform that enables autonomous planning and execution of complex software tasks. Gemini 3 is now available in the Gemini app, AI Studio, Vertex AI, and third-party platforms like Cursor and GitHub. Gemini 3 Deep Think will roll out to Google AI Ultra subscribers in the coming weeks following additional safety testing. (Google)
OpenAI releases GPT-5.1-Codex-Max, a coding model designed for long-running tasks
GPT-5.1-Codex-Max uses 30 percent fewer thinking tokens than its predecessor while achieving better performance on benchmarks like SWE-bench Verified, and can work independently for more than 24 hours on complex tasks. The model is OpenAI’s first to be natively trained to operate across multiple context windows through a process called “compaction,” enabling it to work coherently over millions of tokens in a single task for project-scale refactors, debugging sessions, and multi-hour agent loops. OpenAI noted that it is its most capable cybersecurity model to date and implemented additional safeguards to prevent misuse. GPT-5.1-Codex-Max is available now in Codex for ChatGPT Plus, Pro, Business, Edu, and Enterprise plans, with API access coming soon. (OpenAI)
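OpenAI hasn’t published the details of compaction, but the general idea — replacing older conversation turns with a condensed summary so an agent can keep working past a single context window — can be sketched in a few lines. Everything below (the `MAX_TOKENS` limit, the `summarize` stand-in, the keep-last-two policy) is illustrative, not OpenAI’s actual implementation:

```python
MAX_TOKENS = 100  # toy context limit for illustration


def summarize(messages):
    # Stand-in for a model call that condenses prior work into a short note.
    return "summary(" + str(len(messages)) + " msgs)"


def token_count(messages):
    # Crude whitespace token count, good enough for a sketch.
    return sum(len(m.split()) for m in messages)


def compact(history):
    """Collapse all but the most recent messages into one summary message."""
    if token_count(history) <= MAX_TOKENS:
        return history
    recent = history[-2:]  # keep the freshest context verbatim
    return [summarize(history[:-2])] + recent


# A long agent transcript shrinks to a summary plus the most recent turns,
# freeing the context window for the next stretch of work.
history = ["step %d: did some work on the refactor" % i for i in range(40)]
history = compact(history)
print(len(history))
```

The key property is that the loop can repeat indefinitely: each time the transcript nears the limit, it is folded into a summary, which is how a model could stay coherent over millions of cumulative tokens.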
Klay becomes first AI company to license music from all three major labels
Klay secured licensing agreements with Universal Music Group, Sony Music, and Warner Music Group to build a streaming service that lets users remake songs with AI tools, Bloomberg reported. The startup licensed thousands of hit songs to train its large language model and promised artists and labels control over how their work is used. Klay is led by music producer Ary Attie and employs former executives from Sony Music and Google’s DeepMind. The deals mark a shift in the music industry’s approach to AI, as labels try to embrace the technology while protecting their copyrights amid ongoing lawsuits against other AI music companies like Suno. (Bloomberg)
Spanish quantum physicists claim to have removed censorship from DeepSeek R1
Researchers at Multiverse Computing created DeepSeek R1 Slim, a version of the Chinese reasoning model that is 55 percent smaller and allegedly free of government-imposed censorship. The team used tensor networks — a mathematical technique borrowed from quantum physics — to compress the model while selectively removing specific information, including censorship filters required by Chinese regulations. They tested the modified model on approximately 25 politically sensitive questions, covering topics such as Xi Jinping and the Tiananmen Square protests, and used GPT-5 to evaluate whether responses matched Western models’ factual output. The work reflects broader industry efforts to make AI models more efficient and raises questions about how censorship embedded in Chinese open-source models shapes the global AI ecosystem. But experts warn that fully removing censorship from models trained on restricted data may be more complex than a small test set can verify. (MIT Technology Review and Multiverse Computing)
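Multiverse hasn’t released the specifics of its tensor-network decomposition, but the arithmetic behind this family of compression methods is easy to see with a simpler cousin, low-rank matrix factorization (used here only as an analogy): replacing one large weight matrix with two thin factors cuts the parameter count sharply. The dimensions below are made up for illustration:

```python
# Toy illustration (not Multiverse's actual method): factoring one d x d
# weight matrix into two rank-r factors reduces its parameter count from
# d*d to 2*d*r. Tensor networks generalize this idea to higher-order
# factorizations across many layers.
d, r = 4096, 256            # hypothetical layer width and chosen rank

full_params = d * d          # original dense matrix
factored_params = 2 * d * r  # two factors of shape (d, r) and (r, d)

reduction = 1 - factored_params / full_params
print(reduction)             # fraction of this layer's parameters removed
```

At rank 256 this single layer sheds 87.5 percent of its parameters; applied selectively across a whole model, such factorizations can produce large overall reductions like R1 Slim’s reported 55 percent, at the cost of some approximation error that must be kept small enough to preserve accuracy.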
NTT’s lightweight model challenges the need for massive GPU infrastructure
NTT launched Tsuzumi 2, a large language model that runs on a single GPU instead of the dozens or hundreds that most enterprise AI systems require. In internal tests for financial-system inquiries, the model performed as well as much larger systems while using a fraction of the computing resources. Tokyo Online University deployed it on-premises to handle course Q&A, create teaching materials, and provide student guidance—keeping sensitive data on campus while avoiding the cost of building GPU clusters. The model works particularly well with Japanese text and includes specialized knowledge in finance, medicine, and public sector applications, allowing organizations to deploy it without extensive customization. For enterprises concerned about sending proprietary data to cloud-based AI services, localized models like Tsuzumi 2 offer an alternative: run the model locally, process sensitive information internally, and handle text, images, and voice without managing multiple specialized systems. (NTT)
A special offer for our community
DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Over 150 AI courses and specializations from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to testify to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng talked about the AI Dev x NYC conference, highlighting the optimism in the AI community despite broader skepticism, and emphasized the importance of in-person events for sparking new opportunities and collaborations.
“The event was full of conversations about coding with AI, agentic AI, context engineering, governance, and building and scaling AI applications in startups and in large corporations. But the overriding impression I took away was one of near-universal optimism about our field, despite the mix of pessimism and optimism about AI in the broader world.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth:
- Waymo deployed self-driving cars on expressways in California and Arizona, marking an important step in integrating autonomous vehicles on U.S. freeways.
- Kimi K2 Thinking outperformed proprietary models with new techniques for agentic tool use, showing leading results with open weights.
- A recent Anthropic cyberattack report sparked controversy, as security researchers questioned the potential for unprecedented automated attacks carried out by coding agents.
- Researchers developed more efficient agentic search by fine-tuning models to search within their own parameters, which significantly improved recall.