Inside Olmo 3, a new family of fully open models · Grok 4.1’s uneasy balance between EQ and sycophancy

Published: Nov 24, 2025
Reading time: 5 min

Welcome back! In today’s edition of Data Points, you’ll learn more about:

  • Nano Banana Pro, Google’s updated image generator
  • Anthropic’s latest partnerships with Microsoft and Nvidia
  • Memo, a home robot trained on real-life human tasks
  • A new AI play modeled on legendary French playwright Molière’s work

But first:

Olmo 3 opens complete development pipeline to researchers

The Allen Institute for AI released Olmo 3, a family of open-source language models that exposes the entire “model flow”: every training stage, checkpoint, dataset, and dependency required to create and modify the models. The release includes Olmo 3-Base (7 billion and 32 billion parameters), Olmo 3-Think (the strongest fully open 32 billion-parameter reasoning model), Olmo 3-Instruct (for chat and tool use), and Olmo 3-RL Zero (for reinforcement learning experiments). Olmo 3-Base outperforms other fully open base models on benchmarks for programming, reading comprehension, and math, while Olmo 3-Think narrows the gap with leading open-weight models like Qwen 3 despite training on roughly six times fewer tokens. The release enables researchers to trace model behaviors back to specific training data and decisions, fork development at any stage, and conduct experiments that require full visibility into how AI systems learn, all of which help address concerns about transparency and accountability in AI development. All components, including the 9.3 trillion-token Dolma 3 training corpus and post-training datasets, are available under permissive open-source licenses. (Allen AI)

xAI’s Grok 4.1 tops emotional intelligence leaderboard

Grok 4.1 now leads EQ-Bench3, a benchmark that measures how well language models handle emotional intelligence through roleplay scenarios. The model beat GPT-4o and Claude 3.5 Sonnet on metrics like empathy and interpersonal skills, but it also grew more sycophantic: overly agreeable and flattering, even when it’s wrong. This trade-off between emotional warmth and truthfulness is a challenge all major AI labs are grappling with as they tune their models. For developers building customer support, coaching, or wellness apps, picking a high-EQ model now requires weighing the benefits against the risk of a system that prioritizes agreeableness over accuracy. The benchmark itself relies on another AI to judge responses, which raises questions about whether models are developing real emotional intelligence or just learning to please other AI systems. (xAI and i10x.ai)

Gemini’s latest image generator has landed

Google released Nano Banana Pro, an image generation model built on Gemini 3 Pro that creates detailed visuals with accurate text rendering in multiple languages. The model can generate educational infographics, translate text within images, and combine up to 14 input images while keeping up to five people looking consistent across compositions. It also offers professional controls like adjustable lighting, camera angles, and color grading, with output available in resolutions up to 4K. The model is rolling out across Google products including the Gemini app (with limited free quotas), Google Ads, Workspace tools, and developer platforms like Vertex AI. All generated images include Google’s SynthID watermark for verification. (Google)

Anthropic’s valuation soars with new cloud partnerships

Microsoft and Nvidia announced investments of up to $5 billion and $10 billion respectively in Anthropic on Tuesday, pushing the AI startup’s valuation to around $350 billion, up from $183 billion in September. Anthropic committed to purchasing $30 billion of Azure compute capacity from Microsoft and up to 1 gigawatt of compute capacity from Nvidia, while Nvidia will collaborate with Anthropic on engineering and design to optimize Claude models for its architectures. The partnerships mark a strategic shift for Microsoft; backing Anthropic reduces its dependence on OpenAI (where it holds a roughly 27 percent stake valued at $135 billion). The deals reshape the competitive landscape for AI developers: with backing now coming simultaneously from Microsoft, Nvidia, Google, and Amazon, Anthropic has cemented its position as a central player among the industry’s major cloud providers and chip makers. (Microsoft and CNBC)

Sunday unveils Memo, a home robot trained on millions of tasks

Sunday Robotics emerged from stealth with Memo, a wheeled home robot designed to handle chores like dishes, laundry, and tidying. The company trained Memo using roughly 10 million recordings of household routines collected from over 500 homes, where workers wore Sunday’s Skill Capture Glove, a $400 wearable that captures human movements more accurately than standard remote control methods. Memo can make espresso, clear tables, and load dishwashers. However, it works slowly and the real test will be how well it performs in actual homes without engineers present. The approach tackles a key problem in robotics: most home robots fail because they’re trained in labs rather than messy, unpredictable real-world environments. Sunday will accept applications for a beta program starting November 19, 2025, with 50 households receiving numbered robots in late 2026. (Sunday and Wired)

AI-generated Molière play to debut at Palace of Versailles

French scholars, artists, and AI firm Mistral collaborated to create “L’Astrologue ou les Faux Presages” (The Astrologer or the False Omens), a comedy imagining what 17th-century playwright Molière might have written next had he not died at age 51. The AI model analyzed Molière’s complete works to generate a play satirizing astrologers, centering on a gullible bourgeois deceived by a fraudulent fortune-teller. Researchers and scholars corrected historical inaccuracies and refined the AI’s output throughout the production process. The project suggests how AI can help scholars gain new insights into classic literature by identifying patterns scattered across an author’s body of work. The play will premiere in 2026 at the Palace of Versailles, where Molière’s patron Louis XIV once held court. (Reuters)


A special offer for our community

DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:

  • Over 150 AI courses and specializations from Andrew Ng and industry experts
  • Labs and quizzes to test your knowledge
  • Projects to share with employers
  • Certificates to testify to your new skills
  • A community to help you advance at the speed of AI

Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!

Try Pro Membership


Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng talked about the AI Dev x NYC conference, highlighted the optimism in the AI community despite broader skepticism, and emphasized the importance of in-person events for sparking new opportunities and collaborations.

“Speaking with fellow developers, I realized that because of AI’s low penetration in businesses, it is simultaneously true that (a) many businesses do not yet have AI delivering significant ROI, and (b) many skilled AI teams are starting to deliver significant ROI and see the number of successful AI projects climbing rapidly, albeit from a low base. This is why AI developers are bullish about the growth that is to come!”

Read Andrew’s letter here.

Other top AI news and research stories we covered in depth:


Subscribe to Data Points


Your accelerated guide to AI news and research