Welcome back! In today’s edition of Data Points, you’ll learn more about:
- VibeThinker-1.5B, a small but powerful reasoning model
- A toymaker’s withdrawal of an AI teddy bear that told kids how to start fires
- Qwen3-Max’s discounts and the latest AI price war
- SIMA 2, Google’s self-learning game-playing model
But first:
OpenAI trained sparse neural networks to better interpret them
An interpretability team at OpenAI developed a new training method that forces language models to use far fewer connections between neurons, creating simpler networks that researchers can more easily understand. The team trained models similar to GPT-2 but constrained most weights to zero, limiting each neuron to only a few dozen connections instead of thousands. For simple tasks, researchers successfully isolated minimal “circuits” of neurons that perform specific, traceable operations — like a five-channel circuit that matches Python quote types by detecting, classifying, and copying the correct quote. This mechanistic interpretability approach could provide a path to reverse-engineer AI behavior, though significant challenges remain to scale the technique to larger, frontier models. (OpenAI)
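The core constraint described above, limiting each neuron to a few dozen incoming connections, can be illustrated with a toy top-k magnitude mask. This is a minimal sketch of the general idea, not OpenAI's actual training procedure; the function name and the choice of k=32 are illustrative.

```python
import numpy as np

def topk_mask(weights: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude incoming weights per neuron (row)
    and zero out the rest, enforcing a fixed per-neuron connection budget."""
    mask = np.zeros_like(weights, dtype=bool)
    # Indices of the k largest |w| in each row
    idx = np.argpartition(np.abs(weights), -k, axis=1)[:, -k:]
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1000))      # 4 neurons, 1,000 candidate connections each
w_sparse = topk_mask(w, k=32)       # constrain each neuron to 32 connections
print((w_sparse != 0).sum(axis=1))  # → [32 32 32 32]
```

In practice such a constraint would be applied (or encouraged via regularization) during training rather than as a one-shot post-hoc prune, so that the network learns to route its computation through the few connections it is allowed to keep.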
Baidu launches ERNIE 5.0 to compete with GPT-5 and Gemini 2.5 Pro
Chinese search giant Baidu unveiled ERNIE 5.0, a proprietary model that processes and generates content across text, images, audio, and video. The model is available through Baidu’s ERNIE Bot website and Qianfan cloud platform API. It’s less expensive than GPT-5.1 and Gemini 2.5 Pro, priced at $0.85 per million input tokens and $3.40 per million output tokens. According to Baidu’s internal benchmarks, ERNIE 5.0 matched or beat GPT-5 and Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based question answering, with particularly strong performance on structured document and chart analysis. Baidu’s performance claims have not yet been independently verified. (VentureBeat)
VibeThinker-1.5B matches much larger models for just $7,800
Weibo released VibeThinker-1.5B, a 1.5-billion-parameter language model that matches or exceeds the mathematical reasoning of models hundreds of times its size. On three major math benchmarks (AIME24, AIME25, and HMMT25), the model scored 80.3, 74.4, and 50.4 respectively, surpassing DeepSeek-R1 despite that model having 400 times more parameters. VibeThinker-1.5B also achieved competitive code-generation scores of 55.9 on LiveCodeBench v5 and 51.1 on v6. Its training framework first explores solution diversity during supervised fine-tuning, then reinforces correct solutions through reinforcement learning; Weibo reports a post-training cost of about $7,800. The model is available under an MIT license. (Hugging Face)
AI-powered toys fail safety tests, give kids dangerous advice
Consumer advocacy group PIRG tested four AI-enabled toys and found that none met basic safety standards for children. The worst performer, a teddy bear called Kumma from Chinese company FoloToy, provided detailed instructions on using matches and knives, discussed sexual kinks unprompted, and explained “teacher-student roleplay” involving spanking. The toys also raised serious privacy concerns, with constant listening, biometric data storage for up to three years, and voice recordings processed by third parties. PIRG’s researchers found that despite OpenAI’s policy against children using ChatGPT, several toys use GPT-4o as their default model and lack parental controls or usage limits. FoloToy has suspended sales of Kumma and launched an internal safety audit in response to the findings. (The Register)
Alibaba cuts prices for Qwen3-Max AI model by nearly half
Alibaba Cloud reduced pricing for its Qwen3-Max model by almost 50 percent, lowering the minimum cost to $0.459 per million input tokens and $1.836 per million output tokens. The trillion-parameter model, launched in September as one of Alibaba’s most expensive offerings, now includes an additional 50 percent discount for batch API calls during non-peak hours. The price cuts follow recent model releases from Chinese AI startups like Moonshot AI, Zhipu AI, and MiniMax, each emphasizing performance and cost efficiency. The move reflects fierce competition in China’s AI market, which has experienced multiple price wars in recent years, including battles over coding models and foundational AI systems. (South China Morning Post)
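To put the per-million-token prices in this issue side by side, here is a back-of-the-envelope cost helper. The workload size (2 million input tokens, 500,000 output tokens) is an arbitrary example, and the prices are those quoted above for Qwen3-Max after the cut and for ERNIE 5.0.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of a workload given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Qwen3-Max after the cut (minimum tier): $0.459 in / $1.836 out per million
qwen = request_cost(2_000_000, 500_000, 0.459, 1.836)
# ERNIE 5.0: $0.85 in / $3.40 out per million
ernie = request_cost(2_000_000, 500_000, 0.85, 3.40)
print(f"Qwen3-Max: ${qwen:.3f}  ERNIE 5.0: ${ernie:.3f}")
# → Qwen3-Max: $1.836  ERNIE 5.0: $3.400
```

Note that Alibaba’s additional 50 percent batch-API discount during non-peak hours would halve the Qwen3-Max figure again for workloads that can tolerate batch latency.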
Google DeepMind’s SIMA 2 learns to play video games on its own
Google’s new SIMA 2 agent can play video games, follow instructions, and learn through self-directed play. The system uses Gemini’s reasoning capabilities to understand goals and execute multi-step tasks across diverse 3D gaming environments. SIMA 2 can interpret sketches, emojis, and multiple languages, and it improves its performance through trial-and-error without human help. The research could eventually be applied to general embodied intelligence with potential applications in robotics. Google is releasing SIMA 2 as a limited research preview to academics and game developers. (Google)
A special offer for our community
DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Over 150 AI courses and specializations from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to attest to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month, paid upfront, or opt for month-to-month payments at just $30 per month. Both options begin with a one-week free trial. Explore Pro’s benefits and start building today!
Want to know more about what matters in AI right now?
Read the latest issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng talked about the misconceptions surrounding AI’s capabilities, emphasizing that while AI is impressive, it still has significant limitations and requires customization for specific tasks.
“AI is amazing, but it has unfortunately been hyped up to be even more amazing than it is. A pernicious aspect of hype is that it often contains an element of truth, but not to the degree of the hype. This makes it difficult for nontechnical people to discern where the truth really is. Modern AI is a general purpose technology that is enabling many applications, but AI that can do any intellectual tasks that a human can (a popular definition for AGI) is still decades away or longer.”
Read Andrew’s letter here.
Other AI news and research stories we covered that might chill you to the bone:
- Character AI and OpenAI implemented policy changes to protect younger and vulnerable users, aiming for safer and more responsible chatbot interactions.
- HunyuanImage-3.0 improved image generation by using reinforcement learning and thinking tokens to better interpret and respond to prompts.
- The State of AI Report 2025 highlighted that AI’s barriers were not technological but social and material, marking a pivotal year for AI’s industrial adoption.
- Amazon’s Chronos-2 advanced forecasting by sorting out tangled variables to make better predictions across multiple time series.