Google opens up its video-game world builder, DeepSeek’s OCR model parses pages like humans

Published
Feb 2, 2026
Reading time
5 min read

In today’s edition of Data Points, you’ll learn more about:

  • OpenAI’s internal data analysis agent
  • SERA, open-weights coding models built for agents
  • Google’s gene prediction research in Nature
  • GPT-4o’s swan song

But first:

Google launches “Project Genie” world creation tool

Google DeepMind released Project Genie, an experimental web app powered by its Genie 3 world model that lets users create, explore, and remix interactive environments in real time. The prototype rolled out to Google AI Ultra subscribers in the U.S. and combines text prompts, image uploads, and fine-tuning tools to generate navigable worlds that respond to user actions. Genie 3 generates the environment ahead of the player in real time rather than relying on static 3D snapshots, simulating physics and interactions dynamically—capabilities the company claims enable applications from robotics simulation to historical exploration. The prototype includes three core features: world sketching (prompt-based creation with image preview), world exploration (real-time navigation), and world remixing (modifying existing worlds). Known limitations include 60-second generation caps, inconsistent physics adherence, control latency, and visual fidelity gaps. DeepMind frames Project Genie as foundational research toward general AI systems capable of understanding and navigating real-world complexity, with plans to expand access beyond AI Ultra subscribers. (Google)

DeepSeek unveils document reader that understands page layout

DeepSeek AI released DeepSeek-OCR 2, an improved document reading system that understands pages the way humans do — by following a logical reading order rather than scanning left-to-right, top-to-bottom. The system uses a new encoder called DeepEncoder-V2 that converts 2D document pages into ordered sequences of visual tokens, then passes them to a text decoder. It achieves significantly better results than the original version, improving its overall score from 87.36 to 91.09 on benchmark tests while using fewer visual tokens. The system was trained in three stages using OCR-heavy data, with the final stage freezing the encoder to speed up training. On real-world documents like academic papers and books, DeepSeek-OCR 2 outperforms competitors like Gemini-3 Pro at reading order and text extraction, though it still struggles with very dense newspapers. (GitHub)
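To see why reading order matters, consider a toy illustration (not DeepSeek’s actual algorithm — DeepEncoder-V2 learns ordering from visual tokens, and the block coordinates, column split, and function names below are invented for the sketch). A naive raster scan reads a two-column page strictly top-to-bottom across the full width, interleaving the columns; a reading-order pass instead finishes the left column before starting the right:

```python
# Toy contrast between raster order and reading order on a two-column
# page. Blocks are text regions with page coordinates (hypothetical data).

def raster_order(blocks):
    # Naive scan: top-to-bottom, left-to-right across the whole page,
    # which interleaves the two columns.
    return sorted(blocks, key=lambda b: (b["y"], b["x"]))

def reading_order(blocks, column_split=300):
    # Human-like pass: read the left column top-to-bottom, then the right.
    left = [b for b in blocks if b["x"] < column_split]
    right = [b for b in blocks if b["x"] >= column_split]
    return sorted(left, key=lambda b: b["y"]) + sorted(right, key=lambda b: b["y"])

blocks = [
    {"text": "Title", "x": 50, "y": 10},
    {"text": "Left para 1", "x": 50, "y": 100},
    {"text": "Right para 1", "x": 350, "y": 100},
    {"text": "Left para 2", "x": 50, "y": 200},
]
print([b["text"] for b in raster_order(blocks)])
# ['Title', 'Left para 1', 'Right para 1', 'Left para 2']  <- columns interleaved
print([b["text"] for b in reading_order(blocks)])
# ['Title', 'Left para 1', 'Left para 2', 'Right para 1']  <- logical flow
```

DeepSeek-OCR 2’s contribution is doing this ordering inside the encoder, emitting visual tokens already in logical sequence before the text decoder ever sees them.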

Inside OpenAI’s GPT-5.2-powered data agent

OpenAI built a custom internal AI data agent powered by GPT-5.2 that allows its 3,500 employees to query and analyze over 600 petabytes of data across 70,000 datasets using natural language. The agent integrates with Slack, web interfaces, IDEs, and ChatGPT through Model Context Protocol connectors, handling end-to-end analysis from data discovery to SQL generation to report publishing. The system combines multiple context layers including schema metadata, code-level table definitions derived from Codex, historical query patterns, internal documentation from Slack and Google Docs, and a self-learning memory system that retains corrections and nuances across sessions. OpenAI uses its Evals API to continuously test the agent against curated question-answer pairs with golden SQL queries, comparing both generated SQL and result sets to catch regressions. The agent shows a practical application of agentic workflows at scale, using the same Codex, GPT-5.2, Evals API, and Embeddings API tools OpenAI makes available to external developers. (OpenAI)
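The regression-testing idea — comparing a generated query against a golden query by result set rather than by SQL text — can be sketched in a few lines. This is an illustrative stand-in, not OpenAI’s Evals API; the `events` table and `results_match` helper are invented for the example:

```python
import sqlite3

def results_match(db, generated_sql, golden_sql):
    """Judge a generated query against a golden query by comparing
    result sets. Rows are sorted before comparison because two
    equivalent queries may return rows in different orders."""
    cur = db.cursor()
    generated = sorted(cur.execute(generated_sql).fetchall())
    golden = sorted(cur.execute(golden_sql).fetchall())
    return generated == golden

# Toy in-memory table standing in for a warehouse dataset.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id INT, kind TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(1, "login"), (1, "query"), (2, "login")])

ok = results_match(
    db,
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id",
    "SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id ORDER BY user_id",
)
print(ok)  # True: different SQL text, identical result set
```

Result-set comparison is what lets an eval harness accept the many syntactically different queries an agent might generate for the same question, while still catching true regressions.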

Building open coding agents for any repository

Ai2 released SERA (Soft-verified Efficient Repository Agents), an open-weights family of coding models that achieve 54.2 percent on SWE-Bench Verified while requiring only 40 GPU days to train on two NVIDIA Hopper GPUs. The strongest model, SERA-32B, matches previous state-of-the-art open models like Devstral Small 2 at 57 times lower training cost than comparable methods. The system uses “soft-verified generation” to create synthetic training data without requiring full correctness testing, and includes a taxonomy of 51 bug patterns to generate diverse training examples from any codebase—making it possible to reproduce top open-source results for roughly $400 or match commercial coding agents for $12,000. SERA achieves 3,700 output tokens per second at FP8 precision on 4xH100 GPUs and scales to 8,600 tokens per second on Blackwell B200 systems, with all models, training recipes, and synthetic data released openly and compatible with Claude Code out of the box. When fine-tuned on specific repositories like Django and SymPy with just 8,000 synthetic samples, the 32B model matches or exceeds its 110B parameter teacher model’s performance, showing that smaller open models can inherit strong agentic behavior through simple supervised fine-tuning without complex reinforcement learning infrastructure. (Ai2)
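The bug-taxonomy idea behind SERA’s synthetic data can be caricatured as follows — take working code, apply a known mutation pattern, and keep the (buggy, fix) pair as a training example. This is a two-pattern toy, not Ai2’s 51-pattern taxonomy, and the pattern names and mutations are invented for illustration:

```python
# Tiny sketch of bug-pattern injection for synthetic training data.
# Each pattern is a named source-to-source mutation (hypothetical examples).
BUG_PATTERNS = [
    ("off_by_one", lambda src: src.replace("range(len(", "range(1, len(")),
    ("weakened_comparison", lambda src: src.replace("<=", "<")),
]

def make_examples(src):
    """Apply each bug pattern to clean source; keep only mutations that
    actually changed the code, pairing the buggy version with its fix."""
    examples = []
    for name, mutate in BUG_PATTERNS:
        buggy = mutate(src)
        if buggy != src:
            examples.append({"pattern": name, "buggy": buggy, "fix": src})
    return examples

clean = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for i in range(len(xs)):\n"
    "        s += xs[i]\n"
    "    return s\n"
)
for ex in make_examples(clean):
    print(ex["pattern"])  # only patterns that matched this snippet
```

Because the mutations are mechanical, such pairs can be generated from any repository without running its test suite — which is what lets “soft-verified generation” scale cheaply.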

AlphaGenome brings gene-predicting power to genetics research

Researchers at Google DeepMind unveiled AlphaGenome on Wednesday, an artificial intelligence system trained to predict how genetic mutations alter gene behavior—a capability that could accelerate understanding of cancer and inherited diseases. The model builds on the success of AlphaFold2, which won the 2024 Nobel Prize in Chemistry for predicting protein structures; AlphaGenome applies similar machine learning techniques to genomic data, enabling predictions about thousands of genes. The system can determine whether a mutation will disable a gene or activate it inappropriately, addressing a long-standing challenge in genetics research. Peter Koo, a computational biologist at Cold Spring Harbor Laboratory, called it “an engineering marvel” that represents a significant advance in applying AI to genomic analysis. As with AlphaFold2, widespread adoption by the research community could accelerate drug discovery and our understanding of genetic disease mechanisms. (The New York Times)

OpenAI retires GPT-4o and other older models

OpenAI will discontinue GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT on February 13, 2026, alongside previously announced retirements of GPT-5 Instant and Thinking modes. The company is retiring GPT-4o despite restoring it after initial deprecation, citing that only 0.1 percent of daily users still select it as their preferred model. OpenAI developed GPT-5.1 and GPT-5.2 directly in response to user feedback about GPT-4o’s conversational warmth and creative capabilities, adding customizable personality controls (Friendly, Helpful tones) and adjustable parameters like warmth and enthusiasm to match what users valued about the older model. The company is also addressing broader user complaints about unnecessary refusals and overly cautious responses, with updates coming soon. No changes are happening to the API at this time, and the retirement timeline gives users advance notice to migrate their workflows. (OpenAI)


Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng talked about how U.S. policies had driven allies towards sovereign AI, resulting in a growing interest in open-source AI models and heightened global competition.

“Given AI’s strategic importance, nations want to ensure no foreign power can cut off their access. Hence, sovereign AI. Sovereign AI is still a vague, rather than precisely defined, concept. Complete independence is impractical: There are no good substitutes to AI chips designed in the U.S. and manufactured in Taiwan, and a lot of energy equipment and computer hardware are manufactured in China.” 

Read Andrew’s letter here.


A special offer for our community

DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:

  • Over 150 AI courses and specializations from Andrew Ng and industry experts
  • Labs and quizzes to test your knowledge
  • Projects to share with employers
  • Certificates to testify to your new skills
  • A community to help you advance at the speed of AI

Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!

Try Pro Membership


Subscribe to Data Points

Your accelerated guide to AI news and research