In today’s edition of Data Points, you’ll learn more about:
- Microsoft’s new open-weights vision reasoning model
- Yuan 3.0 Ultra, a document-retrieving juggernaut
- OpenAI’s hardware leader’s resignation
- Black Forest’s new training method for image models
But first:
OpenAI pushes the frontier with new GPT-5.4 models
OpenAI released GPT-5.4 and GPT-5.4 Pro across ChatGPT, the API, and Codex. The new models enable agents to operate computers directly using screenshots, mouse commands, keyboard input, and browser automation libraries. The models support up to 1 million tokens of context and introduce tool search, which cuts token consumption by 47 percent on multi-tool benchmarks by fetching tool definitions on demand rather than loading them all upfront. On GDPval, a benchmark testing agent performance across 44 professional occupations, GPT-5.4 matches or exceeds human professionals in 83 percent of comparisons, up from 71 percent for GPT-5.2. On OSWorld-Verified, which measures desktop environment navigation, it achieves 75 percent success, surpassing human performance at 72 percent and nearly doubling GPT-5.2’s 47 percent. The model reduces hallucinations by 33 percent and delivers 15 percent faster token velocity in Codex’s fast mode. Image processing now supports up to 10.24 million pixels. GPT-5.4 costs $2.50/$0.25/$15.00 per million input/cached/output tokens, and GPT-5.4 Pro costs $30/$180 per million input/output tokens, a bump over GPT-5.2’s prices. (OpenAI)
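The token savings from tool search come from not stuffing every tool definition into the prompt. A minimal sketch of the idea, with an invented registry and a naive keyword match standing in for real retrieval (all names and token counts here are illustrative assumptions, not OpenAI’s API):

```python
# Hypothetical sketch of on-demand tool loading ("tool search"):
# instead of sending every tool definition in the prompt, keep a
# searchable registry and fetch only the definitions a query needs.

TOOL_REGISTRY = {
    "get_weather": {"description": "Look up current weather for a city", "tokens": 180},
    "send_email": {"description": "Send an email to a recipient", "tokens": 240},
    "query_db": {"description": "Run a read-only SQL query", "tokens": 320},
}

def search_tools(query: str, registry: dict, top_k: int = 1) -> dict:
    """Naive keyword match standing in for a real retrieval step."""
    scored = {
        name: sum(word in spec["description"].lower() for word in query.lower().split())
        for name, spec in registry.items()
    }
    best = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return {name: registry[name] for name in best if scored[name] > 0}

# Loading all tools upfront costs the sum of all definitions;
# fetching on demand costs only what the query matched.
upfront_cost = sum(spec["tokens"] for spec in TOOL_REGISTRY.values())
selected = search_tools("what is the weather in Paris", TOOL_REGISTRY)
on_demand_cost = sum(spec["tokens"] for spec in selected.values())
print(upfront_cost, on_demand_cost)  # 740 vs 180 in this toy example
```

The reported 47 percent savings would depend on how many tools an agent carries and how few each task actually needs.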
Luma’s Uni-1 beats Nano Banana 2 on image generation benchmarks
Luma AI released Uni-1, a decoder-only autoregressive transformer that combines visual understanding and generation in a single architecture. The model processes text and images as interleaved sequences, performing structured internal reasoning before and during image synthesis to decompose instructions, resolve constraints, and plan compositions. Uni-1 achieves state-of-the-art results on RISEBench, a benchmark for reasoning-informed visual editing that tests temporal, causal, spatial, and logical capabilities. On ODinW-13, which measures open-vocabulary dense detection, Uni-1 demonstrates that learning to generate images improves fine-grained visual understanding of regions, objects, and layouts. Luma argues its unified architecture can extend to video, voice agents, and interactive world simulators by treating visual generation and reasoning as complementary capabilities rather than separate systems. (Luma)
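In a decoder-only design like this, text, reasoning, and image tokens share one autoregressive stream. A toy sketch of such interleaving, where all token ids and segment tags are invented for illustration (Luma has not published this interface):

```python
# Toy sketch of an interleaved multimodal token sequence: text tokens
# and image tokens share one autoregressive stream, with reasoning
# tokens emitted before image synthesis begins.

def interleave(text_ids: list[int], reason_ids: list[int],
               image_ids: list[int]) -> list[tuple[str, int]]:
    """Build one decoder-only sequence: prompt -> reasoning -> image."""
    seq = [("text", t) for t in text_ids]
    seq += [("think", t) for t in reason_ids]   # plan the composition first
    seq += [("img_start", 0)]
    seq += [("image", t) for t in image_ids]    # then emit image tokens
    seq += [("img_end", 0)]
    return seq

seq = interleave([11, 42, 7], [301, 302], [9001, 9002, 9003])
print(len(seq))  # 3 text + 2 reasoning + 3 image + 2 delimiters = 10
```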
Microsoft debuts new open-weights vision-language flagship model
Microsoft released Phi-4-reasoning-vision-15B, a 15 billion parameter open-weight multimodal model available through Microsoft Foundry, HuggingFace, and GitHub. The model handles vision-language tasks including image captioning, document reading, and screen element grounding, with particular strength in math, science reasoning, and computer interface understanding. Microsoft trained it on 200 billion multimodal tokens — roughly five times fewer than competing models like Qwen 2.5 VL and Gemma 3 — by building on the Phi-4-Reasoning language model and using careful data curation rather than scale. The architecture uses mid-fusion with a SigLIP-2 Naflex vision encoder supporting dynamic resolution up to 3600 tokens. Microsoft’s studies revealed that dynamic resolution encoders substantially outperform fixed-resolution approaches on high-resolution benchmarks, and that balancing reasoning-heavy math data with computer-use data improved performance across all domains. (Microsoft)
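A dynamic-resolution encoder with a token cap typically tokenizes images at or near native resolution, downscaling only when the patch count would exceed the budget. A hedged sketch of that budgeting logic, where the 16-pixel patch size and the exact rounding are assumptions rather than Phi-4’s published details:

```python
# Hedged sketch of a dynamic-resolution token budget: tokenize images
# at native resolution when possible, and downscale only when the
# patch count would exceed the cap (3600 tokens, per the article).

import math

def fit_to_token_budget(width: int, height: int,
                        patch: int = 16, max_tokens: int = 3600) -> tuple[int, int, int]:
    """Return (new_width, new_height, tokens) after fitting the budget."""
    tokens = math.ceil(width / patch) * math.ceil(height / patch)
    if tokens <= max_tokens:
        return width, height, tokens       # small images keep native size
    scale = math.sqrt(max_tokens / tokens)  # shrink area to fit the cap
    new_w = max(patch, int(width * scale) // patch * patch)
    new_h = max(patch, int(height * scale) // patch * patch)
    return new_w, new_h, (new_w // patch) * (new_h // patch)

print(fit_to_token_budget(640, 480))    # kept at native resolution
print(fit_to_token_budget(4096, 3072))  # downscaled to fit the cap
```

This preserves detail in high-resolution documents and screenshots, which is one reason dynamic resolution tends to beat fixed-resolution encoders on such benchmarks.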
Yuan 3.0 Ultra achieves new SOTA on retrieval and summary tasks
Yuan Lab AI released Yuan 3.0 Ultra, an open-weights Mixture-of-Experts model with one trillion total parameters and 68.8 billion activated parameters. The model introduces Layer-Adaptive Expert Pruning (LAEP), which identifies and removes underutilized experts during pre-training rather than after training, reducing total parameters from 1.5 trillion to one trillion while preserving performance. An Expert Rearranging algorithm balances token loads across GPUs to minimize variance, achieving 92.60 TFLOPS per GPU compared to 62.14 for the base model, a 49 percent efficiency improvement split between pruning (32.4 percent) and rearrangement (15.9 percent). The model also uses a Revised Reflection Inhibition Reward Mechanism during reinforcement learning to prevent overthinking on simple tasks, yielding a 16.33 percent accuracy gain and a 14.38 percent reduction in output length. Yuan 3.0 Ultra achieves state-of-the-art results on enterprise benchmarks like Docmatix (67.4 percent) and ChatRAG (68.2 percent average), outperforming GPT-5.2 and Kimi K2.5 on multimodal retrieval and long-context tasks. (MarkTechPost and arXiv)
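The core of expert pruning is ranking experts in a layer by how often the router actually uses them, then dropping the stragglers. A minimal sketch of that selection step, with routing counts and the keep ratio invented for illustration (the paper’s layer-adaptive criterion is more involved):

```python
# Hedged sketch of the idea behind layer-adaptive expert pruning:
# track how many tokens a router sends to each expert in a layer,
# then keep only the most-utilized experts.

def prune_experts(routing_counts: list[int], keep_ratio: float = 0.75) -> list[int]:
    """Return indices of experts to keep, ranked by routing frequency."""
    n_keep = max(1, int(len(routing_counts) * keep_ratio))
    ranked = sorted(range(len(routing_counts)),
                    key=lambda i: routing_counts[i], reverse=True)
    return sorted(ranked[:n_keep])

# Per-layer routing statistics: tokens routed to each of 8 experts.
layer_counts = [9120, 8750, 310, 7980, 45, 8410, 7660, 120]
kept = prune_experts(layer_counts, keep_ratio=0.75)
print(kept)  # the six busiest experts survive; experts 4 and 7 are pruned
```

Doing this during pre-training, as LAEP does, lets the remaining experts adapt to the pruned topology instead of freezing a post-hoc approximation.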
OpenAI hardware chief resigns over company’s Pentagon deal
Caitlin Kalinowski, who led hardware initiatives at OpenAI, resigned on Saturday over concerns about the company's agreement to deploy AI models on the U.S. Department of Defense's classified networks. In an X post, Kalinowski cited insufficient deliberation before the deal announcement, specifically raising concerns about surveillance of Americans without judicial oversight and autonomous lethal systems without human authorization. She described the situation as fundamentally a governance problem, arguing the Pentagon agreement was rushed without adequate safeguards defined upfront. OpenAI responded by reiterating that its technology includes red lines prohibiting domestic surveillance and autonomous weapons, and stated it would continue engaging employees and stakeholders on these issues. Kalinowski's departure signals internal friction at OpenAI over the speed and conditions of defense partnerships, even as the company maintains it has protective measures in place. (Reuters)
Self-Flow technique replaces REPA in Black Forest image models
Black Forest Labs researchers introduced Self-Flow, a self-supervised flow matching framework that learns semantic representations alongside generative capabilities. The method uses Dual-Timestep Scheduling, which applies different noise levels across tokens to create information asymmetry that forces models to infer missing information from corrupted inputs. A single 4 billion parameter model trained on 200 million images and 6 million videos outperformed standard flow matching and REPA across image, video, and audio generation after just 100,000 high-resolution fine-tuning steps. The approach shows improved scaling behavior compared to REPA, with the performance gap widening as model size increases from 290 million to 1 billion parameters. Self-Flow also enables effective multi-modal training on a single model and shows potential for world model applications through successful video-action prediction tasks. (Black Forest)
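The information asymmetry comes from corrupting some tokens much more than others, so lightly-noised tokens carry the signal the model needs to reconstruct heavily-noised ones. A toy sketch of assigning two timestep levels across tokens; the specific mixing rule, split, and noise levels here are assumptions, not the paper’s exact schedule:

```python
# Illustrative sketch of dual-timestep scheduling: assign a high noise
# level to a random subset of tokens and a low level to the rest,
# creating the information asymmetry described in the article.

import random

def dual_timestep_noise(tokens: list[float], t_low: float, t_high: float,
                        high_fraction: float = 0.5, seed: int = 0) -> list[tuple[float, float]]:
    """Corrupt each token by linearly mixing it with Gaussian noise.

    Returns (corrupted_value, timestep) pairs; a random subset of
    tokens gets t_high, the rest get t_low.
    """
    rng = random.Random(seed)
    n_high = int(len(tokens) * high_fraction)
    high_idx = set(rng.sample(range(len(tokens)), n_high))
    corrupted = []
    for i, x in enumerate(tokens):
        t = t_high if i in high_idx else t_low
        noise = rng.gauss(0.0, 1.0)
        corrupted.append(((1 - t) * x + t * noise, t))  # flow-matching style mix
    return corrupted

tokens = [0.2, -1.3, 0.7, 1.1, -0.4, 0.9]
out = dual_timestep_noise(tokens, t_low=0.1, t_high=0.9)
# half the tokens are mixed mostly with noise (t=0.9), half barely (t=0.1)
```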
Want to know more about what matters in AI right now?
Read the latest issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng talked about the launch of Context Hub, a tool designed to provide coding agents with up-to-date API documentation to improve their coding accuracy, and the potential for agents to share information through platforms like Moltbook.
“If you’re building AI systems using modern technologies, your coding agent will often use outdated APIs, hallucinate parameters, or not even know about the tool it should be using. This happens because AI tools are rapidly evolving, and coding agents were trained on old data that does not reflect the latest tools.”
Read Andrew’s letter here.
Other top AI news and research stories covered in depth:
- Nano Banana 2 Ups Performance per Price, as Gemini 3.1 Flash Image improved the speed and ease of photo generation and editing.
- The U.S. Dept. of War Dismissed Anthropic, Embraced OpenAI, with Secretary of War Pete Hegseth citing Claude as a security risk and opting for GPT models.
- OpenAI launched Management for Agents, introducing its Frontier agent insights and orchestration platform to select customers.
- Agent Solved Stubborn Math Problems as Google’s Aletheia utilized Gemini 3 Deep Think to discover original solutions in mathematics.
A special offer for our community
DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Over 150 AI courses and specializations from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to testify to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!
Data Points is produced by human editors with AI assistance.