DeepSeek outlines V3 training, hardware limits; OpenAI’s Codex now assists with code in the cloud

Published May 16, 2025 · 4 min read
[Image: Man using smartphone for contactless payment in a modern café; people working on laptops in the background.]

In today’s edition, you’ll learn more about:

  • Windsurf introduces SWE-1 family of coding/engineering models
  • Stripe adapts transformer architecture for versatile payments model
  • Alibaba’s top video model gets another boost
  • U.S. Republicans make an end run around local AI regulations

But first:

DeepSeek-V3 reveals hardware bottlenecks in model training

Researchers at DeepSeek-AI published a paper sharing insights from training DeepSeek-V3, their 671-billion-parameter language model, on 2,048 NVIDIA H800 GPUs. The paper highlights hardware limitations that slow AI development and the workarounds the team devised for them. The researchers identified three main bottlenecks: limited memory capacity, inefficient computation, and slow communication between GPUs. To address these, they implemented Multi-Head Latent Attention to reduce memory usage, adopted a Mixture of Experts architecture that activates only the parts of the model each token needs (see the sketch below), and used FP8 mixed-precision training to get the most out of existing hardware. Based on their experience, the authors recommend that future hardware offer better low-precision computation, more efficient GPU interconnects, and faster communication systems to support the next generation of AI models. (arXiv)
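To make the sparse-activation idea concrete, here is a minimal top-k Mixture of Experts routing layer in PyTorch. It is a sketch of the general technique, not DeepSeek's implementation; the expert count, layer sizes, and k=2 routing are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture of Experts layer: each token is routed to
    only k experts, so most of the layer's parameters stay inactive.
    Sizes and routing choices are illustrative, not DeepSeek-V3's."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = gate.topk(self.k, dim=-1)        # keep only the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(dim=-1)               # tokens routed to expert e
            if mask.any():
                w = weights[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(x[mask])        # weighted expert output
        return out

# Route a batch of 16 token embeddings through the sparse layer
layer = TopKMoE(dim=64)
print(layer(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```

With 8 experts and k=2, each token touches only a quarter of the expert parameters per forward pass, which is the memory-and-compute saving the architecture is built around.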

OpenAI unveils Codex programming agent in ChatGPT

OpenAI released a research preview of Codex, a cloud-based AI agent that can work on many software engineering tasks in parallel. Codex writes features, answers questions about a codebase, fixes bugs, and proposes pull requests, with each task running in its own isolated cloud environment preloaded with the user’s repository. The system is powered by codex-1, a version of OpenAI’s o3 reasoning model optimized specifically for software engineering. Codex shows strong performance on coding evaluations and internal benchmarks, outperforming previous models on software engineering tasks. The service is rolling out first to ChatGPT Pro, Enterprise, and Team users, with support for Plus and Edu coming soon. (OpenAI)

Windsurf launches family of models built for coders

Coding assistant Windsurf released its first family of AI models called SWE-1, designed specifically for comprehensive software engineering tasks. The family includes three models: the flagship SWE-1 (comparable to Claude 3.5 Sonnet but less expensive), SWE-1-lite (replacing Windsurf’s previous base model), and SWE-1-mini (powering autocomplete and similar experiences). Windsurf says that SWE-1 is built with “flow awareness” that enables it to work across editors, terminals, and browsers while maintaining context of incomplete states and long-running tasks. Benchmark testing shows SWE-1 performing competitively with large models from major AI labs and significantly outperforming open-weight alternatives. The flagship SWE-1 model will be available to all paid Windsurf users for a promotional period at zero credits per prompt. (Windsurf)

Stripe develops transformer-based model for payment processing

Stripe created a transformer-based payments model that generates vector embeddings for payment transactions, designed to detect fraud and perform other tasks. The self-supervised network, trained on billions of transactions, positions payments in vector space where transactions with similar characteristics cluster together. Stripe’s earlier machine learning models had improved conversion by 15 percent and reduced fraud by 30 percent. This new approach improved card-testing attack detection rates on large users from 59 percent to 97 percent. The same embeddings work across multiple payment tasks including disputes and authorizations, indicating that payment data contains structural patterns and sequential dependencies that benefit from transformer architecture analysis. (Stripe and LinkedIn)
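To illustrate the clustering idea, here is a minimal sketch of using transaction embeddings to flag activity that sits close to known fraud in vector space. The random stand-in vectors and the 0.8 threshold are assumptions for illustration; Stripe's actual model, features, and thresholds are not public.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: in Stripe's setup, a transformer maps each
# transaction to a vector; random vectors stand in for illustration.
rng = np.random.default_rng(0)
known_fraud = rng.normal(size=(100, 128))  # embeddings of past card-testing attacks
candidate = rng.normal(size=128)           # embedding of an incoming transaction

# Flag the transaction if it lands near the fraud cluster in vector space
scores = [cosine_sim(candidate, f) for f in known_fraud]
if max(scores) > 0.8:  # threshold is an assumption, tuned in practice
    print("review: resembles known card-testing pattern")
else:
    print("pass")
```

The appeal of this approach is reuse: the same embeddings can feed similarity checks, classifiers for disputes or authorizations, and other downstream tasks without retraining the base model.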

Alibaba launches upgraded video generation and editing model

Alibaba released Wan2.1-VACE, a video generation model that supports creation from text, images, and video inputs while enabling users to edit the generated content. The company is offering two open-weight versions: a comprehensive 14-billion-parameter model and a smaller 1.3-billion-parameter version designed to run on consumer-grade GPUs with just 8.19 GB of VRAM. The Wan2.1 suite claims superior performance across multiple benchmarks and features unusual capabilities, including visual text generation in both Chinese and English. The model also includes Wan-VAE, which can efficiently encode and decode 1080p videos of any length while preserving temporal information. This is Alibaba’s second update to its video model in quick succession, following the introduction of the VACE framework in March, and it highlights the fast pace of video generation development. (Hugging Face)

U.S. Congress proposes 10-year ban on state and local AI regulations

In the United States, House Republicans added language to a budget reconciliation bill that would block all state and local governments from regulating artificial intelligence for 10 years. The provision, introduced by Representative Brett Guthrie of Kentucky, would prevent states from enforcing both existing and proposed laws designed to protect citizens from AI systems. If passed, the measure would invalidate several current state laws, including California’s requirement for healthcare providers to disclose AI use and New York’s mandate for bias audits in AI hiring tools. The proposal has sparked backlash from consumer advocacy groups who call it a “giant gift to Big Tech” that would leave consumers unprotected from AI harms like deepfakes and algorithmic bias. The move aligns with the Trump administration’s industry-friendly approach to AI policy, which has already reversed several Biden-era executive orders on AI safety. (Ars Technica)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng emphasized how AI’s ability to speed up tasks — not just reduce costs — can unlock significant business growth.

“Beyond reducing the cost of writing software, AI is shortening the time from idea to working prototype, and the ability to test ideas faster is changing how teams explore and invent.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:

  • Microsoft released training details for its new Phi-4-reasoning models, designed to improve problem-solving efficiency with minimal computing overhead
  • DeepCoder-14B-Preview showcased how further fine-tuning on coding tasks can enhance the capabilities of smaller reasoning models
  • European regulators announced changes to the AI Act, aiming to ease liability rules for developers and adjust other provisions
  • Meta introduced memory-layer enhancements to Llama-style models, enabling them to recall factual details more accurately without increasing computational demands



Subscribe to Data Points

Your accelerated guide to AI news and research