During Bay Area blackout, Waymo conks out; Meta’s Segment Anything family adds audio

Published: Dec 22, 2025
Reading time: 5 min read

In today’s edition of Data Points, you’ll learn more about:

  • GPT-5.2 Codex, OpenAI’s new agentic coding model
  • Qwen’s image layer segmentation model
  • A2UI, an open Google project to help agents build apps
  • Training robots to learn from human actions

But first:

Waymo autonomous taxis stop traffic during power blackout

Waymo suspended its robotaxi service in San Francisco on Saturday evening after a massive blackout caused numerous vehicles to stall at intersections and on roads across the city. A fire at a Pacific Gas & Electric substation knocked out power to approximately 120,000 customers, and about 35,000 were still without power on Sunday morning. Social media photos and videos documented Waymo robotaxis blocking traffic as human drivers navigated around the disabled vehicles, though Waymo did not explain the specific technical failure that immobilized its fleet. The blackout likely disabled traffic signals throughout the city, and observers speculated that interruptions to cell service or traffic data may also have contributed to the vehicles’ inability to operate. Waymo said it was working with city officials to restore service once the infrastructure stabilized. (TechCrunch)

Meta model isolates sounds using text, visual, and temporal prompts

Meta released SAM Audio, a unified model that separates any sound from complex audio mixtures using text descriptions, visual clicks on video, or time-based markers. The model achieves state-of-the-art performance across speech, music, and general sound separation while running faster than real time, processing audio in roughly 0.7 times its duration. SAM Audio relies on Perception Encoder Audiovisual, an extension of Meta’s open-source Perception Encoder that was trained on over 100 million videos to align video frames with audio at precise moments. Meta also released SAM Audio-Bench, a new audio-separation benchmark, and SAM Audio Judge, an evaluation model that assesses separation quality based on human-rated perceptual dimensions. All models are available today through Meta’s Segment Anything Playground. (Meta)

GPT-5.2 Codex launch leans into cybersecurity applications

OpenAI released GPT-5.2-Codex, an AI model optimized for complex software engineering tasks and cybersecurity work. The model improves performance on long-horizon coding tasks, handles large-scale refactors and migrations more effectively, and works better in Windows environments. GPT-5.2-Codex achieved state-of-the-art results on SWE-Bench Pro and Terminal-Bench 2.0, benchmarks that test agentic performance in realistic terminal environments. The model is available now to paid ChatGPT users through Codex surfaces, with API access coming in the weeks ahead. OpenAI also announced an invite-only trusted-access program for vetted security professionals and organizations focused on defensive cybersecurity; participants will receive more permissive versions of the model for legitimate security research, vulnerability disclosure, and authorized red-teaming work. (OpenAI)

New Photoshop-like model decomposes images into layers

Alibaba's Qwen unit released Qwen-Image-Layered, an AI model that automatically separates images into individual layers with transparent backgrounds in RGBA format, similar to how professional image editing software like Photoshop organizes visual elements. The model enables downstream editing workflows by decomposing complex images into isolated, modifiable components rather than requiring users to manually extract and separate elements. This approach allows editors to adjust individual layers independently without affecting other parts of an image. The layered output format integrates directly with standard image editing pipelines and tools. Qwen made the model available as part of its expanding suite of multimodal AI capabilities. (Decoder)
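The article doesn’t include Qwen’s own inference or editing code, so here is a minimal, hedged sketch of what a layered RGBA output makes possible downstream, using standard Pillow compositing rather than any Qwen tooling; the layer file names are placeholders.

```python
# Sketch only: standard Pillow compositing of layered RGBA output.
# File names are placeholders, not artifacts produced by Qwen-Image-Layered.
from PIL import Image, ImageEnhance

# Suppose the model has decomposed an image into background, subject, and text
# layers, each saved as an RGBA PNG that is transparent outside its own content.
layers = [
    Image.open(path).convert("RGBA")
    for path in ("background.png", "subject.png", "text.png")
]

# Edit one layer independently, e.g. brighten the subject only.
layers[1] = ImageEnhance.Brightness(layers[1]).enhance(1.3)

# Recomposite bottom to top; the untouched layers are unaffected by the edit.
canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
for layer in layers:
    canvas = Image.alpha_composite(canvas, layer)
canvas.save("edited.png")
```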

New Google project builds app UIs using structured data

Google released A2UI, an open-source project that lets AI agents generate user interfaces that render securely across different platforms and frameworks. The system uses a declarative JSON format that allows agents to compose interfaces from pre-approved component catalogs, avoiding the security risks of executing AI-generated code while giving host applications full control over styling and branding. A2UI works with various UI frameworks including Lit, Angular, and Flutter, and integrates with Google’s Agent2Agent (A2A) protocol to enable remote agents to send interface descriptions as messages rather than executable code. The project includes renderers, component libraries, and sample implementations like a restaurant booking agent that can generate date pickers and time selectors instead of relying on text-only chat interactions. Google built A2UI in partnership with teams including AG UI, CopilotKit, Opal, Gemini Enterprise, and Flutter, with the code now available on GitHub for community collaboration. (Google)
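Google’s exact A2UI schema isn’t reproduced here, so the sketch below only illustrates the general pattern the paragraph describes: the agent emits declarative JSON drawn from a pre-approved component catalog, and the host validates and renders it instead of executing agent-generated code. The component and field names are invented for illustration, not A2UI’s real schema.

```python
# Illustrative only: invented component and field names, not the real A2UI schema.
# The agent sends declarative data; the host checks it against an approved catalog.
import json

ui_message = {
    "surface": "booking_form",
    "components": [
        {"type": "Text", "props": {"value": "Book a table"}},
        {"type": "DatePicker", "props": {"id": "date", "label": "Date"}},
        {"type": "TimePicker", "props": {"id": "time", "label": "Time"}},
        {"type": "Button", "props": {"label": "Reserve", "action": "submit_booking"}},
    ],
}

APPROVED_CATALOG = {"Text", "DatePicker", "TimePicker", "Button"}

def host_accepts(message: dict) -> bool:
    """Host-side gate: render only components from the approved catalog."""
    return all(c["type"] in APPROVED_CATALOG for c in message["components"])

if host_accepts(ui_message):
    # What would travel to the host (e.g. over an agent-to-agent message channel).
    print(json.dumps(ui_message, indent=2))
```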

Robotics startup shows big gains from watching humans

Physical Intelligence reports that incorporating human video data into the fine-tuning process improved its vision-language-action model for robotics. The approach leverages human demonstrations to enhance robot capabilities across visual understanding and action planning. By training on human movement patterns, the model gains improved generalization for physical tasks. This method addresses a key challenge in robotics: acquiring sufficient high-quality training data for diverse manipulation and navigation scenarios. The technique suggests robots can learn more efficiently from human behavior rather than requiring purely synthetic or robot-generated training data. (Physical Intelligence)


Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng talked about the limitations and current challenges of improving LLMs, emphasizing the need for a data-centric approach and the piecemeal nature of advancements in AI models.

“A typical human, despite having seen vastly less text or practiced far less in computer-use training environments than today’s frontier models, nonetheless can generalize to a far wider range of tasks than a frontier model. Humans might do this by taking advantage of continuous learning from feedback, or by having superior representations of non-text input.”

Read Andrew’s letter here.



A special offer for our community

DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:

  • Over 150 AI courses and specializations from Andrew Ng and industry experts
  • Labs and quizzes to test your knowledge
  • Projects to share with employers
  • Certificates to testify to your new skills
  • A community to help you advance at the speed of AI

Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!

Try Pro Membership


Subscribe to Data Points

Your accelerated guide to AI news and research