Coherent, Interactive Worlds
Runway’s GWM-1 models generate videos with consistent physics for robots and entertainment

GIF showing a robotic arm picking up glasses on a table and handling tools on a kitchen countertop.

Runway’s GWM-1 family of video-generation models responds to user input in real time while producing scenes that remain consistent regardless of the camera’s position.

What’s new: Runway introduced GWM-1, a trio of “general world models” that were trained to understand how scenes behave, not just how scenes appear. GWM Worlds generates scenes, GWM Robotics produces synthetic data for training and testing robots, and GWM Avatars generates conversational characters with facial expressions and lip-synced speech. (In addition, the company added audio generation, audio editing, and multi-shot video editing capabilities to Gen-4.5, its flagship video generator.) 

  • Architecture: Autoregressive diffusion model based on Gen-4.5
  • Input/output: Text and images in, video out (up to 2 minutes, 1280x720-pixel resolution, 24 frames per second)
  • Availability: The models will be available in the “coming weeks”: GWM Worlds and GWM Avatars via web interface, GWM Robotics as a software development kit by request.
  • Undisclosed: Parameter count, training data and methods, pricing, release dates, performance metrics

How it works: Unlike typical video diffusion models, which denoise an entire clip jointly over a number of steps, GWM-1 generates one frame at a time, conditioned on past frames and control inputs. This autoregressive approach lets the model respond to control input in real time, as the sketch below illustrates. Runway built each GWM-1 model by post-training Gen-4.5 on domain-specific data. The models take still images and text as input.
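
Runway hasn’t disclosed GWM-1’s implementation, but the generation pattern described above can be sketched in a few lines. The sketch below is a toy illustration, not Runway’s code: `encode_controls` and `denoise_step` are hypothetical stand-ins for the model’s undisclosed components. The point is the control flow: each frame is denoised individually, conditioned on past frames, so a new command can steer the very next frame.

```python
# A minimal sketch of autoregressive video generation, NOT Runway's
# implementation. `encode_controls` and `denoise_step` are hypothetical
# stand-ins for the model's (undisclosed) learned components.
import numpy as np

FRAME_SHAPE = (720, 1280, 3)  # 1280x720 RGB, per the article's specs
DENOISE_STEPS = 8             # assumed; few steps keep latency low

def encode_controls(command: str) -> np.ndarray:
    """Hypothetical encoder for a text command such as 'turn left'."""
    rng = np.random.default_rng(abs(hash(command)) % 2**32)
    return rng.standard_normal(64)

def denoise_step(noisy: np.ndarray, context: list, control: np.ndarray,
                 step: int) -> np.ndarray:
    """Stand-in for one learned denoising step conditioned on past
    frames (context) and the current control embedding."""
    return noisy * 0.5  # placeholder; a real model predicts cleaner pixels

def generate_frame(context: list, command: str) -> np.ndarray:
    """Denoise a single next frame, conditioned on history + controls."""
    control = encode_controls(command)
    frame = np.random.standard_normal(FRAME_SHAPE)  # start from noise
    for step in range(DENOISE_STEPS):
        frame = denoise_step(frame, context, control, step)
    return frame

# Interactive loop: each user command steers the very next frame.
context: list = []
for command in ["walk forward", "turn left", "look up"]:
    frame = generate_frame(context, command)
    context.append(frame)  # past frames condition future ones
```

By contrast, a conventional video diffusion model denoises every frame of a clip jointly, so user input can’t influence the result until the entire clip is finished.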

  • GWM Worlds generates a video simulation as the user navigates through the scene by issuing text commands. Users prompt the system to define an agent, physics, and world (such as a person walking through a city or a drone flying over mountains). The model maintains spatial and geometric consistency, so objects remain in place as they shift in and out of the camera’s view.
  • GWM Robotics was trained on unspecified robotics data to generate sequences of frames that show how a scene changes, from a robot’s point of view, depending on its actions. Developers can explore alternative robot motions or directions of travel by modifying the simulated actions and observing the output (see the sketch after this list).
  • GWM Avatars is intended for conversational applications. Users select a voice and enter a portrait and/or text, and the model generates a character that interacts conversationally with realistic facial expressions, lip sync, and gestures. Characters can be photorealistic or stylized.
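
For GWM Robotics, the workflow described above amounts to rolling out the same starting scene under different action sequences and comparing the predicted outcomes. The sketch below is purely illustrative: the SDK is available only by request and its API is undisclosed, so the `WorldModel` class and its `rollout` method are invented stand-ins for whatever interface Runway ships.

```python
# Hypothetical sketch of the GWM Robotics workflow the article describes.
# The WorldModel interface is invented for illustration; the real SDK's
# API is undisclosed.

class WorldModel:
    """Stand-in for an action-conditioned frame predictor."""

    def rollout(self, start_image: str, actions: list[str]) -> list[str]:
        # A real model would predict frames from the robot's point of
        # view; here we echo labels just to show the control flow.
        return [f"frame after '{a}'" for a in actions]

model = WorldModel()
scene = "kitchen_countertop.png"  # hypothetical starting image

# Explore alternative motions by editing the simulated action sequence
# and comparing predicted outcomes, without moving a physical robot.
candidates = [
    ["reach forward", "grasp glass", "lift glass"],
    ["reach left", "grasp tool", "lift tool"],
]
for actions in candidates:
    frames = model.rollout(scene, actions)
    print(f"{actions} -> final predicted frame: {frames[-1]}")
```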

Behind the news: Until recently, world models (models that predict the future state of an environment given actions taken within it) reflected fairly limited worlds. Upon its launch in early 2024, OpenAI’s Sora 1 generated video output impressive enough to inspire arguments over whether it qualified as a world model of the real world. Those arguments were premature, since Sora 1’s output, however photorealistic, was not consistent with real-world physics. But they presaged models like Google’s Genie 2, which produces 3D video-game worlds that respond to keyboard inputs in real time, and World Labs’ Marble, which generates persistent, editable, reusable 3D spaces from text, images, and other inputs.

Why it matters: Runway is among several AI companies that are racing to build models that simulate coherent worlds including objects, materials, lighting, fluid dynamics, and so on. Such models have huge potential value in entertainment and augmented reality but also in industrial and scientific fields, where they can help to design new products and plan for future scenarios. GWM Robotics (aimed at robotics developers) and GWM Avatars (which may be useful in applications like tutoring or customer service) show that Runway’s ambitions extend beyond entertainment.

We’re thinking: The world-model landscape is dividing between models that produce videos with real-time control (Runway GWM Worlds, Google Genie 3, World Labs RTFM) and those that make exportable 3D spaces (World Labs Marble). These approaches target different applications: Real-time interactivity enables training loops in which agents could learn from immediate feedback, while exportable 3D assets feed activities like game development, in which developers may refine and reuse assets across projects.
