xAI launched a video generator that topped an independent quality ranking at a fraction of competitors’ prices.
What’s new: Grok Imagine 1.0 takes a text prompt, optionally accompanied by images and/or video, and produces video clips that can include dialogue, sound effects, and music.
- Input/output: Text, image (optional), video (optional) in, video with audio out (up to 10 seconds at 1,280x720 pixels via chat interfaces, up to 15 seconds at 1,280x720 pixels or 854x480 pixels via API)
- Performance: Topped Artificial Analysis Video Arena in both text-to-video and image-to-video at launch
- Capabilities: Video alteration via text instructions, camera motion (pan, tilt, zoom); add, remove, and swap objects within scenes; style transfer; multiple aspect ratios
- Availability/price: Web interface via grok.com, x.com, and Grok mobile app (free for X Basic and Premium users; Premium users can generate longer videos), API $4.20 per minute of output
- Undisclosed: xAI has shared no information about Grok Imagine 1.0’s underlying technology or how it was built.
Performance: Grok Imagine 1.0 debuted at the top of the Artificial Analysis Video Arena, a blind, head-to-head test of preferences judged by human viewers. It’s slower than some competitors but generally less expensive. (Disclosure: Andrew Ng has a personal investment in Artificial Analysis.)
- At launch, Artificial Analysis’ leaderboards ranked Grok Imagine 1.0 first in both the text-to-video and image-to-video categories, ahead of Runway Gen-4.5, Kling 2.5 Turbo, and Google Veo 3.1.
- On LM Arena’s video leaderboards, grok-imagine-video-720p ranked first in image-to-video (1,400 Elo), ahead of Google Veo 3.1 (1,395 Elo), and fourth in text-to-video (1,362 Elo), behind models including Google Veo 3.1 (1,371 Elo) and OpenAI Sora 2 Pro (1,369 Elo).
- In xAI’s head-to-head tests using IVEBench (which evaluates the quality of instruction-guided video alterations), human raters preferred Grok Imagine 1.0 over Runway Aleph (64.1 percent of the time) and Kling O1 (57 percent of the time).
- According to Artificial Analysis, on average, Grok Imagine 1.0 generated a video (duration unspecified) in 110.1 seconds, slower than Kling 2.5 Turbo (89.2 seconds) and Vidu Q2 (39.1 seconds) but faster than OpenAI Sora 2 Pro (448.4 seconds) and MiniMax Hailuo 2.3 (167.1 seconds).
- At $4.20 per minute of generated video (with audio), Grok Imagine 1.0 matches the price of Kling 2.5 Turbo (without audio) and costs less than Google Veo 3.1 Preview ($12 per minute with audio) and OpenAI Sora 2 Pro ($30 per minute with audio).
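To make the price comparison concrete, here is a minimal sketch that converts the per-minute prices quoted above into the cost of a single clip and into ratios relative to Grok Imagine 1.0. It assumes billing scales linearly with clip length, which the article does not confirm; the function names `clip_cost` and `price_ratio` are illustrative and not part of any vendor's API.

```python
# Per-minute API prices in USD, as quoted in this article.
PRICE_PER_MINUTE = {
    "Grok Imagine 1.0": 4.20,
    "Kling 2.5 Turbo": 4.20,        # price quoted without audio
    "Google Veo 3.1 Preview": 12.00,
    "OpenAI Sora 2 Pro": 30.00,
}


def clip_cost(model: str, seconds: float) -> float:
    """Cost of one clip, ASSUMING linear per-second billing (not confirmed)."""
    return PRICE_PER_MINUTE[model] * seconds / 60


def price_ratio(model: str, baseline: str = "Grok Imagine 1.0") -> float:
    """How many times more expensive `model` is than `baseline` per minute."""
    return PRICE_PER_MINUTE[model] / PRICE_PER_MINUTE[baseline]


if __name__ == "__main__":
    for model in PRICE_PER_MINUTE:
        cost = clip_cost(model, 15)  # 15 s is the API maximum cited above
        print(f"{model}: ${cost:.2f} per 15 s clip, "
              f"{price_ratio(model):.1f}x Grok Imagine 1.0")
```

Under that linear-billing assumption, a 15-second clip would cost about $1.05 on Grok Imagine 1.0 versus $7.50 on OpenAI Sora 2 Pro, a ratio of roughly 7.1.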
Behind the news: Unlike video generators from Google, OpenAI, and Runway, which are available as standalone products and/or via APIs, Grok Imagine 1.0 is integrated with the X social network. This enables X users to generate and share video directly on X, where Grok’s output has already caused controversy. In late 2025, X users exploited Grok to produce nonconsensual sexualized images of real people, including children, resulting in investigations and bans in several countries. The phenomenon persisted after xAI promised to address it, Reuters reported.
Why it matters: Generating a video that matches your vision typically requires many iterations of adjusting prompts, regenerating, and comparing results. xAI says that early partners told the company that quality alone was not useful if latency and cost made iteration untenable. Third-party benchmarks show Grok Imagine 1.0 matches or exceeds leading models on quality at a lower cost than premium competitors, a combination that lowers the cost of experimentation.
We’re thinking: Image generation went from novelty to table stakes in roughly two years. Video generation is following a similar path. The roughly sevenfold price gap between Grok Imagine 1.0 ($4.20 per minute) and the now-shuttered OpenAI Sora 2 Pro ($30 per minute) suggests that prices still have plenty of room to fall.