Building multi-agent systems in Rowboat’s IDE; top computer use agent UI-TARS gets an update

Published
Apr 25, 2025
Reading time
4 min read

In today’s edition, you’ll learn more about:

  • GPT-4o’s image generator now available via API
  • Google updates its Lyria model and music editing tools
  • Grok 3 models now available for API developers
  • Executive order would overhaul K-12 AI education in U.S. schools

But first:

Rowboat launches open-source IDE for multi-agent AI development

Rowboat, a new freely available integrated development environment, aims to simplify the creation and deployment of multi-agent AI systems. The platform features a visual interface that turns natural language specifications into functional agent workflows, supports Model Context Protocol (MCP) servers for tool integration, and includes a playground for interactive testing and debugging. The Y Combinator-backed project integrates with OpenAI’s Agents SDK and targets developers building applications in financial services, insurance, travel, and telecommunications. Rowboat is available now on GitHub under an Apache 2.0 license. (GitHub)

ByteDance updates GUI agent, outperforms OpenAI and Anthropic

ByteDance released UI-TARS-1.5, an updated multimodal agent framework that outperforms several leading models, including OpenAI’s Operator and Anthropic’s Claude 3.7 Sonnet, on GUI automation and game reasoning benchmarks. The model works as an end-to-end system that perceives screenshots and generates human-like control actions such as mouse movements and keyboard inputs, rather than relying on function calls or tool augmentation. It performs well across desktop, mobile, and game environments, achieving higher success rates on complex benchmarks like ScreenSpotPro (61.6 percent) than earlier versions of UI-TARS and competing models. UI-TARS-1.5 is an open-weights model, available under an Apache 2.0 license through GitHub and Hugging Face. (TARS)

OpenAI makes new image generation model available through API

OpenAI released “gpt-image-1,” giving developers API access to the same image generation model used in ChatGPT. The company reports that ChatGPT users created over 700 million images in the feature’s first week after launch. The API includes safety features, and generated images carry C2PA metadata. Pricing is token-based: text input tokens cost $5 per million, image input tokens $10 per million, and image output tokens $40 per million, which works out to approximately $0.02, $0.07, and $0.19 per generated square image at low, medium, and high quality, respectively. (OpenAI)
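As a sanity check on those per-image figures, the token rates above can be wired into a small cost estimator. The per-million-token rates come from OpenAI’s announcement; the token counts in the example are back-calculated illustrations, not official per-image figures.

```python
# Per-million-token rates for gpt-image-1, from OpenAI's pricing announcement.
RATES_PER_MILLION = {
    "text_input": 5.00,    # USD per 1M text input tokens
    "image_input": 10.00,  # USD per 1M image input tokens
    "image_output": 40.00, # USD per 1M image output tokens
}

def estimate_cost(text_in: int = 0, image_in: int = 0, image_out: int = 0) -> float:
    """Estimate the USD cost of one gpt-image-1 request from its token counts."""
    return (
        text_in * RATES_PER_MILLION["text_input"]
        + image_in * RATES_PER_MILLION["image_input"]
        + image_out * RATES_PER_MILLION["image_output"]
    ) / 1_000_000

# Illustrative only: ~1,750 output tokens would match the quoted ~$0.07
# for a medium-quality square image (1,750 * $40 / 1M = $0.07).
print(f"${estimate_cost(text_in=100, image_out=1750):.4f}")
```

Since billing is purely per-token, the same function covers image-editing calls that also consume image input tokens.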

Google expands Music AI Sandbox with new features and Lyria 2 model

Google introduced new features and improvements to its Music AI Sandbox, including Lyria 2, their latest music generation model. The expanded toolkit offers three main capabilities: Create (generating music samples from text descriptions), Extend (continuing existing musical clips), and Edit (transforming existing audio with fine-grained control). Google developed these tools in collaboration with musicians through YouTube’s Music AI Incubator and is now giving more U.S.-based musicians access to experiment with them. The company also unveiled Lyria RealTime, which enables real-time interactive music creation and performance. Music AI Sandbox and Lyria 2 are currently available only to trusted testers via waitlist. (Google)

xAI launches Grok 3 models in API

xAI released beta versions of its Grok 3 model lineup, with standard and fast variants at different price points. The flagship Grok 3 model costs $3 per million input tokens and $15 per million output tokens, while the fast variant costs $5 and $25, respectively. The company also offers more affordable Grok 3 Mini models starting at $0.30/$0.50 per million input/output tokens, plus separate Grok 2 models with vision and image generation capabilities. All text models feature a 131,072-token context window and share the same underlying architecture, differing only in serving speed. In the API, Grok 3 models are not connected to the real-time web and have a knowledge cutoff of November 2024. (xAI)
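The tiers are easy to compare with a similar estimator. The per-million-token prices and the 131,072-token context window come from the story above; the model identifiers are illustrative guesses at xAI’s naming, not confirmed API model names, and the assumption that the context window bounds input plus output tokens combined is mine.

```python
# Per-million-token prices (input, output) in USD, as reported above.
# Model identifiers are illustrative; check xAI's API docs for exact names.
GROK_PRICING = {
    "grok-3": (3.00, 15.00),
    "grok-3-fast": (5.00, 25.00),
    "grok-3-mini": (0.30, 0.50),
}

CONTEXT_WINDOW = 131_072  # tokens, shared by all Grok 3 text models

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request; raises for unknown models."""
    # Assumes the context window bounds prompt + completion combined.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 131,072-token context window")
    price_in, price_out = GROK_PRICING[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Same hypothetical request (10k input, 1k output tokens) across the tiers:
for name in GROK_PRICING:
    print(f"{name}: ${estimate_cost(name, 10_000, 1_000):.4f}")
```

Because the standard and fast variants share one architecture and differ only in serving speed, the comparison reduces entirely to this price arithmetic.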

Trump executive order establishes AI education task force

U.S. President Trump signed an executive order creating a White House Task Force on Artificial Intelligence Education. The order directs the government to launch several concrete initiatives: development of K-12 AI education resources through public-private partnerships, allocation of existing federal funds for teacher training on AI integration, expansion of AI-related student apprenticeships, and a Presidential AI Challenge competition to highlight student achievements. These programs aim to build AI literacy and technical skills across the American workforce and educational system. (The White House)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng highlighted how AI-assisted coding enables developers to work in unfamiliar languages, while understanding the core programming concepts of each language remains key to success.

“Understanding the concepts behind different languages is still important… This lets you prompt the LLM much more precisely, and helps you understand how to fix issues if something goes wrong.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: OpenAI introduced the cost-efficient GPT-4.1 family, along with the o3 and o4-mini reasoning models, designed to improve complex problem-solving and coding; Hugging Face acquired Pollen Robotics and unveiled Reachy 2, a new robot powered by open-weights models and built for research and experimentation; the U.S. government imposed tighter restrictions on AI chip exports to China and began an investigation into Nvidia’s practices; and researchers developed a text-only language model capable of interpreting images, video, and audio, all without additional training.


Subscribe to Data Points


Your accelerated guide to AI news and research