From Clip to Composition Udio expands text-to-music generator, now extends existing recordings

Jun 19, 2024
Reading time
2 min read
From Clip to Composition: Udio expands text-to-music generator, now extends existing recordings

Is your song’s verse in need of a chorus? A popular text-to-music generator can extend existing recordings while maintaining their musical character.

What’s new: Paying users of Udio, a web service that generates pop-song productions from prompts, can upload audio clips and extend or alter them according to a text description. The service also increased its context window from 30 seconds to 2 minutes for more coherent output. You can hear the new capability here. Subscriptions start at $10 per month.

How it works: Given a prompt, Udio generates a 30-second passage and lets you assemble passages into compositions (previously up to four minutes long, now 15 minutes). Now users can create passages by uploading audio clips and extending them or modifying them by, say, adding or removing instruments or vocals complete with lyrics. 

  • In the demonstration video linked above, Udio adds a singing voice to an instrumental backing track using the prompt “funk, female vocalist.” Other examples enhance an electronic beat with a guitar melody and fill out hard-rock drums with a guitar riff and wailing voice.
  • Users are responsible for securing legal rights to use audio files they upload. They retain commercial rights to audio that they produce using the software, as long as they specify that Udio generated the recording. 
  • Udio has shared few details about how it built its model. “A large amount of publicly available and high-quality music” was in the training set, CEO David Ding told Music Ally. The company has “very strong artist filters and a copyright focus” to avoid generating output that sounded too much like copyrighted music, he added.

Behind the news: Udio competes with Suno, whose service also generates audio output with vocals, lyrics, and song structures. Also in the mix is Stability AI, whose Stable Audio 2.0 enables users to upload and extend brief instrumental recordings to a length of around three minutes.

Why it matters: Udio is quickly becoming not just a song generator, but a song editor and builder. Just as the ability of text-to-image generators to edit, extend, and infill existing images made those applications more useful in a variety of creative situations, Udio’s audio-to-audio capabilities give composers and producers new horizons for enhancing, orchestrating, and structuring their own productions. 

We’re thinking: Udio offers impressive capabilities for musicians (and wanna-be musicians), but its developer tools are lacking. A public-facing API would enable producers to automate the service and integrate it with other applications.


Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox