MuLan text tokens computed from text prompt as conditioning signal
He Who Types the Prompt Calls the Tune: Google introduces an AI that generates music from text.

As AI-generated text and images capture the world’s attention, music is catching up. Andrea Agostinelli, Timo I. Denk, and colleagues at Google and Sorbonne Université introduced MusicLM, a system that generates music from text descriptions.
Charts showing benchmark on medium-sized datasets
When Trees Outdo Neural Networks: Decision Trees Perform Best on Most Tabular Data

While neural networks perform well on image, text, and audio datasets, they fall behind decision trees and their variants on tabular datasets. New research looked into why.
Animation showing 3 main types of data augmentation and random cropping of a picture
Cookbook for Vision Transformers: A Formula for Training Vision Transformers

Vision Transformers (ViTs) are overtaking convolutional neural networks (CNNs) in many vision tasks, but procedures for training ViTs are still tailored to CNNs. New research investigated how various training ingredients affect ViT performance.

