Example of a video produced from a story-like description
Long-Form Videos from Text Stories: Google's Phenaki Generates Long-Form Video from Text

Only a week ago, researchers unveiled a system that generates a few seconds of video based on a text prompt. New work enables a text-to-video system to produce an entire visual narrative from several sentences of text.
Illustration of the Dialogue Transformer Language Model (DLM)
The Sound of Conversation: AI Learns to Mimic Conversational Pauses and Interruptions

In spoken conversation, people naturally take turns amid interjections and other patterns that aren’t strictly verbal. A new approach generated natural-sounding audio dialogs without training on text transcriptions that mark when one party should stop speaking and the other should chime in.
Panda on a swing
Text to Video Without Text-Video Training Data: Make-A-Video, an AI System from Meta, Generates Video from Text

Text-to-image generators like DALL·E 2, Midjourney, and Stable Diffusion are winning art contests and worrying artists. A new approach brings the magic of text-to-image generation to video.
Animation showing 3 main types of data augmentation and random cropping of a picture
Cookbook for Vision Transformers: A Formula for Training Vision Transformers

Vision Transformers (ViTs) are overtaking convolutional neural networks (CNNs) in many vision tasks, but procedures for training them are still tailored for CNNs. New research investigated how various training ingredients affect ViT performance.
Animated overview of PP-Matting
Automating Mattes for Visual Effects: New ML Method Produces Image Mattes More Easily

Researchers at Baidu introduced PP-Matting, an architecture that, given an image, estimates the transparency of pixels surrounding foreground objects to create mattes without requiring additional input.
Information related to Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)
Update Any Language Model: New Method to Update Pretrained Language Models

The ability to update language models is essential to incorporate new information and correct undesirable behaviors. Previous methods are unwieldy and often fail as the amount of new data increases. New work offers a workaround.
Illustration shows different self-attention mechanisms used by Transformer-based AI models.
Attention to Rows and Columns: Altering Transformers' Self-Attention Mechanism for Greater Efficiency

A new approach alters transformers' self-attention mechanism to balance computational efficiency with performance on vision tasks.
Object-Detection Transformers Simplified: New Research Improves Object Detection With Vision Transformers

ViTDet, a new system from Facebook, adds an object detector to a plain pretrained transformer.
Overall architecture of GEM.
What a Molecule’s Structure Reveals: Baidu Creates AI to Classify Molecular Properties

Researchers at Baidu trained a modified graph neural network (GNN) on a dataset of 18 million molecules to classify molecular properties.
Animated chart shows how AI can avoid mistaking an image's subject for its context.
Taming Spurious Correlations: New Technique Helps AI Avoid Classification Mistakes

When a neural network learns image labels, it may confuse a background item for the labeled object. New research avoids such mistakes.
Animated graphs showing how an ensemble of fine-tuned models can provide better performance.
Ensemble Models Simplified: New Machine Learning Research Simplifies Ensembles

A CLIP model whose weights were the mean of an ensemble of fine-tuned models performed as well as the ensemble and better than its best-performing constituent.
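The weight-averaging idea can be illustrated with a minimal sketch. It assumes each fine-tuned model's parameters are stored as a plain dictionary of flat float lists, a stand-in for real tensors; the function name `average_weights` and the toy parameter names are hypothetical and do not come from the research.

```python
def average_weights(state_dicts):
    """Return the element-wise mean of several models' parameters.

    Assumes every dict has the same keys and same-length lists,
    i.e. all models share one architecture.
    """
    if not state_dicts:
        raise ValueError("need at least one model")
    averaged = {}
    for key in state_dicts[0]:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(col) / len(state_dicts) for col in columns]
    return averaged


# Three hypothetical fine-tuned models with identical parameter shapes.
models = [
    {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]},
    {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]},
    {"layer.weight": [2.0, 6.0], "layer.bias": [3.0]},
]
soup = average_weights(models)
print(soup)
```

Unlike a conventional ensemble, which runs every member at inference time, the averaged model is a single set of weights, so inference costs the same as running one model.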
Animated flowcharts show how the ProtCNN AI model classifies proteins.
Protein Families Deciphered: Machine Learning Categorizes Proteins Based on Their Functions

Convolutional neural networks separate proteins into functional families without considering their shapes.
A flowchart shows how a jury learning method reduces annotator bias in machine learning models.
Choose the Right Annotators: Jury-Learning Helps Remove Bias from NLP Models

A new machine learning method attempts to account for biases that may be held by certain subsets of labelers.
Flowcharts show how a new contrastive learning approach uses metadata to improve AI image classifiers
Learning From Metadata: Descriptive Text Improves Performance for AI Image Classification Systems

Images in the wild may not come with labels, but they often include metadata. A new training method takes advantage of this information to improve contrastive learning.
Humanized Training for Robot Arms: New Research Improves Robot Performance and Adaptability

Robots trained via reinforcement learning usually study videos of robots performing the task at hand. A new approach used videos of humans to pre-train robotic arms.

