Network architecture of Reasoner

What the Missing Frames Showed: Machine Learning Describes Masked Video Events

Neural networks can describe in words what’s happening in pictures and videos — but can they make sensible guesses about things that happened before or will happen afterward? Researchers probed this ability.
Dependency between compute budget and number of parameters

Right-Sizing Models for the Dataset: Finding the Best Data-To-Parameter Ratio for NLP Models

The route to improving transformer-based language models like GPT-3 and Gopher, which are trained on immense quantities of text scraped from the web, has been to increase their size. But research shows that, given a processing budget, bigger doesn’t necessarily mean better.
Plot demonstrating the relative sizes of parallel and monolingual examples

Massively Multilingual Translation: Machine Learning Model Trained to Translate 1,000 Languages

Recent work showed that models for multilingual machine translation can increase the number of languages they translate by scraping the web for pairs of equivalent sentences in different languages. A new study radically expanded the language repertoire through training on untranslated web text.
Technical components of No Language Left Behind and how they fit together

The Net Speaks in Many Tongues: NLP Model Translates 200 Different Languages

Sentence pairs that have equivalent meanings in different languages — typically used to train machine translation systems — have been available in sufficient quantities for only around 100 languages. New work doubled that number and produced a more capable model.
Example of a video produced from a story-like description

Long-Form Videos from Text Stories: Google's Phenaki Generates Long-Form Video from Text

Only a week ago, researchers unveiled a system that generates a few seconds of video based on a text prompt. New work enables a text-to-video system to produce an entire visual narrative from several sentences of text.
Illustration of the Dialogue Transformer Language Model (DLM)

The Sound of Conversation: AI Learns to Mimic Conversational Pauses and Interruptions

In spoken conversation, people naturally take turns amid interjections and other patterns that aren’t strictly verbal. A new approach generated natural-sounding audio dialogs without training on text transcriptions that mark when one party should stop speaking and the other should chime in.
Panda on a swing

Text to Video Without Text-Video Training Data: Make-A-Video, an AI System from Meta, Generates Video from Text

Text-to-image generators like DALL·E 2, Midjourney, and Stable Diffusion are winning art contests and worrying artists. A new approach brings the magic of text-to-image generation to video.
Animation showing 3 main types of data augmentation and random cropping of a picture

Cookbook for Vision Transformers: A Formula for Training Vision Transformers

Vision Transformers (ViTs) are overtaking convolutional neural networks (CNNs) in many vision tasks, but procedures for training them are still tailored for CNNs. New research investigated how various training ingredients affect ViT performance.
Robot with an arm, camera, and gripper handing over a plastic bottle to a person

Parsing Commands Into Actions: NLP Helps Google Robot Understand Spoken Instructions

A new method enables robots to respond helpfully to verbal commands by pairing a natural language model with a repertoire of existing skills.
Different Nvidia cloud-computing services

Chipmaker Boosts AI as a Service: Nvidia Launches Cloud Service for NLP Models

Nvidia, known for chips designed to run AI workloads, is providing access to large language models. The company announced early access to NeMo LLM and BioNeMo, cloud-computing services that enable developers to generate text and biological sequences, respectively.
Information related to Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)

Update Any Language Model: New Method to Update Pretrained Language Models

The ability to update language models is essential to incorporate new information and correct undesirable behaviors. Previous methods are unwieldy and often fail as the amount of new data increases. New work offers a workaround.
Illustration shows different self-attention mechanisms used by Transformer-based AI models.

Attention to Rows and Columns: Altering Transformers' Self-Attention Mechanism for Greater Efficiency

A new approach alters transformers' self-attention mechanism to balance computational efficiency with performance on vision tasks.
Object-Detection Transformers Simplified: New Research Improves Object Detection With Vision Transformers

ViTDet, a new system from Facebook, adds an object detector to a plain pretrained transformer.
Bloom logo

Large Language Models Unbound: BLOOM is the Largest Open Source NLP Model to Date

A worldwide collaboration produced the biggest open source language model to date. BLOOM is a family of language models built by the BigScience Research Workshop, a collective of over 1,000 researchers from 250 institutions around the globe.
A flowchart shows how a jury learning method reduces annotator bias in machine learning models.

Choose the Right Annotators: Jury-Learning Helps Remove Bias from NLP Models

A new machine learning method attempts to account for biases that may be held by certain subsets of labelers.
