A chorus of reindeer singing carols from a Reindeer Holiday Songbook on a snowy night
Vision

Voices for the Voiceless: Generative AI models are creating voices for Hollywood and video games.

Musicians and filmmakers adopted AI as a standard part of the audio-production toolbox. What happened: Professional media makers embraced neural networks that generate new sounds and modify old ones. Voice actors bristled.
An illustration shows a cozy cabin where all the furniture is made out of coffee mugs.
Vision

Transformers Take Over: Transformers Applied to Vision, Language, Video, and More

In 2021, transformers were harnessed to discover drugs, recognize speech, and paint pictures — and much more.
Illustration of a woman riding a sled
Vision

Multimodal AI Takes Off: Multimodal Models, such as CLIP and DALL·E, are taking over AI.

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive
Equivariant subsampling on 1D feature maps with a scale factor c = 2.
Vision

Image Transformations Unmasked: CNNs for vision that aren't fooled by changing backgrounds.

If you change an image by moving its subject within the frame, a well-trained convolutional neural network may not recognize the fundamental similarity between the two versions. New research aims to make CNNs wise to such alterations.
Computer vision from SewerAI identifies flaws in water pipes.
Vision

AI Goes Underground: Computer Vision From SewerAI Classifies Defective Pipes

A system from California startup SewerAI analyzes videos of underground pipes to prioritize those in need of repair.
Animated video shows the Flippy 2 autonomous restaurant robot making french fries
Vision

Deep Learning for Deep Frying: White Castle Uses Robots to Cook French Fries

Flippy 2, a robotic fry station from California-based Miso Robotics, has been newly deployed in a Chicago White Castle location.
A graph shows the cost in dollars of training large natural language processing models.
Vision

Who Can Afford to Train AI?: Cost of AI is Too Expensive for Many Small Companies

The cost of training top-performing machine learning models has grown beyond the reach of smaller companies.
Animated chart shows how AI can help robots locate key spatial coordinates.
Vision

Finding Useful Points in Space: Keypoint3D Helps Robots Locate Spatial Coordinates

A new machine learning method aims to improve a machine’s ability to determine and locate points of interest.
Animation showing the use of AI to identify forgery in a famous painting
Vision

Seeing Through Forgeries: Researchers Use AI to Identify Da Vinci Forgery

Accusations of fraud hang over some of the world’s most highly valued artworks. Machine learning engineers are evaluating the authenticity of these famous pieces.
First image showing the Google Tensor chip. Second image showing the Google Pixel 6 phone
Vision

Competition Heats Up in Mobile AI: Google Designed Its Own Tensor AI Chip for Smartphones

Google designed its own AI chip for its new smartphone — a snub to Qualcomm, the dominant chip vendor in Android phones. What’s new: Google debuted the Tensor chip last week
Animation showing how MERLOT is able to match contextualized captions with their corresponding video frames
Vision

Richer Video Representations: Pretraining Method Improves AI's Ability to Understand Video

To understand a movie scene, viewers often must remember or infer previous events and extrapolate potential consequences. New work improved a model’s ability to do the same.
Illustration showing a kid being followed during Trick-or-treating
Vision

Democracies Embrace Surveillance: AI Increasingly Spies on Citizens of Western Democracies

What if AI-enabled monitoring isn’t just for dictators and despots? The fear: Under the pretext of maintaining law and order, even countries founded on a commitment to individual rights allow police to take advantage of smart-city infrastructure and smart-home devices.
A group of bats throwing missiles from the night sky
Vision

Killer Robots Are Here: Are AI-Powered Weapons the Future of Warfare?

War is already bad enough. What happens when human combatants are replaced by machines?
Series of examples of accurate and inaccurate matches between images and text
Vision

Crawl the Web, Absorb the Bias: NLP Models Absorb Biases from Web Training Data

The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source of examples on that scale — the web — is polluted with bias and antisocial expressions. A new study examines the issue.
Animated image showing the transformer architecture of processing an image
Vision

Transformer Speed-Up Sped Up: How to Speed Up Image Transformers

The transformer architecture is notoriously inefficient when processing long sequences — a problem in processing images, which are essentially long sequences of pixels. One way around this is to break up input images and process the pieces
