Man with gun walking by detector
Vision

Stopping Guns at the Gate: How Camden Yards uses AI to scan for weapons.

A Major League Baseball stadium will be using computer vision to detect weapons as fans enter. What’s new: A system called Hexwave will look for firearms, knives, and explosives carried by baseball fans who visit Camden Yards, home field of the Baltimore Orioles.
Multimodal deep learning model
Vision

AI Versus the Garbage Heap: How Amazon uses AI to cut waste.

Amazon reported long-term success using machine learning to shrink its environmental footprint. The online retailer developed a system that fuses product descriptions, images, and structured data to decide how an item should be packed for shipping.
Photograph of Yale Song
Vision

Yale Song: Foundation models for vision.

Large models pretrained on immense quantities of text have been proven to provide strong foundations for solving specialized language tasks. My biggest hope for AI in 2022 is...
Yoav Shoham
Vision

Yoav Shoham: Language models that reason.

I believe that natural language processing in 2022 will re-embrace symbolic reasoning, harmonizing it with the statistical operation of modern neural networks. Let me explain what I mean by this.
Alexei Efros
Vision

Alexei Efros: Learning from the ground up.

Things are really starting to get going in the field of AI. After many years (decades?!) of focusing on algorithms, the AI community is finally ready to accept the central role of data and the high-capacity models that are capable of taking advantage of this data.
Wolfram Burgard
Vision

Wolfram Burgard: Train robots in the real world.

Robots are tremendously useful machines, and I would like to see them applied to every task where they can do some good. Yet we don’t have enough programmers for all this hardware and all these tasks.
Abeba Birhane
Vision

Abeba Birhane: Clean up web datasets.

From language to vision models, deep neural networks are marked by improved performance, higher efficiency, and better generalizations. Yet, these systems are also marked by perpetuation of bias and injustice.
A living room made out of cups of coffee: the people, the seats, the chimney, the lamp, all gather around a cozy fire.
Vision

One Architecture to Do Them All: Transformer: The AI architecture that can do it all.

The transformer architecture extended its reach to a variety of new domains. What happened: Originally developed for natural language processing, transformers are becoming the Swiss Army Knife of deep learning.
A chorus of reindeer singing carols from a Reindeer Holiday Songbook on a snowy night
Vision

Voices for the Voiceless: Generative AI models are creating voices for Hollywood and video games.

Musicians and filmmakers adopted AI as a standard part of the audio-production toolbox. What happened: Professional media makers embraced neural networks that generate new sounds and modify old ones. Voice actors bristled.
Illustration of a woman riding a sled
Vision

Multimodal AI Takes Off: Multimodal Models, such as CLIP and DALL-E, are taking over AI.

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive
An illustration shows a cozy cabin where all the furniture is made out of coffee mugs.
Vision

Transformers Take Over: Transformers Applied to Vision, Language, Video, and More

In 2021, transformers were harnessed to discover drugs, recognize speech, and paint pictures — and much more.
Equivariant subsampling on 1D feature maps with a scale factor c = 2.
Vision

Image Transformations Unmasked: CNNs for vision that aren't fooled by changing backgrounds.

If you change an image by moving its subject within the frame, a well-trained convolutional neural network may not recognize the fundamental similarity between the two versions. New research aims to make CNNs wise to such alterations.
Computer vision from SewerAI identifies flaws in water pipes.
Vision

AI Goes Underground: Computer Vision From SewerAI Classifies Defective Pipes

A system from California startup SewerAI analyzes videos of underground pipes to prioritize those in need of repair.
Animated video shows the Flippy 2 autonomous restaurant robot making french fries
Vision

Deep Learning for Deep Frying: White Castle Uses Robots to Cook French Fries

Flippy 2, a robotic fry station from California-based Miso Robotics, has been newly deployed in a Chicago White Castle location.
A graph shows the cost in dollars of training large natural language processing models.
Vision

Who Can Afford to Train AI?: Cost of AI is Too Expensive for Many Small Companies

The cost of training top-performing machine learning models has grown beyond the reach of smaller companies.
