Multimodal deep learning model
Vision

AI Versus the Garbage Heap

Amazon reported long-term success using machine learning to shrink its environmental footprint. The online retailer developed a system that fuses product descriptions, images, and structured data to decide how an item should be packed for shipping.
2 min read
Man with gun walking by detector
Vision

Stopping Guns at the Gate

A Major League Baseball stadium will be using computer vision to detect weapons as fans enter. What’s new: A system called Hexwave will look for firearms, knives, and explosives carried by baseball fans who visit Camden Yards, home field of the Baltimore Orioles.
2 min read
Yoav Shoham
Vision

Yoav Shoham: Language Models That Reason

I believe that natural language processing in 2022 will re-embrace symbolic reasoning, harmonizing it with the statistical operation of modern neural networks. Let me explain what I mean by this.
2 min read
Alexei Efros
Vision

Alexei Efros: Learning From the Ground Up

Things are really starting to get going in the field of AI. After many years (decades?!) of focusing on algorithms, the AI community is finally ready to accept the central role of data and the high-capacity models that are capable of taking advantage of this data.
2 min read
Wolfram Burgard
Vision

Wolfram Burgard: Train Robots in the Real World

Robots are tremendously useful machines, and I would like to see them applied to every task where they can do some good. Yet we don’t have enough programmers for all this hardware and all these tasks.
3 min read
Abeba Birhane
Vision

Abeba Birhane: Clean Up Web Datasets

From language to vision models, deep neural networks are marked by improved performance, higher efficiency, and better generalizations. Yet, these systems are also marked by perpetuation of bias and injustice.
3 min read
A living room made out of cups of coffee: the people, the seats, the chimney, the lamp, all gather around a cozy fire.
Vision

One Architecture to Do Them All: Transformer: The AI Architecture That Can Do It All

The transformer architecture extended its reach to a variety of new domains. What happened: Originally developed for natural language processing, transformers are becoming the Swiss Army Knife of deep learning.
2 min read
A chorus of reindeers singing carols from a Reindeer Holiday Songbook on a snowy night
Vision

Voices for the Voiceless: Generative AI Models Are Creating Voices for Hollywood and Video Games

Musicians and filmmakers adopted AI as a standard part of the audio-production toolbox. What happened: Professional media makers embraced neural networks that generate new sounds and modify old ones. Voice actors bristled.
2 min read
An illustration shows a cozy cabin where all the furniture is made out of coffee mugs.
Vision

Transformers Take Over: Transformers Applied to Vision, Language, Video, and More

In 2021, transformers were harnessed to discover drugs, recognize speech, and paint pictures — and much more.
2 min read
Vision

Multimodal AI Takes Off: Multimodal Models, such as CLIP and Dall-E, Are Taking Over AI

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive
1 min read
Vision

Image Transformations Unmasked: CNNs for Vision that Aren't Fooled By Changing Backgrounds

If you change an image by moving its subject within the frame, a well-trained convolutional neural network may not recognize the fundamental similarity between the two versions. New research aims to make CNNs wise to such alterations.
2 min read
Computer vision from SewerAI identifies flaws in water pipes.
Vision

AI Goes Underground: Computer Vision From SewerAI Classifies Defective Pipes

A system from California startup SewerAI analyzes videos of underground pipes to prioritize those in need of repair.
1 min read
Animated video shows the Flippy 2 autonomous restaurant robot making french fries
Vision

Deep Learning for Deep Frying: White Castle Uses Robots to Cook French Fries

Flippy 2, a robotic fry station from California-based Miso Robotics, has been newly deployed in a Chicago White Castle location.
2 min read
A graph shows the cost in dollars of training large natural language processing models.
Vision

Who Can Afford to Train AI?: Cost of AI is Too Expensive for Many Small Companies

The cost of training top-performing machine learning models has grown beyond the reach of smaller companies.
2 min read
Animated chart shows how AI can help robots locate key spatial coordinates.
Vision

Finding Useful Points in Space: Keypoint3D Helps Robots Locate Spatial Coordinates

A new machine learning method aims to improve a machine’s ability to determine and locate points of interest.
2 min read
