Vision

264 Posts

Deep Learning at (Small) Scale: How to run PilotNet on a Raspberry Pi Pico microcontroller

TinyML shows promise for bringing deep learning to applications where electrical power is scarce, processing in the cloud is impractical, and/or data privacy is paramount.
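To see why quantization is central to TinyML, here is a back-of-the-envelope sketch (the numbers are illustrative, not from the article) of why a PilotNet-scale network, often cited at roughly 250,000 parameters, needs int8 weights to fit comfortably on a Raspberry Pi Pico:

```python
# Rough arithmetic: weight storage for a PilotNet-scale network on a
# Raspberry Pi Pico (264 KB SRAM, typically 2 MB flash).
PICO_RAM_BYTES = 264 * 1024
PICO_FLASH_BYTES = 2 * 1024 * 1024

params = 250_000  # approximate parameter count commonly cited for PilotNet

float32_bytes = params * 4  # 4 bytes per float32 weight
int8_bytes = params * 1     # 1 byte per int8 weight

print(f"float32 weights: {float32_bytes / 1024:.0f} KB")
print(f"int8 weights:    {int8_bytes / 1024:.0f} KB")

# At float32, the weights alone exceed the Pico's SRAM; quantized to
# int8 they fit easily in flash, and activations can be staged through
# SRAM one layer at a time.
assert float32_bytes > PICO_RAM_BYTES
assert int8_bytes < PICO_FLASH_BYTES
```

The same arithmetic explains the appeal of TinyML generally: shrinking each weight from four bytes to one is often the difference between a model that fits on a microcontroller and one that doesn't.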
Don’t Steal My Style: Glaze tool prevents AI from learning an artist's style.

Asked to produce “a landscape by Thomas Kinkade,” a text-to-image generator fine-tuned on the pastoral painter’s work can mimic his style in seconds, often for pennies. A new tool aims to make such imitation far harder for learning algorithms.
Battlefield Chat: A military chatbot can create battle plans.

Large language models may soon help military analysts and commanders make decisions on the battlefield.
Image Generators Copy Training Data: Spotting similarities between generated images and data

We know that image generators create wonderful original works, but do they sometimes replicate their training data? Recent work found that replication does occur.
Eyes on the Olympics: The 2024 Paris Olympics may have AI surveillance.

French lawmakers said “oui” to broad uses of AI-powered surveillance. France’s National Assembly authorized authorities to test systems that detect unlawful, dangerous, or unusual behavior at next year’s Summer Olympics in Paris. The bill will become law unless the country’s top court blocks it.
Vision and Language Tightly Bound: Training on a single loss function improves multimodal AI.

Recent multimodal models process both text and images as sequences of tokens, but they learn to represent these distinct data types using separate loss functions. Recent work unifies the loss function as well.
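The core idea can be sketched in a few lines (vocabulary sizes and shapes below are illustrative assumptions, not details from the paper): once images are tokenized into discrete codes, image and text tokens can share one index space, so a single cross-entropy loss covers both modalities.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_VOCAB = 1000                     # hypothetical text vocabulary size
IMAGE_VOCAB = 512                     # hypothetical image-codebook size
VOCAB = TEXT_VOCAB + IMAGE_VOCAB      # one shared vocabulary

# A mixed sequence: text tokens occupy [0, 1000), image tokens are
# offset into [1000, 1512) so both live in the same index space.
text_tokens = rng.integers(0, TEXT_VOCAB, size=6)
image_tokens = rng.integers(0, IMAGE_VOCAB, size=4) + TEXT_VOCAB
targets = np.concatenate([text_tokens, image_tokens])

# Stand-in for model outputs: one logit per vocabulary entry per token.
logits = rng.normal(size=(len(targets), VOCAB))

def cross_entropy(logits, targets):
    # One loss function, regardless of which modality a token came from.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

loss = cross_entropy(logits, targets)
print(f"unified loss over {len(targets)} mixed tokens: {loss:.3f}")
```

The design point is that nothing in the loss distinguishes the modalities; all the modality-specific work happens in tokenization.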
GPT-4 Has Landed: Everything you need to know about GPT-4.

Get ready for the next wave of language-model mania. OpenAI introduced the latest in its GPT series of large language models to widespread excitement. The company showed statistics and examples designed to demonstrate...
[Image: high-level overview of the STEGO architecture at training and prediction steps]

Segmented Images, No Labeled Data: Improved unsupervised learning for semantic segmentation

Training a model to separate the objects in a picture typically requires labeled images for best results. Recent work upped the ante for training without labels.
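A minimal sketch of the label-free idea (this illustrates the general approach, not the paper's method): if a pretrained backbone assigns each pixel a feature vector, simply clustering those features groups pixels into segments without any labels. The synthetic "features" below stand in for backbone outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-pixel features from a frozen backbone: two blobs,
# as if the image contained two distinct objects.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 8)),
    rng.normal(loc=3.0, scale=0.3, size=(100, 8)),
])

def kmeans(x, k, steps=10):
    # Plain k-means: assign each point to its nearest center, then
    # move each center to the mean of its assigned points.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(steps):
        dists = np.linalg.norm(x[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.stack([
            x[labels == c].mean(axis=0) if (labels == c).any() else centers[c]
            for c in range(k)
        ])
    return labels

segments = kmeans(features, k=2)
print("first blob segments:", np.unique(segments[:100]))
print("second blob segments:", np.unique(segments[100:]))
```

With well-separated features, the clustering recovers the two "objects" exactly; the hard part in practice is learning features in which distinct objects actually separate.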

Transparency for AI as a Service: Amazon introduces service cards to enhance responsible AI.

Amazon published a series of web pages designed to help people use AI responsibly. Amazon Web Services introduced so-called AI service cards that describe the uses and limitations of some models it serves.

Will We Have Enough Data?

The world’s supply of data soon may fail to meet the demands of increasingly hungry machine learning models. Researchers at Epoch AI found that a shortage of text data could cause trouble as early as this year. Vision data may fall short within a decade.
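The logic behind such projections is compound growth: if demand for training data grows faster than the supply of fresh text, demand eventually overtakes supply. The toy numbers below are illustrative assumptions, not Epoch AI's estimates:

```python
# Toy extrapolation: find the year demand overtakes supply.
stock = 100.0          # available high-quality text, arbitrary units
stock_growth = 1.07    # supply grows slowly (new text written each year)
demand = 10.0          # data consumed by the largest training runs
demand_growth = 1.50   # demand has historically grown much faster

year = 2023
while demand < stock:
    stock *= stock_growth
    demand *= demand_growth
    year += 1

print(f"with these toy rates, demand overtakes supply around {year}")
```

Even starting with a tenfold surplus, a modest gap in growth rates closes it within a decade, which is why the crossover estimates are so near-term.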

Reza Zadeh: Generative AI can bring a breakthrough in active learning, says the Matroid founder.

As we enter the new year, there is a growing hope that the recent explosion of generative AI will bring significant progress in active learning. This technique, which enables machine learning systems to generate their own training examples and request them to be labeled...

One Model Does It All: Multi-task AI models got more sophisticated in 2022.

Individual deep learning models proved their mettle in hundreds of tasks. The scope of multi-task models expanded dramatically in the past year.

AI's Eyes Evolve: Vision transformer research exploded in 2022.

Work on vision transformers exploded in 2022. Researchers published well over 17,000 ViT papers during the year. A major theme: combining self-attention and convolution.
[Image: sequence showing how FIFA’s Video Assistant Referee (VAR) system works]

The World Cup's AI Referee

The outcome of the FIFA World Cup 2022 depends partly on learning algorithms. The quadrennial championship tournament of football, which wraps up this week, is using machine learning to help human referees spot players who break the offside rule, which governs their positions on the field.
[Image: ground-truth video of a road (left) and video predicted by MaskViT (right)]

Seeing What Comes Next: Transformers predict future video frames.

If a robot can predict what it’s likely to see next, it may have a better basis for choosing an appropriate action — but it has to predict quickly. Transformers, for all their utility in computer vision, aren’t well suited to this because of their steep computational and memory requirements...
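The obstacle is easy to quantify (frame and patch sizes below are illustrative): self-attention compares every token with every other token, so cost grows quadratically with sequence length, and video token counts grow with every added frame.

```python
def attention_pairs(num_tokens: int) -> int:
    # Each token attends to every token: n^2 query-key interactions.
    return num_tokens * num_tokens

# A 64x64 frame split into 8x8 patches yields 64 tokens per frame.
patches_per_frame = (64 // 8) * (64 // 8)

for frames in (1, 8, 16):
    n = frames * patches_per_frame
    print(f"{frames:2d} frame(s): {n:5d} tokens, "
          f"{attention_pairs(n):>9,} attention pairs")
```

Doubling the clip length quadruples the attention cost, which is why naive transformers struggle to predict video quickly enough for robot control.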
