Vision

264 Posts

Deep Learning at (Small) Scale: How to run PilotNet on a Raspberry Pi Pico microcontroller

TinyML shows promise for bringing deep learning to applications where electrical power is scarce, processing in the cloud is impractical, and/or data privacy is paramount.
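To see why quantization is central to TinyML, here is a back-of-the-envelope sketch (the numbers are illustrative, not from the article) of why a PilotNet-scale network, often cited at roughly 250,000 parameters, needs int8 weights to fit comfortably on a Raspberry Pi Pico:

```python
# Rough arithmetic: weight storage for a PilotNet-scale network on a
# Raspberry Pi Pico (264 KB SRAM, typically 2 MB flash).
PICO_RAM_BYTES = 264 * 1024
PICO_FLASH_BYTES = 2 * 1024 * 1024

params = 250_000  # approximate parameter count commonly cited for PilotNet

float32_bytes = params * 4  # 4 bytes per float32 weight
int8_bytes = params * 1     # 1 byte per int8 weight

print(f"float32 weights: {float32_bytes / 1024:.0f} KB")
print(f"int8 weights:    {int8_bytes / 1024:.0f} KB")

# At float32, the weights alone exceed the Pico's SRAM; quantized to
# int8 they fit easily in flash, and activations can be staged through
# SRAM one layer at a time.
assert float32_bytes > PICO_RAM_BYTES
assert int8_bytes < PICO_FLASH_BYTES
```

The same arithmetic explains the appeal of TinyML generally: shrinking each weight from four bytes to one is often the difference between a model that fits on a microcontroller and one that doesn't.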
Don’t Steal My Style: Glaze tool prevents AI from learning an artist's style.

Asked to produce “a landscape by Thomas Kinkade,” a text-to-image generator fine-tuned on the pastoral painter’s work can mimic his style in seconds, often for pennies. A new tool aims to make such imitation far harder for learning algorithms.
Battlefield Chat: A military chatbot can create battle plans.

Large language models may soon help military analysts and commanders make decisions on the battlefield.
Image Generators Copy Training Data: Spotting similarities between generated images and data

We know that image generators create wonderful original works, but do they sometimes replicate their training data? Recent work found that replication does occur.
Eyes on the Olympics: The 2024 Paris Olympics may have AI surveillance.

French lawmakers said “oui” to broad uses of AI-powered surveillance. France’s National Assembly authorized authorities to test systems that detect unlawful, dangerous, or unusual behavior at next year’s Summer Olympics in Paris. The bill will become law unless the country’s top court blocks it.
Vision and Language Tightly Bound: Training on a single loss function improves multimodal AI.

Recent multimodal models process both text and images as sequences of tokens, but they learn to represent these distinct data types using separate loss functions. Recent work unifies the loss function as well.
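The core idea can be sketched in a few lines (vocabulary sizes and shapes below are illustrative assumptions, not details from the paper): once images are tokenized into discrete codes, image and text tokens can share one index space, so a single cross-entropy loss covers both modalities.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_VOCAB = 1000                     # hypothetical text vocabulary size
IMAGE_VOCAB = 512                     # hypothetical image-codebook size
VOCAB = TEXT_VOCAB + IMAGE_VOCAB      # one shared vocabulary

# A mixed sequence: text tokens occupy [0, 1000), image tokens are
# offset into [1000, 1512) so both live in the same index space.
text_tokens = rng.integers(0, TEXT_VOCAB, size=6)
image_tokens = rng.integers(0, IMAGE_VOCAB, size=4) + TEXT_VOCAB
targets = np.concatenate([text_tokens, image_tokens])

# Stand-in for model outputs: one logit per vocabulary entry per token.
logits = rng.normal(size=(len(targets), VOCAB))

def cross_entropy(logits, targets):
    # One loss function, regardless of which modality a token came from.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

loss = cross_entropy(logits, targets)
print(f"unified loss over {len(targets)} mixed tokens: {loss:.3f}")
```

The design point is that nothing in the loss distinguishes the modalities; all the modality-specific work happens in tokenization.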
GPT-4 Has Landed: Everything you need to know about GPT-4.

Get ready for the next wave of language-model mania. OpenAI introduced the latest in its GPT series of large language models to widespread excitement. The company showed statistics and examples designed to demonstrate...
[Image: high-level overview of the STEGO architecture at training and prediction steps]

Segmented Images, No Labeled Data: Improved unsupervised learning for semantic segmentation

Training a model to separate the objects in a picture typically requires labeled images for best results. Recent work upped the ante for training without labels.
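A minimal sketch of the label-free idea (this illustrates the general approach, not the paper's method): if a pretrained backbone assigns each pixel a feature vector, simply clustering those features groups pixels into segments without any labels. The synthetic "features" below stand in for backbone outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-pixel features from a frozen backbone: two blobs,
# as if the image contained two distinct objects.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 8)),
    rng.normal(loc=3.0, scale=0.3, size=(100, 8)),
])

def kmeans(x, k, steps=10):
    # Plain k-means: assign each point to its nearest center, then
    # move each center to the mean of its assigned points.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(steps):
        dists = np.linalg.norm(x[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.stack([
            x[labels == c].mean(axis=0) if (labels == c).any() else centers[c]
            for c in range(k)
        ])
    return labels

segments = kmeans(features, k=2)
print("first blob segments:", np.unique(segments[:100]))
print("second blob segments:", np.unique(segments[100:]))
```

With well-separated features, the clustering recovers the two "objects" exactly; the hard part in practice is learning features in which distinct objects actually separate.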

Transparency for AI as a Service: Amazon introduces service cards to enhance responsible AI.

Amazon published a series of web pages designed to help people use AI responsibly. Amazon Web Services introduced so-called AI service cards that describe the uses and limitations of some models it serves.

Will We Have Enough Data?

The world’s supply of data soon may fail to meet the demands of increasingly hungry machine learning models. Researchers at Epoch AI found that a shortage of text data could cause trouble as early as this year. Vision data may fall short within a decade.
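The logic behind such projections is compound growth: if demand for training data grows faster than the supply of fresh text, demand eventually overtakes supply. The toy numbers below are illustrative assumptions, not Epoch AI's estimates:

```python
# Toy extrapolation: find the year demand overtakes supply.
stock = 100.0          # available high-quality text, arbitrary units
stock_growth = 1.07    # supply grows slowly (new text written each year)
demand = 10.0          # data consumed by the largest training runs
demand_growth = 1.50   # demand has historically grown much faster

year = 2023
while demand < stock:
    stock *= stock_growth
    demand *= demand_growth
    year += 1

print(f"with these toy rates, demand overtakes supply around {year}")
```

Even starting with a tenfold surplus, a modest gap in growth rates closes it within a decade, which is why the crossover estimates are so near-term.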

Reza Zadeh: Generative AI can bring a breakthrough in active learning, says the Matroid founder.

As we enter the new year, there is a growing hope that the recent explosion of generative AI will bring significant progress in active learning. This technique, which enables machine learning systems to generate their own training examples and request them to be labeled...

One Model Does It All: Multi-task AI models got more sophisticated in 2022.

Individual deep learning models proved their mettle in hundreds of tasks. The scope of multi-task models expanded dramatically in the past year.

AI's Eyes Evolve: Vision transformer research exploded in 2022.

Work on vision transformers exploded in 2022. Researchers published well over 17,000 ViT papers during the year. A major theme: combining self-attention and convolution.
[Image: sequence showing how FIFA’s Video Assistant Referee (VAR) system works]

The World Cup's AI Referee

The outcome of the FIFA World Cup 2022 depends partly on learning algorithms. The quadrennial championship tournament of football, which wraps up this week, is using machine learning to help human referees spot players who break the offside rule, which governs their positions on the field.
[Image: ground-truth video of a road (left) and video predicted by MaskViT (right)]

Seeing What Comes Next: Transformers predict future video frames.

If a robot can predict what it’s likely to see next, it may have a better basis for choosing an appropriate action — but it has to predict quickly. Transformers, for all their utility in computer vision, aren’t well suited to this because of their steep computational and memory requirements...
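The obstacle is easy to quantify (frame and patch sizes below are illustrative): self-attention compares every token with every other token, so cost grows quadratically with sequence length, and video token counts grow with every added frame.

```python
def attention_pairs(num_tokens: int) -> int:
    # Each token attends to every token: n^2 query-key interactions.
    return num_tokens * num_tokens

# A 64x64 frame split into 8x8 patches yields 64 tokens per frame.
patches_per_frame = (64 // 8) * (64 // 8)

for frames in (1, 8, 16):
    n = frames * patches_per_frame
    print(f"{frames:2d} frame(s): {n:5d} tokens, "
          f"{attention_pairs(n):>9,} attention pairs")
```

Doubling the clip length quadruples the attention cost, which is why naive transformers struggle to predict video quickly enough for robot control.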
