Multimodal

15 Posts

Multimodal

One Model, Hundreds of Tasks

Researchers took a step toward achieving a longstanding goal: one model that performs a whole lot of very different tasks. What's new: Scott Reed, Konrad Żołna, Emilio Parisotto, and a team at DeepMind …
3 min read
Multimodal deep learning model
Multimodal

AI Versus the Garbage Heap

Amazon reported long-term success using machine learning to shrink its environmental footprint. What's new: The online retailer developed a system that fuses product descriptions, images, and structured data to decide how an item should be packed for shipping. It evolved over six years, ultimately helping Amazon cut packaging waste equivalent to over 2 billion shipping boxes. How it works: The system initially made packaging decisions based on text descriptions. Last year, the company integrated …
2 min read
Multimodal

Multimodal AI Takes Off: Multimodal Models, such as CLIP and Dall-E, Are Taking Over AI

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning's highest-profile successes, approaches that find relationships between text and images made impressive …
1 min read
Animation showing how MERLOT is able to match contextualized captions with their corresponding video frames
Multimodal

Richer Video Representations: Pretraining Method Improves AI's Ability to Understand Video

To understand a movie scene, viewers often must remember or infer previous events and extrapolate potential consequences. New work improved a model’s ability to do the same.
2 min read
Series of examples of accurate and inaccurate matching of images to text
Multimodal

Crawl the Web, Absorb the Bias: Language Models Absorb Biases from Web Training Data

The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source of examples on that scale — the web — is polluted with bias and antisocial expressions. A new study examines the issue.
2 min read
Animation showing how the Google search algorithm works with multimodal AI
Multimodal

Search Goes Multimodal: Google Upgrades Its Search Algorithm with Multimodal AI

Google will upgrade its search engine with a new model that tracks the relationships between words, images, and, in time, videos — the first fruit of its latest research into multimodal machine learning and multilingual language modeling.
2 min read
Frozen Pretrained Transformer (FPT) explained
Multimodal

Transformers: Smarter Than You Think

The transformer architecture has shown an uncanny ability to model not only language but also images and proteins. New research found that it can apply what it learns from the first domain to the others.
2 min read
Image showing how object detectors work
Multimodal

I Know It When I See It

Object detectors typically detect only items that were labeled in their training data. A new method liberates them to locate and recognize a much wider variety of objects.
2 min read
Architecture of vision-language tasks
Multimodal

One Model for Vision-Language

Researchers have previously proposed task-agnostic architectures for image classification and for language separately. New work proposes a single architecture for vision-language tasks.
2 min read
CogView home website
Multimodal

Large Language Models for Chinese

Researchers unveiled competition for the reigning large language model GPT-3. The Beijing Academy of Artificial Intelligence, a research collective funded by the Chinese government, described four models collectively called Wu Dao, according to Synced Review.
2 min read
System Oscar+ working
Multimodal

Sharper Eyes For Vision+Language

Models that interpret the interplay of words and images tend to be trained on richer bodies of text than images. Recent research worked toward giving such models a more balanced knowledge of the two domains.
2 min read
Art pieces with subjective commentary regarding their emotional impact
Multimodal

How Art Makes AI Feel

An automated art critic spells out the emotional impact of images. Led by Panos Achlioptas, researchers at Ecole Polytechnique, King Abdullah University, and Stanford University trained a deep learning system to generate subjective interpretations of art.
2 min read
AI-generated images with the model DALL-E
Multimodal

Tell Me a Picture

Two new models show a surprisingly sharp sense of the relationship between words and images. OpenAI, the for-profit research lab, announced a pair of models that have produced impressive results in multimodal learning: DALL·E and CLIP.
2 min read
Examples of clothes image-text combo search
Multimodal

That Online Boutique, But Smarter

Why search for “a cotton dress shirt with button-down collar, breast pockets, barrel cuffs, scooped hem, and tortoise shell buttons in grey” when a photo and the words “that shirt, but grey” will do the trick? A new network understands the image-text combo.
2 min read
Animated drawing of hardware related to AI
Multimodal

Horsepower for Next-Gen Networks

The for-profit research organization OpenAI has a new supercomputer to help achieve its dream of building the world’s most sophisticated AI. Microsoft engineered the new hardware network to train immense models on thousands of images, texts, and videos simultaneously.
1 min read