Transformer

144 Posts

Ancient Scrolls Recovered: Researchers decipher scrolls charred by Mount Vesuvius using AI.

Three researchers decoded scrolls that had gone unread since they were turned into charcoal by the eruption of Mount Vesuvius in the year 79.
SingSong's process for generating instrumental music to accompany input vocals.

Sing a Tune, Generate an Accompaniment: SingSong, a tool that generates instrumental music for unaccompanied input vocals

A neural network makes music for unaccompanied vocal tracks. Chris Donahue, Antoine Caillon, Adam Roberts, and colleagues at Google proposed SingSong, a system that generates musical accompaniments for sung melodies.

Google’s Multimodal Challenger: All you need to know about Gemini, Google's new multimodal model

Google unveiled Gemini, its bid to catch up to, and perhaps surpass, OpenAI’s GPT-4. Google demonstrated the Gemini family of models that accept any combination of text (including code), images, video, and audio and output text and images. The demonstrations and metrics were impressive...

Robot, Find My Keys: A machine learning model for robots to predict the location of objects in households

Researchers proposed a way for robots to find objects in households where things get moved around. Andrey Kurenkov and colleagues at Stanford University introduced Node Edge Predictor, a model that learned to predict where objects were located in houses.

Taming Transformers: Researchers find new strategies to accelerate transformer architecture.

The transformer architecture is astonishingly powerful but notoriously slow. Researchers have developed numerous tweaks to accelerate it — enough to warrant a look at how these alternatives work, their strengths, and their weaknesses.

Masked Pretraining for CNNs: ConvNeXt V2, the new model family that boosts ConvNet performance

Vision transformers have bested convolutional neural networks (CNNs) in a number of key vision tasks. Have CNNs hit their limit? New research suggests otherwise.

Diffusion Transformed: A new class of diffusion models based on the transformer architecture

A tweak to diffusion models, which are responsible for most of the recent excitement about AI-generated images, enables them to produce more realistic output.

Where Is Meta’s Generative Play?: Why Meta still lacks a flagship generative AI service

While Microsoft and Google scramble to supercharge their businesses with text generation, Meta has yet to launch a flagship generative AI service. Reporters went looking for reasons why.

What the Brain Sees: How a text-to-image model generates images from brain scans

A pretrained text-to-image generator enabled researchers to see — roughly — what other people looked at based on brain scans. Yu Takagi and Shinji Nishimoto developed a method that uses Stable Diffusion to reconstruct images viewed by test subjects...

Falcon Ascends: Falcon, the new open source commercial LLM, explained

A team in the United Arab Emirates, a seven-state federation on the Arabian Peninsula, built the latest top-performing open source large language model.

Text-to-Image Editing Evolves: InstructPix2Pix for text-to-image editing, explained

Text-to-image generators like DALL·E 2, Stable Diffusion, and Adobe’s new Generative Fill feature can revise images in a targeted way — say, change the fruit in a bowl from oranges to bananas — if you enter a few words that describe the change plus an indication of the areas to be changed.

Goodbye Prompt Engineering, Hello Prompt Generation: Automatic Prompt Engineer (APE) research summary.

When you’re looking for answers from a large language model, some prompts are better than others. So how can you come up with the best one? A new model automates the process.
Example of interactive editing sessions with Meta's text generator PEER

Collaborative Text Generator: A language model that collaborates with human writers

Text from current language models can be useful as a rough draft, but that leaves the polishing to human writers. A language model learned how to generate and respond to editorial directions.
Transformer-based system simulating the Atari game "Pong"

Efficient Reinforcement Learning: IRIS used reinforcement learning to master Atari games with little gameplay.

Both transformers and reinforcement learning models are notoriously data-hungry. They may be less so when they work together. Vincent Micheli and colleagues at the University of Geneva trained a transformer-based system to simulate Atari games using a small amount of gameplay.

Vision and Language Tightly Bound: Training on a single loss function improves multimodal AI.

Recent multimodal models process both text and images as sequences of tokens, but they learn to represent these distinct data types using separate loss functions. Recent work unifies the loss function as well.