Two images showing the process of turning handwriting into text
Machine Learning Research

The Writing, Not the Doodles

Systems designed to turn handwriting into text typically work best on pages with a consistent layout, such as a single column unbroken by drawings, diagrams, or extraneous symbols. A new system removes that requirement.
Neural networks generating novel views of a 3D scene based on existing pictures
Machine Learning Research

3D Scene Synthesis for the Real World

Researchers have used neural networks to generate novel views of a 3D scene based on existing pictures plus the positions and angles of the cameras that took them. In practice, though, you may not know the precise camera
Architecture of vision-language tasks
Machine Learning Research

One Model for Vision-Language

Researchers have proposed task-agnostic architectures for image classification tasks and language tasks. New work proposes a single architecture for vision-language tasks.
Protein structures
Machine Learning Research

What AI Knows About Proteins

Transformer models trained on sequences of amino acids that form proteins have had success classifying and generating viable sequences. New research shows that they also capture information about protein structure.
A new method for compressing images and yielding better classification
Machine Learning Research

What Machines Want to See

Researchers typically downsize images for vision networks to accommodate limited memory and accelerate processing. A new method not only compresses images but yields better classification.
FastNeRF accelerates the photorealistic 3D rendering method
Machine Learning Research

Virtual Reality in Real Time

Ideally, real-time 3D applications such as virtual and augmented reality transition smoothly between different viewpoints of a scene — but generating a fresh perspective can take time. New research speeds the process.
Minecraft video capture
Machine Learning Research

3D Object Factory

In the open-ended video game Minecraft, players extract blocks of virtual materials from a 3D environment to assemble objects of their own design, from trees to cathedrals. Researchers trained neural networks to generate these structures.
Diagram showing how Project Debater works
Machine Learning Research

Up for Debate

IBM’s Watson question-answering system stunned the world in 2011 when it bested human champions of the TV trivia game show Jeopardy! Although the Watson brand has fallen on hard times, the company’s language-processing prowess continues to develop.
How Semantic Similarity Video Retrieval (SVR) works
Machine Learning Research

Toward Better Video Search

Researchers at the University of Bristol led by Michael Wray propose a new benchmark, Semantic Similarity Video Retrieval (SVR), that evaluates video retrieval systems by their ability to rank many similar videos. They also built a system that performed well on it.
System designed to isolate changes in the pose of a two-dimensional figure
Machine Learning Research

Motion Mapper

In some animated games, different characters can perform the same actions — say, walking, jumping, or casting spells. A new system learned from unlabeled data to transfer such motions from one character to another.
A generative adversarial network (GAN)
Machine Learning Research

Image Generation Transformed

A recent generative adversarial network (GAN) produced more coherent images using modified transformers that replaced fully connected layers with convolutional layers. A new GAN achieved a similar end using transformers in their original form.
Data related to SElf-supERvised (SEER), an image classifier pretrained on uncurated, unlabeled images
Machine Learning Research

Pretraining on Uncurated Data

It’s well established that pretraining a model on a large dataset improves performance on fine-tuned tasks. In sufficient quantity and paired with a big model, even data scraped from the internet at random can contribute to the performance boost.
Examples of image generators using GANsformer
Machine Learning Research

Attention for Image Generation

Attention quantifies how each part of one input affects the various parts of another. Researchers added a step that reverses this comparison to produce more convincing images.
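To make the idea concrete, here is a minimal sketch of standard scaled dot-product attention weights in plain Python. It is not the researchers' method, only the textbook comparison the blurb describes: each row shows how strongly one part of the first input attends to each part of the second. The function names and toy vectors are hypothetical, chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """For each query vector, weight every key vector by scaled dot product."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

# Hypothetical toy inputs: two parts of one input attend to three parts of another.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = attention_weights(queries, keys)
for row in weights:
    print([round(w, 3) for w in row])
```

Calling `attention_weights(keys, queries)` swaps the roles of the two inputs, which is one simple way to picture a comparison running in the opposite direction. Whether that matches the researchers' added step is an assumption this sketch does not settle.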
Sequence related to image processing
Machine Learning Research

Vision Models Get Some Attention

Self-attention is a key element in state-of-the-art language models, but it struggles to process images because its memory requirement rises quadratically with the size of the input. New research addresses the issue with a simple twist on a convolutional neural network.
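A rough back-of-the-envelope calculation shows why the memory cost bites. The sketch below assumes one token per pixel, a single attention matrix, and 4-byte values; these are illustrative assumptions, not figures from the research. Standard self-attention compares every token with every other token, so the matrix has one entry per pair.

```python
def attention_matrix_bytes(height, width, bytes_per_value=4):
    """Bytes needed to store one attention matrix over all pixel pairs."""
    tokens = height * width  # assumption: one token per pixel
    return tokens * tokens * bytes_per_value

for side in (32, 64, 128):
    mib = attention_matrix_bytes(side, side) / 2**20
    print(f"{side}x{side} image: {mib:,.0f} MiB")
```

Under these assumptions, doubling the image's side length multiplies the matrix size by 16, since the token count quadruples and the matrix grows with its square.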
Tag-Retrieve-Compose-Synthesize (TReCS)
Machine Learning Research

Pictures From Words and Gestures

A new system combines verbal descriptions and crude lines to visualize complex scenes. Google researchers led by Jing Yu Koh proposed Tag-Retrieve-Compose-Synthesize (TReCS), a system that generates photorealistic images as users describe what they want to see while mousing around on a blank screen.
