Machine Learning Research

303 Posts

Memory-Efficient Optimizer: A method to reduce memory needs when fine-tuning AI models

Researchers devised a way to reduce memory requirements when fine-tuning large language models. Kai Lv and colleagues at Fudan University proposed low memory optimization (LOMO), a modification of stochastic gradient descent that stores less data than other optimizers during fine-tuning.
Better Images, Less Training: Würstchen, a speedy, high-quality image generator

The longer text-to-image models train, the better their output — but the training is costly. Researchers built a system that produced superior images after far less training.
LLMs Can Get Inside Your Head: AI models show promise in understanding human beliefs, research reveals

Most people understand that others’ mental states can differ from their own. For instance, if your friend leaves a smartphone on a table and you secretly put it in your pocket, you understand that your friend still believes it is on the table.
More Consistent Generated Videos: Lumiere, a system that achieves unprecedented motion realism in video

Text-to-video has struggled to produce consistent motions like walking and rotation. A new approach achieves more realistic motion.
Learning the Language of Geometry: AlphaGeometry, a system that nears expert proficiency in proving complex geometry theorems

Machine learning algorithms often struggle with geometry. A language model learned to prove relatively difficult theorems. 
Sing a Tune, Generate an Accompaniment: SingSong, a tool that generates instrumental music for unaccompanied input vocals

A neural network makes music for unaccompanied vocal tracks. Chris Donahue, Antoine Caillon, Adam Roberts, and colleagues at Google proposed SingSong, a system that generates musical accompaniments for sung melodies.
Text or Images, Input or Output: GILL, an innovative approach to multimodal model training

OpenAI’s GPT-4V is a large multimodal model that generates text from images and, with help from DALL-E 3, generates images from text. However, OpenAI hasn’t fully explained how it built the system. A separate group of researchers described their own method.
The Big Picture and the Details: I-JEPA, or how vision models understand the relationship between parts and the whole

A novel twist on self-supervised learning aims to improve on earlier methods by helping vision models learn how parts of an image relate to the whole.
Multitask Vision Transformer

The original DINO showed that a vision transformer pretrained on unlabeled images could learn representations that were sufficient for classifying and segmenting images. In an update of that work, the model learned representations useful in a wider variety of tasks.
Robot, Find My Keys: A machine learning model for robots to predict the location of objects in households

Researchers proposed a way for robots to find objects in households where things get moved around. Andrey Kurenkov and colleagues at Stanford University introduced Node Edge Predictor, a model that learned to predict where objects were located in houses.
Seeing Darker-Skinned Pedestrians: Children and people with darker skin face higher street risks with object detectors, research finds.

In a study, models used to detect people walking on streets and sidewalks performed less well on adults with darker skin and children of all skin tones.
This Language Model Speaks Robot: PaLM-E, the model that improves robot control with large language model expertise

A pretrained large language model has helped a robot resolve high-level commands into sequences of subtasks. It can do this more precisely with additional training — both on language-vision tasks and robotics tasks. 
Taming Transformers: Researchers find new strategies to accelerate transformer architecture.

The transformer architecture is astonishingly powerful but notoriously slow. Researchers have developed numerous tweaks to accelerate it — enough to warrant a look at how these alternatives work, their strengths, and their weaknesses.
AI Builds Better Sorting Algorithms: AlphaDev, a new system for high-speed sorting of lists and numbers

Online sorting algorithms run trillions of times a day to organize lists according to users’ interests. New work found faster alternatives. Daniel J. Mankowitz and colleagues at Google developed AlphaDev, a system that learned to generate algorithms that sort three...
Synthetic Data Helps Image Classification: StableRep, a method that trains vision transformers on images generated by Stable Diffusion

Generated images can be more effective than real ones in training a vision model to classify images. Yonglong Tian, Lijie Fan, and colleagues at Google and MIT introduced StableRep, a self-supervised method that trains vision transformers on images generated by...