Machine Learning Research

Collection of various toys, including a raccoon, a bus, and a tree.

2D-to-3D Goes Mainstream: AI systems from Stability AI and Shutterstock transform 2D images into 3D meshes in seconds

Traditionally, building 3D meshes for gaming, animation, product design, architecture, and the like has been labor-intensive. Now the ability to generate 3D meshes from a single image is widely available.
Hierarchical K-means diagram with data clustering across multiple layers.

Balancing Web Data Distributions: Automated method organizes large datasets for better model performance

Datasets that were scraped from the web tend to be unbalanced, meaning examples of some classes (say, cats) are plentiful while examples of others (say, caterpillars) are scarce.
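The blurb doesn't spell out the method, but the figure's hierarchical k-means suggests the general pattern: cluster example embeddings, then cap how many examples each cluster contributes. Below is a minimal, single-level sketch of that idea; the toy embeddings, cluster count, and per-cluster cap are stand-in assumptions, not the researchers' actual recipe.

```python
# Hypothetical sketch: rebalance a web-scraped dataset by clustering example
# embeddings with k-means and keeping at most `cap` examples per cluster.
# The toy embeddings, cluster count, and cap are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 128))   # stand-in for real example embeddings

k, cap = 100, 50                              # clusters and per-cluster budget (assumed)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

kept = []
for cluster_id in range(k):
    members = np.flatnonzero(labels == cluster_id)
    rng.shuffle(members)
    kept.extend(members[:cap])                # over-represented clusters get truncated

print(f"kept {len(kept)} of {len(embeddings)} examples after balancing")
```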
Gemma Scope 2

Making LLMs Explainable: Google’s Gemma Scope probes how large language models think

Researchers have probed the inner workings of individual layers of large language models. A new tool applies this approach to all layers.
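Gemma Scope reportedly relies on sparse autoencoders fit to activations from every layer. As a rough, hypothetical illustration of that idea (not Google's actual recipe), here is a tiny sparse autoencoder trained on stand-in activations; the dimensions, sparsity penalty, and random data are assumptions.

```python
# Hypothetical sketch: a tiny sparse autoencoder of the kind used to probe a
# transformer layer's activations. Dimensions, sparsity weight, and the random
# stand-in activations are illustrative assumptions, not Gemma Scope's recipe.
import torch
import torch.nn as nn

d_model, d_features = 256, 1024   # residual-stream width, dictionary size (assumed)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        return self.decoder(features), features

acts = torch.randn(4096, d_model)                # stand-in for one layer's activations
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

for step in range(100):
    recon, feats = sae(acts)
    loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```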
Short, Medium and Long Context RAG

Models Ranked for Hallucinations: Measuring language model hallucinations during information retrieval

How often do large language models make up information when they generate text based on a retrieved document? A study evaluated the tendency of popular models to hallucinate while performing retrieval-augmented generation (RAG). 
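The teaser doesn't describe the evaluation protocol, but the basic bookkeeping for such a study can be sketched as follows: generate an answer from each retrieved document, judge whether the document supports it, and report the unsupported fraction. `generate_answer` and `is_supported` are hypothetical placeholders for a model call and a groundedness judge, not the study's actual tooling.

```python
# Hypothetical sketch: estimate a model's hallucination rate during RAG by
# checking whether each generated answer is supported by its source document.
# `generate_answer` and `is_supported` are placeholder stand-ins.
from typing import Callable

def hallucination_rate(
    examples: list[dict],                         # each: {"document": str, "question": str}
    generate_answer: Callable[[str, str], str],   # (document, question) -> answer
    is_supported: Callable[[str, str], bool],     # (answer, document) -> grounded?
) -> float:
    hallucinated = 0
    for ex in examples:
        answer = generate_answer(ex["document"], ex["question"])
        if not is_supported(answer, ex["document"]):
            hallucinated += 1                     # answer makes claims the document doesn't back
    return hallucinated / max(len(examples), 1)

# Toy usage with trivially checkable stand-ins:
examples = [{"document": "Paris is the capital of France.",
             "question": "What is the capital of France?"}]
rate = hallucination_rate(
    examples,
    generate_answer=lambda doc, q: "Paris",
    is_supported=lambda ans, doc: ans.lower() in doc.lower(),
)
print(f"hallucination rate: {rate:.0%}")
```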
Throughput and latency at different context lengths

Long Context Gets Up to Speed: AI21 Labs’ Jamba 1.5 outpaces transformers in long-text processing

A new model generates tokens faster than current transformers, especially when processing long inputs.
A graphic shows an any-to-any multimodal model, with text mapping to RGB or geometric modalities.

Multimodal to the Max: 4M-21 multimodal model excels in handling diverse input and output types

Researchers introduced a model that handles an unprecedented number of input and output types, including many related to performing computer vision tasks.
The SWE-bench full leaderboard shows Cosine Genie outperforming its competitors.

Agentic Coding Strides Forward: Genie coding assistant outperforms competitors on SWE-bench by over 30 percent

An agentic coding assistant boosted the state of the art in an important benchmark by more than 30 percent.
Given an initial data pool of 128M samples, we train ViT-B/32 CLIP models for a total of 640M samples.

Scaling Laws for Data Quality: Scaling laws reveal the impact of data quality in vision-language model training

When training vision-language models, developers often remove lower-quality examples from the training set. But keeping only the highest-quality examples may not be ideal, researchers found.
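The finding isn't derived here, but studies like this typically fit loss as a power law of training samples under each filtering strategy and compare the fitted curves as the training budget grows. The sketch below uses invented toy numbers; the data points, functional form, and crossover are illustrative assumptions, not the paper's results.

```python
# Hypothetical sketch: fit simple power laws (loss ~ a * samples^-b) to toy loss
# curves for two filtering strategies and estimate where the curves cross.
# All numbers are invented for illustration; they are not the paper's results.
import numpy as np

samples = np.array([10e6, 40e6, 160e6, 640e6])        # training samples seen
loss_strict = np.array([0.62, 0.55, 0.52, 0.51])      # keep only top-quality data (toy)
loss_broad = np.array([0.66, 0.57, 0.51, 0.47])       # keep a broader pool (toy)

def fit_power_law(n, loss):
    # log(loss) = intercept + slope * log(n); slope < 0 for a decreasing loss curve
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    return slope, intercept

s_strict, i_strict = fit_power_law(samples, loss_strict)
s_broad, i_broad = fit_power_law(samples, loss_broad)

# The fitted lines cross where the predicted log-losses are equal.
log_n_cross = (i_broad - i_strict) / (s_strict - s_broad)
print(f"toy crossover near {np.exp(log_n_cross):.2e} samples")
```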
How Qwen2-Audio performs against the competitors.

Open Models for Math and Audio: Alibaba advances open-weight LLMs with Qwen2 Math and Audio variants

Alibaba followed up its open-weight Qwen2 large language models with specialized variants.

Google Imagen 3 Raises the Bar: Google’s Imagen 3 outperforms rivals in text-to-image benchmarks

Image generation continued its rapid march forward with a new version of Google’s flagship text-to-image model.
Conceptual illustration of The AI Scientist, an end-to-end LLM-driven scientific discovery process.

AI Agents for AI Research: Agentic workflow generates novel scientific research papers

While some observers argue that large language models can’t produce truly original output, new work prompted them to generate novel scientific research.

Machine Translation Goes Agentic: TransAgents, a system that boosts literary translation with a multi-agent workflow

Literary works are challenging to translate. Their relative length, cultural nuances, idiomatic expressions...

Art Attack: ArtPrompt, a technique that exploits ASCII art to bypass LLM safety measures

A seemingly innocuous form of expression, ASCII art opens a new vector for jailbreak attacks on large language models (LLMs), enabling attackers to elicit outputs that developers tuned the models to avoid producing.

Synthetic Data Factory: AgentInstruct, a framework for generating diverse synthetic data for LLM fine-tuning

Researchers increasingly fine-tune models on synthetic data, but generated datasets may not be sufficiently diverse. New work used agentic workflows to produce diverse synthetic datasets.
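AgentInstruct's actual pipeline isn't described in this blurb; as a rough sketch of the general pattern, one agent can propose varied instructions from seed text, a second can answer them, and a third can filter weak pairs. The `call_llm` function, prompts, and toy seed below are hypothetical placeholders, not the framework's real components.

```python
# Hypothetical sketch of an agentic synthetic-data loop: a generator agent writes
# varied instructions from seed text, a responder agent answers, and a critic agent
# keeps only pairs it judges useful. `call_llm` is a placeholder for any chat API.
from typing import Callable

def generate_synthetic_pairs(
    seed_texts: list[str],
    call_llm: Callable[[str], str],   # prompt -> completion (placeholder, any LLM client)
    per_seed: int = 3,
) -> list[dict]:
    pairs = []
    for seed in seed_texts:
        for i in range(per_seed):
            instruction = call_llm(
                f"Write instruction #{i + 1}, distinct from earlier ones, "
                f"that a student could answer using this passage:\n{seed}"
            )
            response = call_llm(f"Answer the instruction:\n{instruction}\n\nPassage:\n{seed}")
            verdict = call_llm(
                "Reply KEEP if this instruction/response pair is clear, correct, and "
                f"non-trivial, otherwise DISCARD.\n\n{instruction}\n\n{response}"
            )
            if "KEEP" in verdict.upper():
                pairs.append({"instruction": instruction, "response": response})
    return pairs

# Toy usage with a canned stand-in for a real model:
demo = generate_synthetic_pairs(
    ["Photosynthesis converts light energy into chemical energy."],
    call_llm=lambda prompt: "KEEP" if "KEEP" in prompt else "What does photosynthesis produce?",
)
print(len(demo), "pairs kept")
```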

Expressive Synthetic Talking Heads: Microsoft's VASA-1 delivers more lifelike talking-head videos

Previous systems that produce a talking-head video from a photo and a spoken-word audio clip animate the lips and other parts of the face separately.