Machine Learning Research

363 Posts

Faster, Cheaper Video Generation: Pyramidal Flow Matching, a cost-cutting method for training video generators

Researchers devised a way to cut the cost of training video generators. They used it to build a competitive open source text-to-video model and promised to release the training code.
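
Flow matching trains a network to predict the velocity that moves a noise sample toward a data sample along a chosen path. The sketch below shows the standard linear-path training target, plus 2×2 average pooling as one hypothetical way to build the coarser levels of a pyramid, assuming (as the name suggests) that the pyramidal method saves compute by handling much of the trajectory at reduced resolution. All function names are illustrative, not the researchers' code, and a real model would use tensors rather than nested lists.

```python
from typing import List

Grid = List[List[float]]


def interpolate(noise: Grid, data: Grid, t: float) -> Grid:
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [[(1 - t) * n + t * d for n, d in zip(nr, dr)] for nr, dr in zip(noise, data)]


def velocity_target(noise: Grid, data: Grid) -> Grid:
    """On a linear path the velocity is constant: data - noise."""
    return [[d - n for n, d in zip(nr, dr)] for nr, dr in zip(noise, data)]


def downsample2x(grid: Grid) -> Grid:
    """2x2 average pooling (hypothetical pyramid step); drops an odd last row/column."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def flow_matching_loss(predicted: Grid, target: Grid) -> float:
    """Mean squared error between predicted and target velocities."""
    diffs = [(p - q) ** 2 for pr, qr in zip(predicted, target) for p, q in zip(pr, qr)]
    return sum(diffs) / len(diffs)
```

For example, with `noise = [[0.0, 0.0], [0.0, 0.0]]` and `data = [[1.0, 2.0], [3.0, 4.0]]`, `interpolate(noise, data, 0.5)` gives `[[0.5, 1.0], [1.5, 2.0]]`, and `downsample2x(data)` gives `[[2.5]]`, a quarter as many values to process.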

Mistral AI Sharpens the Edge: Mistral AI unveils Ministral 3B and 8B models, outperforming rivals in small-scale AI

Mistral AI launched two models that raise the bar for language models with 8 billion or fewer parameters, small enough to run on many edge devices.
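
Whether a model fits on a device depends first on how much memory its weights occupy: parameters × bits per parameter ÷ 8 bytes. Here is a rough sketch of that arithmetic for 3 billion and 8 billion parameters. The bit widths are illustrative assumptions (the summary doesn't say what precision the Ministral models use on device), and the estimate ignores activations, the key-value cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative precisions: 16-bit floats and two common quantized widths.
for name, params in [("3B", 3e9), ("8B", 8e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

At 16 bits per weight, an 8B model needs 16 GB for its weights alone; at 4 bits it needs 4 GB, and a 3B model about 1.5 GB.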

Better Text Embeddings: Jina AI launches jina-embeddings-v3, a text embedding model with task-specific adapters

Text embedding models are often used to retrieve text, cluster text, determine similarity between texts, and generate initial embeddings for text classifiers. A new embedding model comes with adapters that specialize it to each of these use cases.
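
Here is a minimal sketch of how one frozen model can serve several tasks. It assumes the adapters are LoRA-style low-rank updates (output = W x + B(A x)) selected by task name, and that similarity is scored with cosine similarity. The task names, 2-dimensional vectors, and weights are made up for illustration; this is not Jina AI's API.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def apply_adapter(w: Matrix, a: Matrix, b: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """Frozen projection W x plus a low-rank update scale * B (A x)."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [p + scale * q for p, q in zip(base, delta)]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Shared frozen weights (identity, for illustration).
W: Matrix = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical rank-1 adapters per task: A is 1x2, B is 2x1.
ADAPTERS: Dict[str, Tuple[Matrix, Matrix]] = {
    "retrieval": ([[1.0, 0.0]], [[0.0], [1.0]]),
    "classification": ([[0.0, 1.0]], [[1.0], [0.0]]),
}


def embed(x: Vector, task: str) -> Vector:
    """Embed x with the shared weights plus the chosen task's adapter."""
    a, b = ADAPTERS[task]
    return apply_adapter(W, a, b, x)
```

Here `embed([1.0, 0.0], "retrieval")` returns `[1.0, 1.0]` while `embed([1.0, 0.0], "classification")` returns `[1.0, 0.0]`. The same input lands in different places depending on the adapter, and only the small A and B matrices differ per task.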

AI’s Criminal Underground Revealed: Researchers uncover black market for AI-driven cybercrime services

Researchers probed the black market for AI services that are designed to facilitate cybercrime.

Voice-to-Voice and More for GPT-4o API: OpenAI unveils tools for speech, vision, and cost-efficiency at DevDay

OpenAI launched a suite of new and updated tools to help AI developers build applications and reduce costs.

Familiar Faces, Synthetic Soundtracks: Meta debuts Movie Gen for text-to-video generation with consistent characters

Meta upped the ante for text-to-video generation with new systems that produce consistent characters and matching soundtracks.

Enabling LLMs to Read Spreadsheets: A method to process large spreadsheets for accurate question answering

Large language models can process small spreadsheets, but very large spreadsheets often exceed their limits for input length. Researchers devised a method that processes large spreadsheets so LLMs can answer questions about them.
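
The bottleneck is the size of the serialized sheet. Below is a minimal sketch of one compression idea of this kind: an inverted index that lists each distinct value once, alongside the cells that hold it, and skips empty cells. It is an illustrative assumption, not necessarily the researchers' exact encoding, and character counts stand in for token counts.

```python
from typing import Dict, List, Optional

Rows = List[List[Optional[str]]]


def column_letter(index: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA' (spreadsheet-style column names)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def naive_encoding(rows: Rows) -> str:
    """Serialize every cell as 'A1:value', including empty cells."""
    cells = []
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            cells.append(f"{column_letter(c)}{r}:{value or ''}")
    return "|".join(cells)


def inverted_index_encoding(rows: Rows) -> str:
    """Map each distinct non-empty value to the cells holding it; drop empty cells."""
    index: Dict[str, List[str]] = {}
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            if value:
                index.setdefault(value, []).append(f"{column_letter(c)}{r}")
    return "|".join(f"{value}:{','.join(addrs)}" for value, addrs in index.items())


sheet: Rows = [["Region", "Sales"], ["East", "100"], ["East", None], ["West", "100"]]
print(naive_encoding(sheet))           # 60 characters
print(inverted_index_encoding(sheet))  # Region:A1|Sales:B1|East:A2,A3|100:B2,B4|West:A4
```

The saving on this toy sheet is small (47 characters versus 60), but it grows with the number of repeated values and empty cells, which large real-world sheets tend to have in abundance.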

Llama Herd Expands: Meta updates Llama models with vision-language, edge sizes, and agentic APIs

Meta extended its Llama family of models into two new categories: vision-language and sizes that are small enough to fit in edge devices.

Robot Server: Google’s table tennis robot triumphs over beginners

A robot that plays table tennis beats human beginners and entertains experts.

More, Better Open Source Options: Alibaba releases Qwen 2.5 models, raising the bar for open weight LLMs

The parade of ever more capable LLMs continues with Qwen 2.5.

Reducing Memorization in LLMs: A technique that masks tokens in large language models, protecting data privacy

Studies have established that large language models can memorize passages that appear repeatedly in their training data and regurgitate them in response to adversarial prompts and, though rarely, to benign ones.
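
One published technique along these lines, the "goldfish loss," excludes a pseudo-random subset of tokens from the training loss. It chooses them by hashing the tokens that precede each position, so a repeated passage always loses the same tokens and the model is never trained to predict them, which makes exact regurgitation harder. Here is a minimal sketch under that assumption. The hash (SHA-256), the window size `context`, and the drop rate of roughly one token in `k` are illustrative choices, not details from the summary above.

```python
import hashlib
from typing import List, Sequence


def loss_mask(token_ids: Sequence[int], k: int = 4, context: int = 13) -> List[bool]:
    """True = token contributes to the loss; False = token is masked out.

    The decision depends only on the preceding `context` tokens, so identical
    text always yields an identical mask, however often it appears in training.
    """
    mask = []
    for i in range(len(token_ids)):
        window = tuple(token_ids[max(0, i - context):i])
        digest = hashlib.sha256(repr(window).encode("utf-8")).digest()
        # Drop roughly 1 in k tokens, chosen by the hash of the preceding window.
        mask.append(int.from_bytes(digest[:8], "big") % k != 0)
    return mask


def masked_mean_loss(per_token_loss: Sequence[float], mask: Sequence[bool]) -> float:
    """Average the per-token losses over the tokens that are kept."""
    kept = [loss for loss, keep in zip(per_token_loss, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0
```

Because the mask is a function of the text itself rather than a fresh random draw each epoch, seeing a passage many times never exposes the masked tokens to the loss.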

OpenAI Forges Chains of Thought: OpenAI’s o1 models excel in reasoning, outperform GPT-4o in math and coding

Preliminary versions of OpenAI’s new model family were trained explicitly to think step by step, yielding outstanding marks in math, science, and coding, but users can’t see their reasoning steps.

2D-to-3D Goes Mainstream: AI systems from Stability AI and Shutterstock transform 2D images into 3D meshes in seconds

Traditionally, building 3D meshes for gaming, animation, product design, architecture, and the like has been labor-intensive. Now the ability to generate 3D meshes from a single image is widely available.

Balancing Web Data Distributions: Automated method organizes large datasets for better model performance

Datasets that were scraped from the web tend to be unbalanced, meaning examples of some classes (say, cats) are plentiful while examples of others (say, caterpillars) are scarce.
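
Here is a minimal sketch of the general idea. It assumes each example has already been turned into an embedding (here, a 2-D point), clusters the points, and then samples the same number from each cluster so that rare concepts aren't swamped by common ones. It uses a single round of k-means for brevity; a hierarchical version would cluster the resulting centroids again. The data, function names, and parameters are illustrative, not the researchers' implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def balanced_sample(points: Sequence[Point], labels: Sequence[int],
                    per_cluster: int, seed: int = 0) -> List[Point]:
    """Draw up to `per_cluster` points from every cluster."""
    rng = random.Random(seed)
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    sample: List[Point] = []
    for members in clusters.values():
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample


# Toy data: 90 "cat" points near the origin, 10 "caterpillar" points far away.
cats = [(i % 10 / 10, i // 10 / 10) for i in range(90)]
caterpillars = [(10 + i / 10, 10 + i / 10) for i in range(10)]
points = cats + caterpillars
labels = kmeans(points, k=2)
subset = balanced_sample(points, labels, per_cluster=10)
```

The clusters hold 90 and 10 points, and the balanced subset draws 10 from each, so the rare group makes up half the sample instead of a tenth.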

Making LLMs Explainable: Google’s Gemma Scope probes how large language models think

Researchers have probed the inner workings of individual layers of large language models. A new tool applies this approach to all layers.
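
Gemma Scope's probes are sparse autoencoders. For each layer, a small network maps the layer's activation vector to a much wider vector of "features," most of them zero, and then reconstructs the activation from those features. Here is a minimal sketch of one forward pass with the thresholded "JumpReLU" activation that Gemma Scope's autoencoders use. The dimensions, weights, and thresholds are made up for illustration and bear no relation to the released ones.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def jumprelu(z: Vector, thresholds: Vector) -> Vector:
    """Pass a pre-activation through unchanged only if it exceeds its feature's threshold."""
    return [zi if zi > t else 0.0 for zi, t in zip(z, thresholds)]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, thresholds: Vector) -> Vector:
    """Activation -> sparse feature vector."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    return jumprelu(pre, thresholds)


def decode(features: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Sparse feature vector -> reconstructed activation."""
    return [p + b for p, b in zip(matvec(w_dec, features), b_dec)]


# Toy setup: a 2-dimensional activation and 3 features (hypothetical values).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, 0.0]
THRESHOLDS = [0.5, 0.5, 0.5]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
B_DEC = [0.0, 0.0]

activation = [1.0, 0.25]
features = encode(activation, W_ENC, B_ENC, THRESHOLDS)
reconstruction = decode(features, W_DEC, B_DEC)
```

Here `features` is `[1.0, 0.0, 1.25]`: the second feature falls below its threshold and is silenced. These untrained toy weights reconstruct the activation only roughly (`[1.625, 0.625]`); training would minimize reconstruction error while keeping few features active.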

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox