Machine Learning Research

Okay, But Please Don’t Stop Talking: Moshi, an open alternative to OpenAI’s Realtime API for Speech

Even cutting-edge, end-to-end, speech-to-speech systems like ChatGPT’s Advanced Voice Mode tend to get interrupted by interjections like “I see” and “uh-huh” that keep human conversations going. Researchers built an open alternative that’s designed to go with the flow of overlapping speech.

Gemini Thinks Faster: Google’s Gemini 2.0 Flash Thinking advances in reasoning, outperforms DeepSeek-R1

Google updated the December-vintage reasoning model Gemini 2.0 Flash Thinking and other Flash models, gaining ground on OpenAI o1 and DeepSeek-R1.

Training for Computer Use: UI-TARS shows strong computer use capabilities in benchmarks

As Anthropic, Google, OpenAI, and others roll out agents that are capable of computer use, new work shows how underlying models can be trained to do this.

Reasoning in High Gear: o3-mini, a faster, more affordable reasoning model for coding, math, and science

OpenAI introduced a successor to its o1 models that’s faster, less expensive, and especially strong in coding, math, and science.

Fine-Tuning Fine Points: Active inheritance, a smarter way to fine-tune models on synthetic data

The practice of fine-tuning models on synthetic data is becoming well established. But synthetic training data, even if it represents the training task well, may include characteristics like toxicity that impart unwelcome properties to the trained model’s output...
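
A minimal sketch of the selection loop this implies, assuming hypothetical `generate` and `toxicity_score` stand-ins: sample several synthetic completions per prompt, score each on the attribute you want to control, and keep only the best for fine-tuning.

```python
import random

def generate(prompt: str) -> str:
    # Stand-in for sampling from a teacher LLM.
    return prompt + " ... " + random.choice(["polite reply", "rude reply"])

def toxicity_score(text: str) -> float:
    # Stand-in for a toxicity classifier; lower is better.
    return 1.0 if "rude" in text else 0.0

def build_finetuning_set(prompts, samples_per_prompt=4):
    """Keep the least toxic of several sampled completions per prompt."""
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(samples_per_prompt)]
        best = min(candidates, key=toxicity_score)  # steer toward the attribute
        dataset.append((prompt, best))
    return dataset

data = build_finetuning_set(["How do I ask for a refund?"])
```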

Computer Use Gains Momentum: OpenAI’s Operator automates online tasks with a new AI agent

OpenAI introduced an AI agent that performs simple web tasks on a user’s behalf.

Reinforcement Learning Heats Up: How DeepSeek-R1 and Kimi k1.5 use reinforcement learning to improve reasoning

Reinforcement learning is emerging as an avenue for building large language models with advanced reasoning capabilities.
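
DeepSeek-R1’s report describes a group-relative scheme (GRPO) in which several answers are sampled per prompt and each answer’s reward is normalized against its group, replacing a learned critic. A minimal sketch of that advantage computation, with toy rewards as an assumption:

```python
import numpy as np

def group_advantages(rewards: np.ndarray) -> np.ndarray:
    """Normalize each sampled answer's reward against its group.

    For one prompt, sample G answers, score each, and use the
    group-normalized score as the advantage -- no value network needed.
    """
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)

# Example: 6 sampled answers to one math prompt; reward 1.0 if the final
# answer is correct, 0.0 otherwise (toy values, not from the paper).
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
print(group_advantages(rewards))  # correct answers get positive advantage
```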

Generated Chip Designs Work in Mysterious Ways: Researchers used deep learning and an evolutionary algorithm to design chips in minutes

Designing integrated circuits typically requires years of human expertise. Recent work set AI to the task with surprising results.
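
A rough sketch of the general recipe such work pairs together: a learned surrogate scores candidate designs cheaply (here `surrogate_score` is a placeholder; the real predictor estimates quantities such as S-parameters), so an evolutionary loop can search many variants per second instead of running full simulations.

```python
import random

def surrogate_score(design: list[float]) -> float:
    # Placeholder objective; a real surrogate would be a trained network
    # predicting a design's electromagnetic properties.
    return -sum((x - 0.5) ** 2 for x in design)

def evolve(pop_size=32, dims=16, generations=100, mut_rate=0.1):
    population = [[random.random() for _ in range(dims)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=surrogate_score, reverse=True)
        parents = population[: pop_size // 2]            # keep the fittest half
        children = [[x + random.gauss(0, mut_rate) for x in p] for p in parents]
        population = parents + children                  # mutate survivors
    return max(population, key=surrogate_score)

best_design = evolve()
```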

DeepSeek Sharpens Its Reasoning: DeepSeek-R1, an affordable rival to OpenAI’s o1

A new open model rivals OpenAI’s o1, and it’s free to use or modify.

Calibrating Contrast: X-CLR, an approach to contrastive learning for better vision models

Contrastive loss functions make it possible to produce good embeddings without labeled data. A twist on this idea makes even more useful embeddings.
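
As described, X-CLR replaces the hard positive/negative split of standard contrastive learning with soft targets graded by caption similarity, so related images attract in proportion. A minimal PyTorch sketch, with batch size, embedding dimension, and temperature as assumptions:

```python
import torch
import torch.nn.functional as F

def xclr_loss(img_emb, caption_emb, temperature=0.1):
    """img_emb, caption_emb: (batch, dim), L2-normalized."""
    # Pairwise similarities among images in the batch.
    img_sim = img_emb @ img_emb.T / temperature
    # Soft targets: how alike two samples' captions are.
    targets = F.softmax(caption_emb @ caption_emb.T / temperature, dim=-1)
    # Cross-entropy against a distribution replaces the usual
    # one-hot positive of standard contrastive loss.
    return F.cross_entropy(img_sim, targets)

# Toy usage with random embeddings.
img = F.normalize(torch.randn(8, 128), dim=-1)
cap = F.normalize(torch.randn(8, 128), dim=-1)
loss = xclr_loss(img, cap)
```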

DeepSeek Ups the Open Weights Ante: DeepSeek-V3 redefines LLM performance and cost efficiency

A new model from Hangzhou upstart DeepSeek delivers outstanding performance and may change the equation for training costs.

Better Performance From Merged Models: Localize-and-Stitch improves methods for merging and fine-tuning multiple models

Merging multiple fine-tuned models is a less expensive alternative to hosting multiple specialized models. But while model merging can deliver higher average performance across several tasks, it often results in lower performance on specific tasks. New work addresses this issue.
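
A rough sketch of the localize-and-stitch pattern: find the few weights each fine-tuned model changed most (localize), then graft only those deltas onto the base model (stitch). Selecting weights by change magnitude is a simplifying assumption here, not necessarily how the method identifies critical weights.

```python
import numpy as np

def localize(base: np.ndarray, finetuned: np.ndarray, keep: float = 0.01):
    """Return a mask selecting the top `keep` fraction of weight changes."""
    delta = finetuned - base
    threshold = np.quantile(np.abs(delta), 1.0 - keep)
    return np.abs(delta) >= threshold, delta

def stitch(base: np.ndarray, finetuned_models: list[np.ndarray]) -> np.ndarray:
    merged = base.copy()
    for ft in finetuned_models:
        mask, delta = localize(base, ft)
        merged[mask] += delta[mask]   # graft each model's critical weights
    return merged

base = np.random.randn(1000)
experts = [base + 0.1 * np.random.randn(1000) for _ in range(3)]
merged = stitch(base, experts)
```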

Massively More Training Text: Harvard unveils a million-book corpus for AI training

Harvard University amassed a huge new text corpus for training machine learning models.

Models Can Use Tools in Deceptive Ways: Researchers expose AI models' deceptive behaviors

Large language models have been shown to be capable of lying when users unintentionally give them an incentive to do so. Further research shows that LLMs with access to tools can be incentivized to use them in deceptive ways.

What LLM Users Want: Anthropic reveals how users interact with Claude 3.5

Anthropic analyzed 1 million anonymized conversations between users and Claude 3.5 Sonnet. The study found that most people used the model for software development; it also revealed malfunctions and jailbreaks.