Machine Learning Research

447 Posts

Dual line graphs showing factual QA accuracy and NLL against memory size for NQ and TQA datasets in AI models.

Memory Layers for More-Factual Output: Meta researchers build Llama-style models that recall details without needing more computing resources

Improving a large language model’s factual accuracy typically requires making it bigger, which, in turn, requires more computation. Researchers devised an architecture that enables models to recall relevant details without significantly increasing the amount of computation required.

Comparison table of AI models ranked by LCB score and Codeforces rating with percentiles for competitive programming.

Open, Compact Code Generator: DeepCoder-14B-Preview further fine-tunes reasoning models for coding

An open-source code generator performs comparably to the reasoning models DeepSeek-R1 and OpenAI o1 despite being much smaller.

Table comparing AI model accuracy on math and reasoning benchmarks including AIME, HMMT, OmniMath, GPQA-D, and Codeforces.

Reasoning Models With Recipes: Microsoft unveils training details for Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning

Microsoft published its latest recipe for training reasoning models, substantially expanding what is still a fairly small base of public knowledge.

Chart showing LLM accuracy increasing with reasoning tokens across math and science benchmarks like AIME24 and GPQA.

One Weird Trick for Better Reasoning: Researchers fine-tune LLM for reasoning with only 1,000 examples

Researchers showed that supervised fine-tuning on as few as 1,000 examples can enable a pretrained large language model to reason, and a clever gambit can boost its performance to rival that of top reasoning models.

Man at desk overwhelmed by robot coworkers in office setting with city and tree views.

The User Is Always… a Genius!: OpenAI pulls GPT-4o update after users report sycophantic behavior

OpenAI’s most widely used model briefly developed a habit of flattering users, with laughable and sometimes worrisome results.

LLM performance benchmark table comparing Qwen, OpenAI, Gemini, and others on coding, math, and language tasks.

Qwen3 Takes On DeepSeek-R1: Alibaba releases the Qwen3 family of open LLMs with optional reasoning

Alibaba’s new model family may unseat DeepSeek-R1’s four-month reign as the top open-weights large language model.

Diagram of LLM-based preference approximation and multimodal sequential recommendation for personalized product suggestions.

Inferring Customer Preferences: LLMs boost shopping recommendations by decoding what users want

Large language models can improve systems that recommend items to purchase by inferring customer preferences.

AI music generation interface showing waveform and text prompts like deep house, djembe, and saxophone.

Music Generation for Pros: Google upgrades its AI music tools for professional use

Google refreshed its experimental tools for composers and producers.

Animation showing GPT Image 1 generating AI images: emotions, surreal scenes, satire, fantasy, and photo-realistic edits.

New Image Generator for OpenAI API: OpenAI launches API access to GPT Image 1, ChatGPT’s viral image generator

ChatGPT’s image generator is available via API.

AI diagram showing generator and scorer loop to produce final output based on test image of a cat.

Text-Only LLM Goes Multimodal: LLMs learn to caption images, video, and audio without further training

Large language models excel at processing text but can’t interpret images, video, or audio directly without further training on those media types. Researchers devised a way to overcome this limitation.

Person interacting with a humanoid robot using virtual reality headset and controllers.

Hugging Face Rolls Out Open Robot: Hugging Face acquires Pollen Robotics, launches Reachy 2 robot for open-source research

Hugging Face has made a name by providing open AI models. Now it’s providing an open robot.

Comparison chart of GPT-4.1, o3, and o4-mini with other models on coding, math, tool use, and multimodal reasoning benchmarks.

OpenAI Launches Cost-Effective Alternatives: OpenAI replaces GPT-4.5 with GPT-4.1 family, plus o3 and o4-mini, new models focused on reasoning and coding

OpenAI refreshed its roster of models and scheduled the largest, most costly one for removal.

Diagram of latent transformer model using byte-level encoding, patching, and cross-attention for next-byte prediction.

Toward LLMs That Understand Misspellings: New byte-based model beats Llama 3 on spelling, noise, and translation

Researchers built a model that’s more robust to noisy inputs like misspellings, smarter about character-level information like the number of R's in strawberry, and potentially better able to understand unfamiliar languages that might share groups of letters with familiar languages.

Diagram of Model Context Protocol showing MCP client-server architecture, APIs, and local and remote data sources.

Open Standard for Tool Use and Data Access Gains Momentum: OpenAI adopts Model Context Protocol to boost LLM tool integration

OpenAI embraced Model Context Protocol, providing powerful support for an open standard that connects large language models to tools and data.

AI benchmark comparison chart showing Gemini 2.5 Pro, GPT-4.5, Claude, Grok, and others across science, math, code, and reasoning.

Google Unveils Gemini 2.5: Google’s Gemini 2.5 Pro Experimental outperforms top AI models

Google’s new flagship model raised the state of the art in a variety of subjective and objective tests.
