Large Language Models (LLMs)

132 Posts

Dual line graphs showing factual QA accuracy and NLL against memory size for NQ and TQA datasets in AI models.

Memory Layers for More-Factual Output: Meta researchers build Llama-style models that recall details without needing more computing resources

Improving a large language model’s factual accuracy typically requires making it bigger, which in turn requires more computation. Researchers devised an architecture that enables models to recall relevant details without significantly increasing the amount of computation required.
Comparison table of AI models ranked by LCB score and Codeforces rating with percentiles for competitive programming.

Open, Compact Code Generator: DeepCoder-14B-Preview further fine-tunes reasoning models for coding

An open-source code generator performs comparably to the reasoning models DeepSeek-R1 and OpenAI o1 despite being much smaller.
Table comparing AI model accuracy on math and reasoning benchmarks including AIME, HMMT, OmniMath, GPQA-D, and Codeforces.

Reasoning Models With Recipes: Microsoft unveils training details for Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning

Microsoft published its latest recipe for training reasoning models, substantially expanding what is still a fairly small base of public knowledge.
Chart showing LLM accuracy increasing with reasoning tokens across math and science benchmarks like AIME24 and GPQA.

One Weird Trick for Better Reasoning: Researchers fine-tune LLM for reasoning with only 1,000 examples

Researchers showed that supervised fine-tuning on as few as 1,000 examples can enable a pretrained large language model to reason, and a clever gambit can boost its performance to rival that of top reasoning models.
LLM performance benchmark table comparing Qwen, OpenAI, Gemini, and others on coding, math, and language tasks.

Qwen3 Takes On DeepSeek-R1: Alibaba releases the Qwen3 family of open LLMs with optional reasoning

Alibaba’s new model family may end DeepSeek-R1’s four-month reign as the top open-weights large language model.
Diagram of LLM-based preference approximation and multimodal sequential recommendation for personalized product suggestions.

Inferring Customer Preferences: LLMs boost shopping recommendations by decoding what users want

Large language models can improve systems that recommend items to purchase by inferring customer preferences.
Comparison chart of GPT-4.1, o3, and o4-mini with other models on coding, math, tool use, and multimodal reasoning benchmarks.

OpenAI Launches Cost-Effective Alternatives: OpenAI replaces GPT-4.5 with the GPT-4.1 family, plus o3 and o4-mini, new models focused on reasoning and coding

OpenAI refreshed its roster of models and scheduled the largest, most costly one for removal.
Diagram of latent transformer model using byte-level encoding, patching, and cross-attention for next-byte prediction.

Toward LLMs That Understand Misspellings: New byte-based model beats Llama 3 on spelling, noise, and translation

Researchers built a model that’s more robust to noisy inputs like misspellings, smarter about character-level information like the number of R's in strawberry, and potentially better able to understand unfamiliar languages that might share groups of letters with familiar languages.
Illustration of a businessman in a blue suit sitting alone at the head of a long boardroom table with black chairs.

The Fall and Rise of Sam Altman: Inside Sam Altman’s brief ouster from OpenAI

A behind-the-scenes account provides new details about the abrupt firing and reinstatement of OpenAI CEO Sam Altman in November 2023.
Diagram of Model Context Protocol showing MCP client-server architecture, APIs, and local and remote data sources.

Open Standard for Tool Use and Data Access Gains Momentum: OpenAI adopts Model Context Protocol to boost LLM tool integration

OpenAI embraced Model Context Protocol, providing powerful support for an open standard that connects large language models to tools and data.
AI benchmark comparison chart showing Gemini 2.5 Pro, GPT-4.5, Claude, Grok, and others across science, math, code, and reasoning.

Google Unveils Gemini 2.5: Google’s Gemini 2.5 Pro Experimental outperforms top AI models

Google’s new flagship model raised the state of the art in a variety of subjective and objective tests.
Llama 4 Behemoth benchmark chart comparing coding, reasoning, and multilingual scores with Claude, Gemini, and GPT-4.5.

Llama’s Mixture of Vision-Language Experts: Meta releases Llama 4 models, claims edge over AI competitors

Meta updated its popular open-weights models, claiming performance superior to closed competitors in three size classes.
AI tutoring system interface showing real-time context integration, privacy, and expert-like feedback generation.

LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds

Students benefit from tutoring, but training tutors is expensive. A study shows that large language models can boost tutors’ effectiveness in real time.
Comparison table of Gemini and Gemma models across benchmarks like MMLU, MATH, and GPQA with radar charts.

Vision-Language, Compact and Open: Google releases Gemma 3 vision-language models with open weights

Google updated its open-weights family of large language models to include versions that handle image and video inputs.
GIF of AI-assisted art: A landscape is edited, a cyborg sketch turns photorealistic, and a cat reads a newspaper, showing human input for copyright.

Some AI-Generated Works Are Copyrightable: U.S. Copyright Office says that no new laws are needed for AI-generated works

The United States Copyright Office determined that existing laws are sufficient to decide whether a given AI-generated work is protected by copyright, making additional legislation unnecessary.