Large Language Models (LLMs)

179 Posts

Chart illustrates exact and approximate memorization percentages in different Gemma models.

Masking Private Data in Training Sets: Google researchers released VaultGemma, an open-weights model guaranteed not to memorize personal information

Large language models often memorize details in their training data, including private information that may appear only once, like a person’s name, address, or phone number. Researchers built the first open-weights language model that’s guaranteed not to remember such facts.
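Guarantees like this come from differentially private training. The sketch below shows the core clip-and-noise step of DP-SGD in plain NumPy; it illustrates the general technique, not VaultGemma's actual training code, and all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def dp_average_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.0,
                        rng=np.random.default_rng(0)):
    """Average per-example gradients with clipping and Gaussian noise (DP-SGD sketch)."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Bound each individual example's influence on the update.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    # Noise calibrated to the clipping bound masks any single example's contribution,
    # which is what prevents memorization of one-off facts like a phone number.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Toy batch of 32 per-example gradients of dimension 8.
grads = [np.random.default_rng(i).normal(size=8) for i in range(32)]
g = dp_average_gradient(grads)
```

The privacy guarantee comes from the pairing of the two steps: clipping caps the sensitivity of the sum, and the noise scale is set relative to that cap.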
Chart displays MiniMax-M2 with high intelligence and competitive pricing, outshining other models.

Open-Weights Coding Leader: MiniMax-M2’s lightweight footprint and low costs belie its top performance

An open-weights model from Shanghai-based MiniMax challenges top proprietary models on key benchmarks for coding and agentic tasks.
Kids in costumes face a locked door, with candy visible behind bars, symbolizing restricted web data.

Web Data Diminishes: What if online publishers make it harder and more expensive to train models?

For decades, AI developers have treated the web as an open faucet of training data. Now publishers are shutting the tap. Will web data dry up?
A rabbit leads a viking-costumed person into a hole, holding a bag of toys, against a forest backdrop.

Chatbots Lead Users Into Rabbit Holes: When paranoia, delusions, and other signs of mental illness meet AI

Conversations with chatbots are loosening users’ grips on reality, fueling the sorts of delusions that can trigger episodes of severe mental illness. Are AI models driving us insane?
Flowchart details GEPA algorithm, featuring candidate filtering and performance improvement loops.

Better Agentic Prompts Automatically: Authors devised GEPA, an algorithm for better prompts to improve agentic systems’ performance

Honing an agent’s prompt can yield better results than fine-tuning the underlying large language model via reinforcement learning.
Graph showing increasing security risks from 9% to 92% as MCP servers rise from 1 to 10.

MCP Poses Security Risks: Experts identify holes in the popular Model Context Protocol for attackers to access data

The ability to easily connect large language models to tools and data sources has made Model Context Protocol popular among developers, but it also opens security holes, research shows.
A performance comparison table highlights Ling-1T's success in reasoning and coding tasks against rivals.

Reasoning Without “Thinking”: All about Ant Group’s Ling-1T, an open, non-reasoning model that outperforms closed competitors

Reasoning models typically learn to undertake a separate process of “thinking” through their output before they produce a final response. Ant Group built a top non-reasoning model that can take similar steps as part of its immediate response.
Close-up of a violin scroll and pegs, symbolizing precision needed in fine-tuning AI models.

Fine-Tuning Simplified: Thinking Machines’ new Tinker API makes it easier to fine-tune models on many GPUs

The first offering from Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, aims to simplify — and democratize — the process of fine-tuning AI models.
Graphs compare DeepSeek models showing reduced cost per million tokens with V3.2-Exp over V3.1-Terminus.

DeepSeek Cuts Inference Costs: DeepSeek-V3.2-Exp streamlines processing using a “lightning indexer,” boosting efficiency

DeepSeek’s latest large language model can cut inference costs by more than half and processes long contexts dramatically faster relative to its predecessor.
Flowchart of Text-to-LoRA model processes task embeddings into LoRA adapters, showing weights and losses.

LoRA Adapters On Tap: Text-to-LoRA generates task-specific LoRA adapters directly from natural language descriptions

The approach known as LoRA streamlines fine-tuning by training a small adapter that modifies a pretrained model’s weights at inference. Researchers built a model that generates such adapters directly.
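For readers unfamiliar with the mechanism, the sketch below shows a LoRA-style low-rank update in plain NumPy: a frozen weight matrix W is adjusted at inference by a small trainable adapter B @ A whose rank r is far smaller than W’s dimensions. The names and dimensions are illustrative assumptions, not Text-to-LoRA’s code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8  # rank r << min(d_in, d_out)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, initialized to zero

def forward(x, use_adapter=True):
    """Apply the frozen weights plus the optional low-rank adjustment."""
    y = W @ x
    if use_adapter:
        y = y + (alpha / r) * (B @ (A @ x))
    return y

x = rng.normal(size=d_in)
# With B initialized to zero, the adapter starts as a no-op:
assert np.allclose(forward(x, use_adapter=True), forward(x, use_adapter=False))
```

Only A and B are trained (512 parameters here versus 4,096 in W), which is why generating adapters directly, as Text-to-LoRA does, is far cheaper than producing full fine-tuned weights.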
Bar chart comparing performance of Qwen3 models against others in diverse tasks, highlighting Qwen3-Max.

Qwen3 Goes Big (and Smaller): Alibaba expands Qwen3 family with a 1 trillion-parameter Max model, open-weights Qwen3-VL, and the Qwen3-Omni voice model

Alibaba rounded out the Qwen3 family with its biggest large language model to date as well as smaller models that process text, images, video, and/or audio.
Icons for files, pictures, and shopping connect through nodes to a dollar sign, illustrating AI-driven profit pathways.

OpenAI, Meta Diversify AI Product Lines: OpenAI and Meta launch social video apps while ChatGPT adds Pulse and Instant Checkout

OpenAI and Meta, which have been content to offer standalone chatbots or tuck them into existing products, introduced dueling social video networks and other initiatives designed to boost revenue and engagement.
Comparison table highlighting Claude Sonnet 4.5's top scores in coding and reasoning benchmarks, featuring improved capabilities.

Claude Levels Up: Anthropic launches Claude Sonnet 4.5 and the Claude Agent SDK, and overhauls Claude Code for developers

Anthropic updated its mid-size Claude Sonnet model, making it the first member of the Claude family to reach version 4.5. It also enhanced the Claude Code agentic coding tool with long-desired features.
Flowchart shows data reordering, probability sampling, and effective gradient updating in reinforcement learning.

Faster Reinforcement Learning: New technique auto-selects training examples to speed up fine-tuning

Fine-tuning large language models via reinforcement learning is computationally expensive, but researchers found a way to streamline the process.
Chart details ChatGPT conversations. Writing (28.1%), info-seeking (21.3%), and guidance (28.3%) lead.

What ChatGPT Users Want: ChatGPT users now more likely to be young, female, and seeking info, study shows

What do ChatGPT’s 700 million weekly active users do with it? OpenAI teamed up with a Harvard economist to find out.