Large Language Models (LLMs)


Diagram shows AI traits with pipelines for "evil" vs. "helpful" responses to user queries on animal treatment.
Large Language Models (LLMs)

Toward Steering LLM Personality: Persona Vectors allow model builders to identify and edit out sycophancy, hallucinations, and more

Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.
Two figures, symbolizing Microsoft and Anthropic, handshake to represent partnership and collaboration.
Large Language Models (LLMs)

Microsoft and Anthropic Form Alliance: Claude becomes the first leading language model available from all three cloud giants

Having recently revised its agreement with longtime partner OpenAI, Microsoft pledged to invest billions of dollars in Anthropic, one of OpenAI’s top competitors.
Image illustrates the Self-Search method, simulating web searches to improve model accuracy in tests.
Large Language Models (LLMs)

More-Efficient Agentic Search: Researchers fine-tune models to search their own parameters to boost recall

Large language models may have learned knowledge that’s relevant to a given prompt, but they don’t always recall it consistently. Fine-tuning a model to search its parameters as though it were searching the web can help it find knowledge in its own weights.
Chart highlights Kimi K2’s top performance in agentic tasks, outperforming rivals in reasoning and coding.
Large Language Models (LLMs)

Top Agentic Results, Open Weights: Kimi K2 Thinking outperforms proprietary models with new techniques for agentic tool use

The latest open-weights large language model from Moonshot AI challenges top proprietary LLMs at agentic tasks by executing hundreds of tool calls sequentially and pausing to think between each.
Chart illustrates exact and approximate memorization percentages in different Gemma models.
Large Language Models (LLMs)

Masking Private Data in Training Sets: Google researchers released VaultGemma, an open-weights model that avoids memorizing personal information

Large language models often memorize details in their training data, including private information that may appear only once, like a person’s name, address, or phone number. Researchers built the first open-weights language model that’s guaranteed not to remember such facts.
Chart displays MiniMax-M2 with high intelligence and competitive pricing, outshining other models.
Large Language Models (LLMs)

Open-Weights Coding Leader: MiniMax-M2’s lightweight footprint and low costs belie its top performance

An open-weights model from Shanghai-based MiniMax challenges top proprietary models on key benchmarks for coding and agentic tasks.
Kids in costumes face a locked door, with candy visible behind bars, symbolizing restricted web data.
Large Language Models (LLMs)

Web Data Diminishes: What if online publishers make it harder and more expensive to train models?

For decades, AI developers have treated the web as an open faucet of training data. Now publishers are shutting the tap. Will web data dry up?
A rabbit leads a viking-costumed person into a hole, holding a bag of toys, against a forest backdrop.
Large Language Models (LLMs)

Chatbots Lead Users Into Rabbit Holes: When paranoia, delusions, and other signs of mental illness meet AI

Conversations with chatbots are loosening users’ grips on reality, fueling the sorts of delusions that can trigger episodes of severe mental illness. Are AI models driving us insane?
Flowchart details GEPA algorithm, featuring candidate filtering and performance improvement loops.
Large Language Models (LLMs)

Better Agentic Prompts Automatically: Authors devised GEPA, an algorithm that refines prompts to improve agentic systems’ performance

Honing an agent’s prompt can yield better results than fine-tuning the underlying large language model via reinforcement learning.
Graph showing increasing security risks from 9% to 92% as MCP servers rise from 1 to 10.
Large Language Models (LLMs)

MCP Poses Security Risks: Experts identify holes in the popular Model Context Protocol that let attackers access data

The ability to easily connect large language models to tools and data sources has made Model Context Protocol popular among developers, but it also opens security holes, research shows.
A performance comparison table highlights Ling-1T's success in reasoning and coding tasks against rivals.
Large Language Models (LLMs)

Reasoning Without “Thinking”: All about Ant Group’s Ling-1T, an open, non-reasoning model that outperforms closed competitors

Reasoning models typically learn to undertake a separate process of “thinking” through their output before they produce a final response. Ant Group built a top non-reasoning model that can take similar steps as part of its immediate response.
Close-up of a violin scroll and pegs, symbolizing precision needed in fine-tuning AI models.
Large Language Models (LLMs)

Fine-Tuning Simplified: Thinking Machines’ new Tinker API makes it easier to fine-tune models on many GPUs

The first offering from Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, aims to simplify — and democratize — the process of fine-tuning AI models.
Graphs compare DeepSeek models showing reduced cost per million tokens with V3.2-Exp over V3.1-Terminus.
Large Language Models (LLMs)

DeepSeek Cuts Inference Costs: DeepSeek-V3.2-Exp streamlines processing using a "lightning indexer," boosting efficiency

DeepSeek’s latest large language model can cut inference costs by more than half and processes long contexts dramatically faster relative to its predecessor.
Flowchart of Text-to-LoRA model processes task embeddings into LoRA adapters, showing weights and losses.
Large Language Models (LLMs)

LoRA Adapters On Tap: Text-to-LoRA generates task-specific LoRA adapters directly from natural language descriptions

The approach known as LoRA streamlines fine-tuning by training a small adapter that modifies a pretrained model’s weights at inference. Researchers built a model that generates such adapters directly.
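The LoRA mechanism the teaser describes can be illustrated with a minimal sketch. The dimensions, rank, and scaling factor below are illustrative toy values, not details from the article: a frozen weight matrix W is adjusted at inference by a low-rank update (alpha/r) * B @ A, where only the small matrices A and B are trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2   # toy dimensions; rank r is much smaller than d

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-initialized
alpha = 4.0                                  # scaling hyperparameter

def lora_forward(x):
    # Adapted layer: base output W @ x plus the low-rank update (alpha / r) * B @ A @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter starts as a no-op:
# the adapted output equals the base model's output.
assert np.allclose(lora_forward(x), W @ x)
```

Because A and B together hold far fewer parameters than W, fine-tuning touches only a small fraction of the model; the research covered here generates such A and B matrices directly from a task description instead of training them.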
Bar chart comparing performance of Qwen3 models against others in diverse tasks, highlighting Qwen3-Max.
Large Language Models (LLMs)

Qwen3 Goes Big (and Smaller): Alibaba expands Qwen3 family with a 1 trillion-parameter Max model, open-weights Qwen3-VL, and the Qwen3-Omni voice model

Alibaba rounded out the Qwen3 family with its biggest large language model to date as well as smaller models that process text, images, video, and/or audio.