Large Language Models (LLMs)

213 Posts

A graph shows assistant behavior shifting between helpful and role-playing, with conversation bubbles.
Large Language Models (LLMs)

Assistants That Assist Consistently: Large language models can drift from helpful personas to harmful ones, but new research aims to stabilize them

Typically, large language models are trained to act as helpful, harmless, honest assistants. However, during long or emotionally charged conversations, traits can emerge that are less beneficial. Researchers devised a way to steady the assistant personas of LLMs.
GLM-5.1 excels in SWE-Bench Pro and Terminal-Bench 2.0, leading in coding and reasoning tests.
Large Language Models (LLMs)

GLM-5.1 Aims for Long-Running Tasks: Z.ai’s GLM-5.1 evaluates interim results and may change its approach hundreds of times before it delivers final output

Z.ai updated its flagship open-weights large language model to work autonomously on single tasks for up to eight hours.
Image depicts persona generator creating synthetic personas, with outputs analyzed for diversity metrics.
Large Language Models (LLMs)

Simulating Diverse Human Cohorts: Persona generation simulates human characters across a controllable range of points of view

If you want to understand how the public will respond to your offerings, large language models can simulate users who answer questions about capabilities, features, promotions, or prices.
Table compares AI models' performance across benchmarks, showing Claude Mythos Preview leading.
Large Language Models (LLMs)

Claude Mythos Preview Raises Security Worries: Why Claude’s advanced Mythos Preview model will be available only in limited release

Anthropic took unusual steps to prepare the world for a forthcoming large language model that it said poses extraordinary risks to cybersecurity.
Two graphs show TTT-E2E maintains stable loss and latency across increasing context lengths up to 128k.
Large Language Models (LLMs)

Learning Long Context at Inference: Test-Time Training End-to-End (TTT-E2E) retrains model weights to handle long inputs

Large language models typically become less accurate and slower when they process longer contexts, but researchers enabled an LLM to keep accuracy stable and inference time constant as its context grew.
Top graph (blue) shows GPT-5 score drop; bottom graph (orange) shows RLM maintaining higher scores.
Large Language Models (LLMs)

Context As An External Variable: Recursive Language Models offer a path to dramatically expand beyond the context window

When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally.
The chart compares Nemotron 3 models’ performance in accuracy and processing speed against other AI models.
Large Language Models (LLMs)

Open-Source Speed Demon: Nvidia’s open Nemotron 3 Super 120B-A12B model sets the pace in its class

Nvidia, the dominant supplier of AI chips, released a competitive open-source large language model whose speed tops its size class — the first open-weights leader to come from the United States since last year, when Meta delivered Llama 4.
Infographic on mobile AI use in 2025: 149B downloads, $167B revenue, 5.3T usage hours, 3.6 hours/day, 34 apps/month.
Large Language Models (LLMs)

AI on Mobile Skyrockets: State of Mobile 2026 Report shows AI chatbot, search, and assistant growth outpaces gaming, social, and more

Downloads of mobile AI apps and resulting revenue are surging.
Table shows GPT-5.4 outperforms in GDPval and Tau2-bench Telecom, setting new state-of-the-art scores.
Large Language Models (LLMs)

GPT-5.4’s Higher Performance, Higher Price: OpenAI’s GPT-5.4 Pro and GPT-5.4 Thinking challenge Google’s Gemini 3.1 Pro Preview as best all-around AI model

OpenAI updated its flagship models, extending the ability to use tools and setting the state of the art on a handful of benchmarks, and priced them at the top of the market. Its coding and agentic abilities have enabled Codex, OpenAI’s competitor to Anthropic’s Claude Code, to leap ahead.
Diagram depicts a math problem-solving workflow from problem generation to verification and revision.
Large Language Models (LLMs)

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.
Bar graph depicts rising efficiency in AI models from 2023 to 2025, highlighting energy gains.
Large Language Models (LLMs)

Can Local AI Stand In for the Cloud?: Stanford and Together.AI researchers chart edge models’ performance in intelligence per watt

Projected demand for output from large language models is spurring a massive buildout of data centers. Researchers asked whether smaller models running on local devices could meaningfully lighten that load.
A line graph showing S&P Software & Services Index dropping sharply from February 20 to February 24.
Large Language Models (LLMs)

Investors Panic Over Agentic AI: Claude Cowork plugins trigger a SaaS stock selloff, but partnerships lead to slight rebound

Makers of software that runs large companies saw their share prices plunge as investors worried that AI systems could undermine their businesses. This week, their stocks rebounded somewhat as Anthropic partnered with some of the same companies.
Two comparison tables show AI model performance across varied benchmarks, highlighting LFM2.5-1.2B.
Large Language Models (LLMs)

Faster Reasoning at the Edge: Liquid AI’s small reasoning model mixes attention with convolutional layers for efficiency

Reasoning models in the 1 to 2 billion-parameter range typically require more than 1 gigabyte of RAM to run. Liquid AI released one that runs in less than 900 megabytes, and does it with exceptional speed and efficiency.
Benchmark table shows GLM-5 outperforming other models in reasoning, coding, and general agent tasks.
Large Language Models (LLMs)

GLM-5 Scales Up: Z.ai’s updated model boasts top open-weights Intelligence Index score

Z.ai more than doubled the size of its flagship large language model to deliver outstanding performance among open-weights competitors.
A SpaceX rocket hovers in Earth’s atmosphere, representing SpaceX and xAI’s strategic shift toward space-based AI projects.
Large Language Models (LLMs)

xAI Blasts Off: SpaceX acquires xAI, announces plans for data centers in space

Elon Musk’s SpaceX acquired xAI, opening the door to richer financing of the merged entity’s AI research, a tighter focus on space applications of AI, and — if Musk’s dreams are realized — solar-powered data centers in space.