Large Language Models (LLMs)

208 Posts

Top graph (blue) shows GPT-5 score drop; bottom graph (orange) shows RLM maintaining higher scores.

Context As An External Variable: Recursive Language Models offer a path to dramatically expand beyond the context window

When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally.
The chart compares Nemotron 3 models’ performance in accuracy and processing speed against other AI models.

Open-Source Speed Demon: Nvidia’s open Nemotron 3 Super 120B-A12B model sets a new pace in its class

Nvidia, the dominant supplier of AI chips, released a competitive open-source large language model whose speed tops its size class — the first open-weights leader to come from the United States since last year, when Meta delivered Llama 4.
Infographic on mobile AI use in 2025: 149B downloads, $167B revenue, 5.3T usage hours, 3.6 hours/day, 34 apps/month.

AI on Mobile Skyrockets: State of Mobile 2026 Report shows AI chatbot, search, and assistant growth outpaces gaming, social, and more

Downloads of mobile AI apps and resulting revenue are surging.
Table shows GPT-5.4 outperforms in GDPval and Tau2-bench Telecom, setting new state-of-the-art scores.

GPT-5.4’s Higher Performance, Higher Price: OpenAI’s GPT-5.4 Pro and GPT-5.4 Thinking challenge Google’s Gemini 3.1 Pro Preview as best all-around AI model

OpenAI updated its flagship models, extending the ability to use tools and setting the state of the art on a handful of benchmarks, and priced them at the top of the market. The models’ coding and agentic abilities have enabled Codex, OpenAI’s competitor to Anthropic’s Claude Code, to leap ahead.
Diagram depicts a math problem-solving workflow from problem generation to verification and revision.

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.
Bar graph depicts rising efficiency in AI models from 2023 to 2025, highlighting energy gains.

Can Local AI Stand In for the Cloud?: Stanford and Together.AI researchers chart edge models’ performance in intelligence per watt

Projected demand for output from large language models is spurring a massive buildout of data centers. Researchers asked whether smaller models running on local devices could meaningfully lighten that load.
A line graph showing S&P Software & Services Index dropping sharply from February 20 to February 24.

Investors Panic Over Agentic AI: Claude Cowork plugins trigger a SaaS stock selloff, but partnerships lead to slight rebound

Makers of software that runs large companies saw their share prices plunge as investors worried that AI systems could undermine their businesses. This week, their stocks rebounded somewhat as Anthropic partnered with some of the same companies.
Two comparison tables show AI model performance across varied benchmarks, highlighting LFM2.5-1.2B.

Faster Reasoning at the Edge: Liquid AI’s small reasoning model mixes attention with convolutional layers for efficiency

Reasoning models in the 1 to 2 billion-parameter range typically require more than 1 gigabyte of RAM to run. Liquid AI released one that runs in less than 900 megabytes, and does it with exceptional speed and efficiency.
Benchmark table shows GLM-5 outperforming other models in reasoning, coding, and general agent tasks.

GLM-5 Scales Up: Z.ai’s updated model boasts top open-weights Intelligence Index score

Z.ai more than doubled the size of its flagship large language model to deliver outstanding performance among open-weights competitors.
A SpaceX rocket hovers in Earth’s atmosphere, representing SpaceX and xAI’s strategic shift toward space-based AI projects.

xAI Blasts Off: SpaceX acquires xAI, announces plans for data centers in space

Elon Musk’s SpaceX acquired xAI, opening the door to richer financing of the merged entity’s AI research, a tighter focus on space applications of AI, and — if Musk’s dreams are realized — solar-powered data centers in space.
A performance table shows Claude Opus 4.6 outperforming competitors in terminal coding, computer use, tool use, search, and problem-solving.

Claude Opus 4.6 Reasons More Over Harder Problems: Anthropic updates flagship model, places first on Intelligence Index

Anthropic updated its flagship large language model to handle longer, more complex agentic tasks.
Lines connect multiple Wikipedia globe logos, symbolizing data exchange and partnerships.

AI Giants Share Wikipedia’s Costs: Wikimedia Foundation strikes deals with Amazon, Meta, Microsoft, Mistral AI, and Perplexity

On its 25th anniversary, Wikipedia celebrated with high-profile deals that make its data easier for AI companies to use in training their models, in exchange for financial support.
Diagram shows sales, campaign, social posts before and after LLM simulation feedback loops.

Training For Engagement Can Degrade Alignment: “Moloch’s Bargain” shows fine-tuning can affect social values

Individuals and organizations increasingly use large language models to produce media that helps them compete for attention. Does fine-tuning LLMs to encourage engagement, purchases, or votes affect their alignment with social values? Researchers found that it does.
AI models’ performance shown in bars; GPT-5.2 highest at 51, reflecting updated benchmarks.

Artificial Analysis Revamps Intelligence Index: Independent AI testing authority turns from saturated knowledge benchmarks to harder business tests

Artificial Analysis, which tests AI systems, updated the component evaluations in its Intelligence Index to better reflect large language models’ performance in real-world use cases.
Apple logo side by side with Google's logo, symbolizing their AI partnership.

Apple’s Foundation Models Will Be Gemini: Apple announced a partnership with Google to power Siri and other AI features

Apple cut a multi-year deal with Google to use Gemini models as the basis of AI models that reside on Apple devices.