Large Language Models (LLMs)

218 Posts

Flowchart shows book text split, input summary, model training, and memorization testing in LLM workflow.

Fine-Tuning LLMs to Expand on Summaries Unearths Pretraining Texts: Fine-tuning can strip models of copyright alignment guidelines

Fine-tuning large language models on a seemingly benign task that would be useful to writers — expanding plot summaries into paragraphs of polished fiction — causes them to regurgitate substantial portions of books on which they were pretrained.
Comments highlight using Singapore nodes for AI access, tying to gray market themes discussed.

Inside the Gray Market for LLM Access: Middlemen package extra tokens, hijack IDs to resell, distill models

An ecosystem of API proxy servers enables AI developers in China to access top U.S. models at deeply discounted prices.
Flowchart depicting LLMs memorizing and responding to state media, affecting language-specific outputs.

Qwen3.7-Max Adds Speed and Power: Alibaba's latest proprietary model challenges U.S. rivals

Alibaba updated its flagship large language model for long-running agentic work, pushing it into the top rank among LLMs built in China.
Diagram showing threat actor using AI to find vulnerabilities and bypass two-factor authentication.

Cybersecurity Alarms Grow Louder: Google study shows LLM-generated malware is getting harder to track and stop

An AI-generated script to bypass two-factor authentication signals a dawning era of industrial-scale cyberattacks, according to a Google report.
A woman in martial arts attire faces off against a cartoon lobster in a futuristic cityscape.

Hermes Agent Challenges OpenClaw: OpenClaw created a class of personal agents; upstart Hermes Agent is outworking it

OpenClaw, the immensely popular AI agent, has fast-rising competition.
A graph shows assistant behavior shifting between helpful and role-playing, with conversation bubbles.

Assistants That Assist Consistently: Large language models can drift from helpful personas to harmful ones, but new research aims to stabilize them

Typically, large language models are trained to act as helpful, harmless, honest assistants. However, during long or emotionally charged conversations, traits can emerge that are less beneficial. Researchers devised a way to steady the assistant personas of LLMs.
GLM-5.1 excels in SWE-Bench Pro and Terminal-Bench 2.0, leading in coding and reasoning tests.

GLM-5.1 Aims for Long-Running Tasks: Z.ai’s GLM-5.1 evaluates interim results and may change its approach hundreds of times before it delivers final output

Z.ai updated its flagship open-weights large language model to work autonomously on single tasks for up to eight hours.
Image depicts persona generator creating synthetic personas, with outputs analyzed for diversity metrics.

Simulating Diverse Human Cohorts: Persona generation simulates human characters across a controllable range of points of view

If you want to understand how the public will respond to your offerings, large language models can simulate users who answer questions about capabilities, features, promotions, or prices.
Table compares AI models' performance across benchmarks, showing Claude Mythos Preview leading.

Claude Mythos Preview Raises Security Worries: Why Claude’s advanced Mythos Preview model will be limited-release-only

Anthropic took unusual steps to prepare the world for a forthcoming large language model that it said poses extraordinary risks to cybersecurity.
Two graphs show TTT-E2E maintains stable loss and latency across increasing context lengths up to 128k.

Learning Long Context at Inference: Test-Time Training End-to-End (TTT-E2E) retrains model weights to handle long inputs

Large language models typically become less accurate and slower when they process longer contexts, but researchers enabled an LLM to keep accuracy stable and inference time constant as its context grew.
Top graph (blue) shows GPT-5 score drop; bottom graph (orange) shows RLM maintaining higher scores.

Context As An External Variable: Recursive Language Models offer path to dramatically expand beyond the context window

When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally.
The chart compares Nemotron 3 models’ performance in accuracy and processing speed against other AI models.

Open-Source Speed Demon: Nvidia’s open Nemotron 3 Super 120B-A12B model sets a new pace in its class

Nvidia, the dominant supplier of AI chips, released a competitive open-source large language model whose speed tops its size class — the first open-weights leader to come from the United States since last year, when Meta delivered Llama 4.
Infographic on mobile AI use in 2025: 149B downloads, $167B revenue, 5.3T usage hours, 3.6 hours/day, 34 apps/month.

AI on Mobile Skyrockets: State of Mobile 2026 Report shows AI chatbot, search, and assistant growth outpaces gaming, social, and more

Downloads of mobile AI apps and resulting revenue are surging.
Table shows GPT-5.4 outperforms in GDPval and Tau2-bench Telecom, setting new state-of-the-art scores.

GPT-5.4’s Higher Performance, Higher Price: OpenAI’s GPT-5.4 Pro and GPT-5.4 Thinking challenge Google’s Gemini 3.1 Pro Preview as best all-around AI model

OpenAI updated its flagship models, extending the ability to use tools and setting the state of the art on a handful of benchmarks, and priced them at the top of the market. The models' coding and agentic abilities have enabled Codex, OpenAI’s competitor to Anthropic’s Claude Code, to leap ahead.
Diagram depicts a math problem-solving workflow from problem generation to verification and revision.

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.