Large Language Models (LLMs)

227 Posts

Flowchart of an ESMC-6B model with sequence encoding layers, language model, and diffusion transformer output.

Biological Molecules as Language: ESMFold2 approaches AlphaFold 3 performance but with an open, Transformer-based architecture

Google’s AlphaFold models pioneered the task of finding the shapes of biologically active molecules, opening new pathways for drug development.
AFM 3 Core model architecture visualizes DRAM and NAND processes in AI with focus on sparsely-activated LLM operations.

Large-Model AI for Apple Devices: 2026's Apple Foundation Models bring AI to MacBooks, iPhones, and the cloud

The third generation of Apple Foundation Models — fruit of Apple’s collaboration with Google — introduces a variation on the mixture-of-experts architecture that runs on local devices. 
AI performance chart shows GLM, GPT models competing in reasoning, coding benchmarks. Models highlight performance.

Top Agentic Performance, Low Cost: GLM-5.2, designed for coding and long-running agentic jobs, now the top open model

Z.ai released an open-weights model that rivals proprietary leaders for autonomous agentic tasks.
Flowchart illustrates the POPE method, transitioning from guided to unguided problem-solving in reinforcement learning.

Reinforcement Learning With Hints: Privileged On-Policy Exploration (POPE) trains models to expand on partial solutions

Reinforcement learning can’t train a model to solve a difficult problem if the model doesn’t discover all the right steps.
Performance table shows Nemotron's scores across benchmarks, highlighting its strengths and weaknesses.

Nvidia’s Nemotron Goes Big: Nvidia Nemotron 3 Ultra bets on speed and openness to win customers

Nvidia’s largest-yet model is among the best-performing from a developer based in the U.S. and among the most open developed by anyone.
A line graph compares SWE-Bench Pro and DeepSWE, showing various models' performance percentages.

Agentic Tests Beyond the Bug Hunt: DeepSWE, ProgramBench, and ITBench-AA push agents harder than SWE-bench

SWE-bench, a family of benchmarks that focuses on an LLM’s ability to fix software bugs, is giving way to new tests that evaluate agent software-engineering performance in more challenging ways.
Diagram illustrates LLMs processing state-coordinated media, affecting linguistic responses and predictions.

State Media Influences LLM Responses: Significant portions of AI training material reflect national propaganda

Popular large language models have adopted the biases of governments that control the free flow of information, particularly when those models generate output in the languages of countries where such governments are in power, researchers found.
Chart compares performance of Composer 2.5 against Opus 4.7, GPT-5.5, and Composer 2 in benchmarks.

Cursor Fits Its Model to Its Agent: Composer 2.5 for Cursor rivals GPT-5.5's coding abilities at lower price

Cursor’s latest software engineering model rivals the performance of leading competitors like Claude Opus 4.7 and GPT-5.5 for a fraction of the price.
Claude Mythos 5 excels, achieving top scores in agentic coding and cybersecurity compared to rivals.

Behold Mythos!: Anthropic released Claude Mythos 5 and Claude Fable 5, a public version with safeguards

After months of headlines that teased a large language model with extraordinary capabilities, Anthropic launched Claude Mythos 5, which can crack software previously believed to be secure, and Claude Fable 5, a version for general use that limits what users can do in an unprecedented way.
Flowchart shows book text split, input summary, model training, and memorization testing in LLM workflow.

Fine-Tuning LLMs to Expand on Summaries Unearths Pretraining Texts: Fine-tuning can strip models of copyright alignment guidelines

Fine-tuning large language models on a seemingly benign task that would be useful to writers — expanding plot summaries into paragraphs of polished fiction — causes them to regurgitate substantial portions of books on which they were pretrained.
Comments highlight using Singapore nodes for AI access, tying to gray market themes discussed.

Inside the Gray Market for LLM Access: Middlemen package extra tokens, hijack IDs to resell, distill models

An ecosystem of API proxy servers enables AI developers in China to access top U.S. models at deeply discounted prices.
Flowchart depicting LLMs memorizing and responding to state media, affecting language-specific outputs.

Qwen3.7-Max Adds Speed and Power: Alibaba's latest proprietary model challenges U.S. rivals

Alibaba updated its flagship large language model for long-running agentic work, pushing it into the top rank among LLMs built in China.
Diagram showing threat actor using AI to find vulnerabilities and bypass two-factor authentication.

Cybersecurity Alarms Grow Louder: Google study shows LLM-generated malware is getting harder to track and stop

An AI-generated script to bypass two-factor authentication signals a dawning era of industrial-scale cyberattacks, according to a Google report.
A woman in martial arts attire faces off against a cartoon lobster in a futuristic cityscape.

Hermes Agent Challenges OpenClaw: OpenClaw created a class of personal agents; upstart Hermes Agent is outworking it

OpenClaw, the immensely popular AI agent, has fast-rising competition.
A graph shows assistant behavior shifting between helpful and role-playing, with conversation bubbles.

Assistants That Assist Consistently: Large language models can drift from helpful personas to harmful ones, but new research aims to stabilize them

Typically, large language models are trained to act as helpful, harmless, honest assistants. However, during long or emotionally charged conversations, traits can emerge that are less beneficial. Researchers devised a way to steady the assistant personas of LLMs.
