Z.ai more than doubled the size of its flagship large language model to deliver outstanding performance among open-weights competitors.
What’s new: GLM-5 is designed for long-running agentic tasks. It tops other open-weights models in Artificial Analysis’ Intelligence Index.
- Input/output: Text in (up to 200,000 tokens), text out (up to 128,000 tokens)
- Architecture: Mixture-of-experts transformer, 744 billion parameters, 40 billion active parameters per token (see the routing sketch after this list)
- Features: Function calling, reasoning, context caching
- Performance: Best among open-weights models on Artificial Analysis’ Intelligence Index, 𝜏²-Bench Telecom, Vending-Bench 2, and Chatbot Arena Code
- Availability/price: Web interface free, weights available via Hugging Face for commercial and noncommercial uses under MIT license, API $1.00/$0.20/$3.20 per million input/cached/output tokens (see the cost example after this list), coding plans $27 to $216 per quarter
- Undisclosed: Specific architecture details, training data, and training methods
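Z.ai hasn’t disclosed GLM-5’s routing scheme, but the active-parameter figure reflects how mixture-of-experts models work in general: a router sends each token to a few experts, so only a small slice of the total parameters runs per token. Below is a minimal sketch of generic top-k routing in Python; the toy sizes and the `moe_forward` helper are illustrative, not GLM-5’s actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2  # toy sizes, far smaller than GLM-5's

# One feed-forward "expert" per slot; only top_k of them run for a given token.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.02,
     rng.standard_normal((4 * d_model, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top_k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]               # indices of the top_k experts
    exp = np.exp(logits[chosen] - logits[chosen].max())
    weights = exp / exp.sum()                          # softmax over chosen experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,), computed by 2 of 8 experts
```

Because only `top_k` experts execute, compute per token tracks the active-parameter count (40 billion) rather than the full 744 billion.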
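The listed rates make per-request costs easy to estimate. This back-of-the-envelope calculator assumes fresh input, cached input, and output tokens are each billed at their listed per-million rate; any cache-write fees or other billing details are not covered here.

```python
# Listed API rates in dollars per million tokens.
RATE_INPUT, RATE_CACHED, RATE_OUTPUT = 1.00, 0.20, 3.20

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call, assuming each token class bills at its listed rate."""
    return (input_tokens * RATE_INPUT
            + cached_tokens * RATE_CACHED
            + output_tokens * RATE_OUTPUT) / 1_000_000

# Example: 150,000 fresh input tokens, 50,000 cache hits, 20,000 output tokens.
print(f"${request_cost(150_000, 50_000, 20_000):.4f}")  # $0.2240
```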
How it works: Z.ai disclosed few details about GLM-5’s architecture and training.
- The company pretrained GLM-5 on 28.5 trillion tokens, up from 23 trillion for GLM-4.5.
- For post-training, the company used slime, open-source reinforcement learning software originated by Z.ai that runs data generation and training as independent processes (see the training-loop sketch after this list). The company says this infrastructure improved training throughput, enabling more iterations during reinforcement learning.
- GLM-5 uses DeepSeek sparse attention, which cuts the cost of processing long contexts by attending only to the most relevant portions of the input rather than every token (see the attention sketch after this list).
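Z.ai hasn’t published slime’s internals in this context, but the decoupling it describes can be sketched as rollout workers and a trainer that communicate only through a shared buffer, so trajectory generation never blocks gradient updates and vice versa. A toy Python version follows; threads, mock rewards, and print statements stand in for real model inference and policy optimization.

```python
import queue
import random
import threading
import time

# Rollouts and training run as independent processes (threads here, for brevity)
# that share only this buffer, so slow generation never idles the trainer.
buffer = queue.Queue(maxsize=64)

def rollout_worker(worker_id: int, n_episodes: int) -> None:
    """Generate trajectories with the current policy and enqueue them."""
    for episode in range(n_episodes):
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for model generation
        buffer.put({"worker": worker_id, "episode": episode,
                    "reward": random.random()})

def trainer(n_updates: int, batch_size: int = 4) -> None:
    """Consume trajectories in batches and apply (mock) policy updates."""
    for step in range(n_updates):
        batch = [buffer.get() for _ in range(batch_size)]
        mean_reward = sum(t["reward"] for t in batch) / batch_size
        print(f"update {step}: mean reward {mean_reward:.3f}")

threads = [threading.Thread(target=rollout_worker, args=(i, 8)) for i in range(4)]
threads.append(threading.Thread(target=trainer, args=(8,)))  # 4 * 8 == 8 * 4 items
for t in threads:
    t.start()
for t in threads:
    t.join()
```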
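To illustrate the idea behind sparse attention, here’s a toy top-k attention step in which each query attends to only its k highest-scoring keys instead of the full sequence. Note the simplification: this sketch still scores every key, whereas DeepSeek’s published mechanism uses a lightweight indexer to pick tokens cheaply, which is where the real savings come from.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=8):
    """Attend one query vector to only its k highest-scoring keys.

    A generic top-k stand-in for sparse attention; the selection step in
    DeepSeek's design is more elaborate than the full scoring done here.
    """
    scores = K @ q / np.sqrt(q.shape[-1])  # relevance of every key to this query
    keep = np.argsort(scores)[-k:]         # indices of the k most relevant tokens
    exp = np.exp(scores[keep] - scores[keep].max())
    weights = exp / exp.sum()              # softmax over the kept keys only
    return weights @ V[keep]               # aggregation cost scales with k, not length

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
q = rng.standard_normal(d)
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
print(topk_sparse_attention(q, K, V, k=8).shape)  # (64,)
```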
Performance: GLM-5 achieved the highest performance among open-weights models in some coding and agentic tasks but generally trailed proprietary frontier models.
- On Artificial Analysis’ Intelligence Index, a weighted average of 10 evaluations that focus on economically useful work, GLM-5 with reasoning enabled (50) surpassed the previous open-weights leader, Kimi K2.5 set to reasoning (47). It trailed Claude Opus 4.6 set to adaptive reasoning (53) and GPT-5.2 set to xhigh reasoning (51).
- GLM-5 also showed strength in agentic tasks. On 𝜏²-Bench Telecom, which tests the ability of conversational agents to collaborate with users in technical support scenarios, GLM-5 achieved 98 percent (with reasoning) and 97 percent (without reasoning), while Qwen3-Max-Thinking (98.2 percent) set the state of the art. On Vending-Bench 2, a simulated business scenario designed to measure agentic performance over long contexts, GLM-5 ($4,432.12) outperformed all open-weights models tested, including Kimi K2.5 ($1,198.46).
- On Chatbot Arena Code, where human judges compare models head-to-head, GLM-5 (1449 Elo) ranked first among open-weights models. It ranked sixth overall, trailing Claude Opus 4.6 (1567 Elo), tying with Gemini 3 Pro, and outperforming Kimi K2.5 (1447 Elo).
Why it matters: On Artificial Analysis’ Intelligence Index, GLM-5 nearly matches proprietary leaders Claude Opus 4.6 and GPT-5.2. The shrinking gap between open-weights and proprietary models gives developers high-performance options they can modify and run on their own hardware.
We’re thinking: The center of gravity in open-weights AI has shifted decisively eastward. Developers in China have been responsible for a succession of leading open-weights large language models lately, including GLM-4.5, Kimi K2, Qwen3-VL-235B-A22B, and Kimi K2.5.