The Qwen3.5 family of open-weights vision-language models includes impressive larger models as well as a smaller one that outperforms an OpenAI open-weights model more than 10 times its size.
What’s new: Alibaba released the Qwen3.5 family of eight open-weights vision-language models. The largest are Qwen3.5-397B-A17B (397 billion parameters, 17 billion active per token), which offers open weights, and Qwen3.5-Plus, a hosted version of Qwen3.5-397B-A17B that supports agentic applications by providing a larger input context and built-in tools that it can select autonomously. Four medium-size models include the open-weights Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, and Qwen3.5-27B, plus Qwen3.5-Flash, a hosted version of Qwen3.5-35B-A3B that’s outfitted for agentic applications. Among the smaller members of the family — Qwen3.5-9B, Qwen3.5-4B, Qwen3.5-2B, and Qwen3.5-0.8B — the 9-billion- and 4-billion-parameter variations rival the performance of much larger models.
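Names like "397B-A17B" reflect a mixture-of-experts design: each token is routed to only a few expert subnetworks, so only a fraction of the total parameters does work per token. A sketch of the bookkeeping, using hypothetical expert counts and sizes chosen only to reproduce the 397B/17B split (Alibaba hasn't disclosed the actual configuration):

```python
def moe_param_counts(shared_params, num_experts, params_per_expert, experts_per_token):
    """Total vs. active parameter counts for a mixture-of-experts model.

    Shared parameters (embeddings, attention, router) run for every token;
    only the routed experts' parameters are added to the active count.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + experts_per_token * params_per_expert
    return total, active

# All values below are assumptions picked to match the "397B-A17B"
# naming (total / active per token), not Qwen3.5's real configuration.
total, active = moe_param_counts(
    shared_params=7e9,
    num_experts=156,
    params_per_expert=2.5e9,
    experts_per_token=4,
)
print(f"total = {total / 1e9:.0f}B, active = {active / 1e9:.0f}B")
# total = 397B, active = 17B
```

The payoff is that inference cost scales with active parameters, while total parameters set the model's capacity (and its memory footprint).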
- Input/output: Text, images, and video in (open-weights models: 254,000 tokens, extensible to 1 million; hosted models: up to 1 million tokens by default); text out (up to 64,000 tokens)
- Architecture: Mixture-of-experts or dense transformer with mixed attention and Gated DeltaNet layers, unspecified vision encoder
- Performance: Excellent vision performance overall; Qwen3.5-9B (9 billion parameters) outperforms gpt-oss-120B (120 billion parameters) on many language tasks.
- Availability: Open weights are freely available under the Apache 2.0 license. API access to hosted open-weights models is available via Alibaba Cloud Model Studio ($0.20-$0.60 per 1 million input tokens and $2.00-$3.60 per 1 million output tokens, depending on the model); Qwen3.5-Plus costs $0.40/$0.04/$2.40 and Qwen3.5-Flash $0.10/$0.01/$0.40 per 1 million input/cached/output tokens.
- Features: 201 languages, tool use, web search, chain-of-thought reasoning
- Undisclosed: Vision encoder, training data, and methods
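Given the per-million-token prices above, estimating a request's cost is simple arithmetic. A small sketch (prices are those listed at press time; the token counts are invented for illustration, and current rates should be checked on Alibaba Cloud Model Studio):

```python
def api_cost(input_tokens, cached_tokens, output_tokens,
             in_price, cached_price, out_price):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * in_price
            + cached_tokens * cached_price
            + output_tokens * out_price) / 1_000_000

# Hypothetical request: 100k input tokens (50k of them cache hits), 20k output.
# Qwen3.5-Plus: $0.40 / $0.04 / $2.40 per 1M input / cached / output tokens.
plus = api_cost(100_000, 50_000, 20_000, 0.40, 0.04, 2.40)
# Qwen3.5-Flash: $0.10 / $0.01 / $0.40.
flash = api_cost(100_000, 50_000, 20_000, 0.10, 0.01, 0.40)
print(f"Plus: ${plus:.4f}, Flash: ${flash:.4f}")
```

Note that cached input tokens are billed at a tenth of the fresh-input rate, which matters for agentic workloads that repeatedly resend long contexts.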
How it works: Alibaba shared little information about how it built the Qwen3.5 family.
- Qwen3.5 is built on the Qwen3-Next architecture, a variation on the Qwen3-30B-A3B architecture and training method that’s modified to increase training efficiency and stability.
- Qwen3.5 was trained on a “significantly larger scale of visual-text tokens” than Qwen3.
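The Qwen3-Next architecture replaces some standard attention layers with Gated DeltaNet, a linear-recurrent layer whose state is updated by a gated delta rule. A toy sketch in pure Python (a simplified, unofficial rendering; the real layers operate on large per-head matrices with learned gates):

```python
# Toy gated delta rule: the state S is a d x d matrix that maps keys to
# values. alpha decays old memory; the beta * k k^T term erases the value
# currently stored under key k before writing the new one.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def delta_step(S, k, v, q, alpha, beta):
    """One recurrent step: S <- alpha * S (I - beta k k^T) + beta v k^T.

    Returns the new state and the readout S q for query q.
    """
    Sk = matvec(S, k)  # S k: the value currently stored under key k
    d = len(k)
    S_new = [[alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
              for j in range(d)]
             for i in range(d)]
    return S_new, matvec(S_new, q)

# Write value [0, 1] under key [1, 0], then overwrite it with [1, 0]:
S, o = delta_step([[0.0, 0.0], [0.0, 0.0]],
                  [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], alpha=0.9, beta=1.0)
S, o = delta_step(S, [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], alpha=1.0, beta=1.0)
print(o)  # -> [1.0, 0.0]: querying key [1, 0] returns the new value
```

Because the state is a fixed-size matrix rather than a growing KV cache, such layers cost constant memory per token, which helps explain the family's long input contexts.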
Results: In Alibaba’s tests, all Qwen3.5 models excelled at vision tasks, outperforming much larger models, and some turned in competitive results in language tasks as well. Qwen3.5-9B and Qwen3.5-4B showed the most impressive performance overall, shining in both vision and language tasks even compared to much larger models. Alibaba did not publish comparative metrics for the two smallest variations.
- On 28 of 44 vision benchmarks, Qwen3.5-397B-A17B outperformed GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro, whose parameter counts are undisclosed but almost certainly much larger. On a variety of language tasks, Qwen3.5-397B-A17B beat at least one of GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro, but generally not all three.
- On most language and vision benchmarks tested, Qwen3.5-122B-A10B and Qwen3.5-27B exceeded GPT-5-mini (parameter count undisclosed). Generally, Qwen3.5-122B-A10B, a mixture-of-experts model that activates 10 billion parameters per token, outperformed Qwen3.5-27B, a dense model with 27 billion parameters. Qwen3.5-35B-A3B generally underperformed both Qwen3.5-122B-A10B and the dense Qwen3.5-27B, but it still outperformed GPT-5-mini on 58 of 74 benchmarks tested.
- Qwen3.5-9B outdistanced gpt-oss-120b, OpenAI’s open-weights language model with more than 10 times as many parameters, on most language benchmarks tested, though not on reasoning and coding tasks. Similarly, Qwen3.5-4B outperformed OpenAI’s gpt-oss-20b on most language benchmarks tested except reasoning and coding tasks. On most of the vision benchmarks tested, both Qwen3.5-9B and Qwen3.5-4B outperformed the vision-language models GPT-5-nano and Gemini-2.5-Flash-Lite.
Behind the news: Shortly after rolling out the Qwen3 family, Lin Junyang, the team’s technical lead and a key architect of the models, abruptly resigned with a post on the X social network that read, “Bye my beloved qwen.” The Chinese tech-news outlet 36kr.com subsequently reported that four other members of the team resigned in his wake. In a January public appearance, Lin had said, “We are stretched thin — just meeting delivery demands consumes most of our resources,” Bloomberg reported. Alibaba responded by putting the Qwen project under tighter supervision by senior leadership and promising to invest further in AI development.
Why it matters: All Qwen3.5 models deliver stellar vision performance for their sizes, but the smaller models — especially Qwen3.5-9B — are small enough to run on consumer laptops while delivering performance that previously required an 80GB GPU like an Nvidia H100.
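A rough way to see why a 9-billion-parameter model fits on a laptop is to consider weight memory alone, which scales with parameter count times bytes per parameter (a lower bound that ignores the KV cache and activations):

```python
def weight_memory_gb(params, bits_per_param):
    """Approximate memory for model weights alone; ignores KV cache and
    activation memory, so real usage is somewhat higher."""
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"9B params at {bits}-bit: {weight_memory_gb(9e9, bits):.1f} GB")
# 18.0, 9.0, and 4.5 GB: at 4-bit quantization, a 9-billion-parameter
# model's weights fit comfortably in a consumer laptop's memory.
```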
We’re thinking: Vision-language models with reasoning capability that are small enough to run locally mean reduced cost, better privacy, and new vistas for vision-language applications.