Gemini 3.5 Flash Pairs Smarts With Speed
Google's updated Flash levels up, approaching top models but raising prices

Gemini 3.5 Flash shows improved performance, surpassing previous model scores in most benchmarks.

Google’s faster model brings substantive gains at a substantially higher price, part of a rising trend in prices per token.

What’s new: Google launched Gemini 3.5 Flash, an update of its mid-tier multimodal model. The new version offers improvements in agentic capabilities, visual understanding, and speed at a price three times that of its predecessor Gemini 3 Flash.

  • Input/output: Text, images, audio, video in (up to 1 million tokens), text out (up to 64,000 tokens, 204 tokens per second)
  • Architecture: Mixture-of-experts transformer 
  • Features: Adjustable reasoning levels (minimal, low, medium, high), thought preservation (which holds reasoning tokens in the context to retain reasoning across multi-turn conversations, similar to Kimi K2.6’s preserved thinking feature), tool use (computer use not yet available)
  • Performance: Tops Artificial Analysis’s APEX-Agents-AA benchmark and the MMMU-Pro multimodal benchmark; trails leading models on overall intelligence, knowledge, and coding
  • Availability/price: Free via Gemini app, Google AI Studio, Google Antigravity (within a compute limit that refreshes every 5 hours up to a weekly limit), Google Search AI mode; Gemini Enterprise, Gemini Enterprise Agent Platform, API $1.50/$0.15/$9.00 per million input/cached/output tokens
  • Undisclosed: Parameter count, training data and methods, architectural details
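To make the pricing concrete, here is a minimal sketch that estimates the cost and generation time of a single API request from the per-token prices and output speed listed above. The function names and the example token counts are hypothetical, and the sketch assumes that cached tokens are billed at the cached rate in place of the input rate and that output runs at a steady 204 tokens per second. Actual billing rules and speeds may differ.

```python
# Prices quoted in the article, in US dollars per million tokens.
PRICE_PER_MILLION = {"input": 1.50, "cached": 0.15, "output": 9.00}

# Output speed quoted in the article (assumed constant here).
OUTPUT_TOKENS_PER_SECOND = 204


def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request.

    Assumption (not from the article): cached tokens are billed only at the
    cached rate, and all generated tokens are billed at the output rate.
    """
    total = (
        input_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cached"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


def estimate_generation_seconds(output_tokens: int) -> float:
    """Estimate how long the output takes to generate at the quoted speed."""
    return output_tokens / OUTPUT_TOKENS_PER_SECOND


# Hypothetical request: 100,000 fresh input tokens, 50,000 cached, 4,000 output.
cost = estimate_cost(100_000, 50_000, 4_000)
seconds = estimate_generation_seconds(4_000)
print(f"Estimated cost: ${cost:.4f}, generation time: {seconds:.1f} s")
```

In this hypothetical request, the 4,000 output tokens account for a noticeable share of the bill despite being a small fraction of the total tokens, because output is priced at six times the input rate.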

How it works: Google disclosed few details about how it built Gemini 3.5 Flash.

  • Gemini 3.5 Flash is “based on” Gemini 3 Flash, which itself is based on Gemini 3 Pro, according to its model card.
  • It’s a mixture-of-experts transformer that was multimodally pretrained on text, code, images, audio, and video scraped from the web alongside licensed materials, Google user data, and synthetic data.
  • It was fine-tuned via reinforcement learning on datasets that covered multi-step reasoning, solving problems, and proving theorems.

Performance: Gemini 3.5 Flash performs just behind the first rank of multimodal models. According to independent tests, it makes substantial gains over its predecessor in agentic capability and speed, and it sets the state of the art on some measures. On the Artificial Analysis Intelligence Index, it placed either fifth or seventh (depending on the reasoning levels of various models), behind Qwen 3.7 Max set to reasoning (level unspecified). However, every model that scores higher on intelligence runs substantially slower, except Qwen 3.7 Max, which debuted the same week.

  • According to Artificial Analysis, on MMMU-Pro, which measures visual reasoning across multiple academic disciplines, Gemini 3.5 Flash set to high reasoning achieved 84 percent accuracy, the highest recorded, with Gemini 3.1 Pro Preview (82 percent) in second place.
  • On APEX-Agents-AA, which tests long-running agentic tasks drawn from investment banking, management consulting, and corporate law, Gemini 3.5 Flash (47.1 percent accuracy) took the top spot on the first attempt, nearly 10 percentage points ahead of second-place GPT-5.5 (37.7 percent accuracy). On GDPval-AA (real-world agentic tasks), Gemini 3.5 Flash set to high reasoning (1,656 Elo) exceeded Gemini 3.1 Pro Preview set to an unspecified level of reasoning (1,314 Elo) but fell behind GPT-5.5 set to xhigh (1,769 Elo).
  • On ARC-AGI-2 (a test of abstract visual reasoning), Gemini 3.5 Flash set to high reasoning scored 72.1 percent on the ARC Prize leaderboard, behind Gemini 3.1 Pro Preview (77.1 percent) and GPT-5.5 set to xhigh reasoning (85.0 percent).
  • On AA-Omniscience, a knowledge benchmark that awards points for correct answers and penalizes hallucinated guesses, Gemini 3.5 Flash set to reasoning (23) trails Gemini 3.1 Pro Preview set to reasoning (33) and Claude Opus 4.7 set to max reasoning (26).
  • As of May 24, 2026, on Arena.ai’s leaderboards, which rank models in blind head-to-head human comparisons, Gemini 3.5 Flash ranked ninth in the Text Arena (1,480 Elo) and tenth in the WebDev coding arena (1,506 Elo). Anthropic’s Claude Opus 4.6 and 4.7 models occupy the top three positions in both arenas. Within the Text Arena’s category breakdowns, Gemini 3.5 Flash ranked first (1,521 Elo) in math but 31st in coding (1,507 Elo).
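For readers unfamiliar with Elo, the sketch below converts a rating gap into an expected head-to-head win rate using the conventional logistic Elo formula with a 400-point scale. This is an assumption for illustration: the article does not say how Artificial Analysis or Arena.ai compute their ratings, so the resulting figures are rough intuition rather than reported results. The function name is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of model A against model B under the conventional
    Elo formula (logistic curve, 400-point scale).

    Assumption: the leaderboards cited in the article may use a different
    scale or fitting method, so treat the result as an approximation.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# GDPval-AA ratings quoted in the article.
flash = 1656       # Gemini 3.5 Flash, high reasoning
pro_preview = 1314 # Gemini 3.1 Pro Preview
gpt = 1769         # GPT-5.5, xhigh

print(f"Flash vs. Gemini 3.1 Pro Preview: {expected_win_rate(flash, pro_preview):.2f}")
print(f"Flash vs. GPT-5.5: {expected_win_rate(flash, gpt):.2f}")
```

Under this assumption, a 342-point lead corresponds to an expected win rate of roughly 0.88, while a 113-point deficit corresponds to roughly 0.34.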

Behind the news: Google debuted Gemini 3.5 Flash at Google I/O 2026, its annual gathering for developers. Here are other AI-related announcements from that event:

  • Google overhauled Antigravity, its AI coding tool, to emphasize managing agents and de-emphasize its resemblance to popular IDEs like Microsoft’s VSCode. Antigravity’s command-line version replaces the open-source Gemini CLI.  
  • The company introduced Omni, a family of multimodal models, beginning with Omni Flash, a lightweight model that can generate video from text, image, audio, and video, or any combination of these inputs. Omni Flash is available to Google AI Plus, Pro and Ultra subscribers through the Gemini app and Google Flow, but not yet in the API. 
  • Gemini 3.5 Flash enables Google Search to permit more conversational, chatbot-like search queries, power agents to do online research on users’ behalf, and replace Search’s traditional ten top links with more AI-generated summaries that answer users’ questions with cited sources.  

Why it matters: Gemini 3.5 Flash changes what “Flash” means. Flash was introduced as a smaller, faster model tier after Gemini Ultra, Pro, and Nano. For now, it is Google’s mid-tier multimodal model, more akin to Anthropic’s Sonnet than Haiku. For developers who build agents that require multiple turns, as well as low-latency applications like chatbots, search, and image and video analysis, the model’s speed may be worth the additional tokens it generates.

We’re thinking: Google said Gemini 3.5 Flash often runs at less than half the cost of competing models. But Artificial Analysis found that, running the tests in the Intelligence Index, it actually costs more than Gemini 3.1 Pro. The Flash designation no longer implies a clear cost advantage for developers who run agentic workloads. Anthropic, OpenAI, and Google have raised per-token prices on their newer flagship and Flash-tier models. Gemini 3.5 Flash fits the pattern.
