Gemini 3.5 Flash Research Brief

2026-05-20

ai google gemini llm deepmind benchmarks pricing agentic-ai

Bottom Line

Google released Gemini 3.5 Flash on May 19, 2026 at Google I/O, positioning it as their strongest "Flash" model yet - claiming frontier-level intelligence at high speed for agentic workloads. The benchmarks are genuinely impressive (beating Gemini 3.1 Pro in several coding/agentic tasks), but the pricing is the biggest story. At $1.50/$9.00 per 1M tokens, it costs 3x more than the previous Flash and nearly as much as Pro-tier models. The community is sharply divided - some call it a game-changer, others call it a "Flash in name only."

Key Findings

Outperforms Gemini 3.1 Pro on key benchmarks (Terminal-Bench 2.1: 76.2%, GDPval-AA: 1656 Elo) despite being a "Flash" model
Priced at $1.50/$9.00 per 1M input/output tokens - a 3x jump from 3.0 Flash ($0.50/$3.00) and close to Gemini 2.5 Pro ($1.25/$10.00)
4x faster output than other frontier models according to Google, hitting ~218 tok/s in practice
1M token context window, supports thinking, code execution, function calling, search grounding, structured outputs
Knowledge cutoff: January 2025 - relatively current
Now the default model in Gemini app and AI Mode in Google Search globally
Gemini 3.5 Pro is coming "next month" with even higher capabilities

Background

Gemini 3.5 Flash is the latest in Google's Gemini model family, announced at Google I/O 2026 alongside Gemini Omni (a physical-world simulation model), Gemini Spark (a personal AI agent), and a new agent-first development platform called Google Antigravity.

The model lineage: Gemini 2.5 Flash ($0.30/$2.50) → Gemini 3.0 Flash ($0.50/$3.00) → Gemini 3.1 Flash-Lite ($0.25/$1.50) → Gemini 3.5 Flash ($1.50/$9.00). Notably, there is no "3.1 Flash" non-lite - Google skipped straight to 3.5, bundling a significant capability jump with a significant price jump.

The model is authored by Koray Kavukcuoglu, Jeff Dean, Oriol Vinyals, and Noam Shazeer - Google DeepMind's core team.

Technical Details

Model code: gemini-3.5-flash (stable, not preview)

Inputs: Text, Image, Video, Audio, PDF Output: Wording only

Property	Value
Input token limit	1,048,576
Output token limit	65,536
Knowledge cutoff	January 2025
Thinking	Supported
Code execution	Supported
Function calling	Supported
Search grounding	Supported
Google Maps grounding	Supported
Structured outputs	Supported
URL context	Supported
Batch API	Supported
Context caching	Supported
Flex/Priority inference	Supported
Image generation	Not supported
Audio generation	Not supported
Live API	Not supported
Computer use	Not supported

Benchmarks

Google's published benchmarks for 3.5 Flash:

Benchmark	Score
Terminal-Bench 2.1	76.2%
GDPval-AA	1656 Elo
MCP Atlas	83.6%
CharXiv Reasoning (multimodal)	84.2%

The model reportedly beats Sonnet 4.6 across disciplines in Google's published benchmarks. But independent testing tells a more nuanced story (see Limitations below).

Pricing

3.5 Flash vs the Gemini lineup

Model	Input/1M	Output/1M	Tier
3.1 Flash-Lite	$0.25	$1.50	Budget
2.5 Flash	$0.30	$2.50	Mid
3.0 Flash	$0.50	$3.00	Mid
3.5 Flash	$1.50	$9.00	Upper
2.5 Pro	$1.25	$10.00	Premium
3.1 Pro	$2.00	$12.00	Premium

Batch pricing for 3.5 Flash: $0.75 input / $4.50 output (50% off)

The 3x price jump from 3.0 Flash to 3.5 Flash is unprecedented in the Gemini Flash tier. Output pricing at $9/1M tokens is only $3 less than 3.1 Pro's $12.

Cost vs competitors

3.5x cheaper than GPT-5.2 for input tokens
4.6x cheaper for output tokens vs GPT-5.2
10x cheaper than Claude Opus 4.5 on input, 8x on output

Enterprise Adoption

Google lists impressive early adopters already using 3.5 Flash in production:

Shopify - parallel subagents analyzing complex data for merchant growth forecasts at global scale
Macquarie Bank - accelerating customer onboarding by reasoning over 100+ page documents
Salesforce - integrating into Agentforce for complex multi-turn tool calling with multiple subagents
Ramp - smarter OCR via multimodal understanding of invoices + reasoning over historical patterns
Xero - autonomous multi-week workflows (e.G., identifying suppliers for 1099 tax forms)
Databricks - real-time monitoring, reasoning across massive datasets to diagnose and propose fixes

Additional partners shown on the DeepMind page: JetBrains, Figma, Replit, Cursor, Warp, Harvey, Astrocade, Presentations.Ai, Latitude, Box, Workday, Salesforce, Geotab, Resemble AI.

The Big Controversy: Price vs. Total Cost

The Hacker News community (103 comments) and Reddit raised a critical point: Artificial Analysis benchmark testing showed 3.5 Flash costs MORE in total than 3.1 Pro to complete the same evaluation suite ($1,552 vs $892) while scoring slightly lower in intelligence (55 vs 57).

Metric	Gemini 3.1 Pro	Gemini 3.5 Flash
Intelligence Score	57	55
Total Benchmark Cost	$892	$1,552
Input Token Price	$2/1M	$1.50/1M
Output Token Price	$12/1M	$9/1M

In practice, 3.5 Flash uses significantly more tokens per task - likely due to longer reasoning chains and thinking tokens being counted in the output price. Lower per-token pricing does not translate to lower total cost.

Community Sentiment

Bull case

Frontier intelligence at ~half the cost of Claude Opus / GPT-5.2
Perfect for high-volume agentic loops and multi-step workflows
4x faster than other frontier models
Now the default for 650M+ Gemini users - "Pro-level reasoning" becomes the new baseline
Arena.Ai: "Gemini 3.5 Flash's pricing shifts the Pareto frontier in Text. 8 models from Google DeepMind dominate the Wording Arena Pareto curve."

Bear case

"Flash family but costs like a Pro" - $9 vs $12 output compared to Pro
Total cost can exceed Pro models due to higher token consumption
Google API reliability described as "flaky" compared to OpenAI/Anthropic
Hallucination rate: 91% in one test - 3 percentage points higher than 2.5 Flash
Confusing naming/versioning (3.1 Pro exists but 3.1 Flash non-lite doesn't)
Local models like Qwen 3.6 becoming viable alternatives

Limitations and Critiques

No image/audio generation - unlike some competitors at this price point, output is text-only
No Live API or Computer Use - limits real-time and agentic UI use cases
Knowledge cutoff January 2025 - over a year old at release
Hallucination rate remains high (91% in independent testing); "more accuracy, but when it's wrong, it's confidently wrong"
Date confusion bugs - model sometimes insists it's 2024
Token usage variability - complex reasoning tasks can more than double token usage vs. Simpler queries
Google API reliability - cache misses, flaky responses compared to competitors
The "Flash" label is misleading at this price point - it's a premium-tier model in Flash branding

Open Questions

Will Google introduce a cheaper 3.5 Flash-Lite to fill the gap left by the price jump?
How will 3.5 Pro (coming next month) differentiate when 3.5 Flash already costs nearly as much as Pro?
Can Google fix its API reliability issues as it scales to trillion-token daily volumes?
Does the thinking-token consumption make 3.5 Flash actually cost-competitive vs competitors in production?
Is the intelligence score gap vs 3.1 Pro (55 vs 57 on Artificial Analysis) meaningful or within noise?

Practical Takeaways

For high-volume agentic tasks: 3.5 Flash is a strong contender if you need speed + frontier intelligence and can absorb the higher per-task token costs
For cost-sensitive workloads: Stick with 3.0 Flash ($0.50/$3.00) or 3.1 Flash-Lite ($0.25/$1.50) - they still handle 80% of tasks well
For production reliability: Test Google's API thoroughly before committing - community reports suggest it's less stable than OpenAI/Anthropic
For critical outputs: Always verify - the 91% hallucination rate means human review is non-negotiable
Route by task complexity: Use Flash for volume, premium models (Claude, GPT-5.X) for critical reasoning

Sources

Google Blog: Gemini 3.5 announcement
Google AI Dev: Gemini 3.5 Flash docs
Google: API Pricing
Google DeepMind: Gemini Flash model page
Hacker News discussion - 65 points, 103 comments
Reddit r/singularity: Gemini 3.5 Flash looks worse than it seems on Artificial Analysis
Thomas Wiegold: Gemini 3 Flash review
CNBC: Google unveils AI model Gemini 3.5