Gemini 3.5 Flash Research Brief
Bottom Line
Google released Gemini 3.5 Flash on May 19, 2026 at Google I/O, positioning it as their strongest "Flash" model yet - claiming frontier-level intelligence at high speed for agentic workloads. The benchmarks are genuinely impressive (beating Gemini 3.1 Pro in several coding/agentic tasks), but the pricing is the biggest story. At $1.50/$9.00 per 1M tokens, it costs 3x more than the previous Flash and nearly as much as Pro-tier models. The community is sharply divided - some call it a game-changer, others call it a "Flash in name only."
Key Findings
- Outperforms Gemini 3.1 Pro on key benchmarks (Terminal-Bench 2.1: 76.2%, GDPval-AA: 1656 Elo) despite being a "Flash" model
- Priced at $1.50/$9.00 per 1M input/output tokens - a 3x jump from 3.0 Flash ($0.50/$3.00) and close to Gemini 2.5 Pro ($1.25/$10.00)
- 4x faster output than other frontier models according to Google, hitting ~218 tok/s in practice
- 1M token context window, supports thinking, code execution, function calling, search grounding, structured outputs
- Knowledge cutoff: January 2025 - relatively current
- Now the default model in Gemini app and AI Mode in Google Search globally
- Gemini 3.5 Pro is coming "next month" with even higher capabilities
Background
Gemini 3.5 Flash is the latest in Google's Gemini model family, announced at Google I/O 2026 alongside Gemini Omni (a physical-world simulation model), Gemini Spark (a personal AI agent), and a new agent-first development platform called Google Antigravity.
The model lineage: Gemini 2.5 Flash ($0.30/$2.50) → Gemini 3.0 Flash ($0.50/$3.00) → Gemini 3.1 Flash-Lite ($0.25/$1.50) → Gemini 3.5 Flash ($1.50/$9.00). Notably, there is no "3.1 Flash" non-lite - Google skipped straight to 3.5, bundling a significant capability jump with a significant price jump.
The model is authored by Koray Kavukcuoglu, Jeff Dean, Oriol Vinyals, and Noam Shazeer - Google DeepMind's core team.
Technical Details
Model code: gemini-3.5-flash (stable, not preview)
Inputs: Text, Image, Video, Audio, PDF Output: Wording only
| Property | Value |
|---|---|
| Input token limit | 1,048,576 |
| Output token limit | 65,536 |
| Knowledge cutoff | January 2025 |
| Thinking | Supported |
| Code execution | Supported |
| Function calling | Supported |
| Search grounding | Supported |
| Google Maps grounding | Supported |
| Structured outputs | Supported |
| URL context | Supported |
| Batch API | Supported |
| Context caching | Supported |
| Flex/Priority inference | Supported |
| Image generation | Not supported |
| Audio generation | Not supported |
| Live API | Not supported |
| Computer use | Not supported |
Benchmarks
Google's published benchmarks for 3.5 Flash:
| Benchmark | Score |
|---|---|
| Terminal-Bench 2.1 | 76.2% |
| GDPval-AA | 1656 Elo |
| MCP Atlas | 83.6% |
| CharXiv Reasoning (multimodal) | 84.2% |
The model reportedly beats Sonnet 4.6 across disciplines in Google's published benchmarks. But independent testing tells a more nuanced story (see Limitations below).
Pricing
3.5 Flash vs the Gemini lineup
| Model | Input/1M | Output/1M | Tier |
|---|---|---|---|
| 3.1 Flash-Lite | $0.25 | $1.50 | Budget |
| 2.5 Flash | $0.30 | $2.50 | Mid |
| 3.0 Flash | $0.50 | $3.00 | Mid |
| 3.5 Flash | $1.50 | $9.00 | Upper |
| 2.5 Pro | $1.25 | $10.00 | Premium |
| 3.1 Pro | $2.00 | $12.00 | Premium |
Batch pricing for 3.5 Flash: $0.75 input / $4.50 output (50% off)
The 3x price jump from 3.0 Flash to 3.5 Flash is unprecedented in the Gemini Flash tier. Output pricing at $9/1M tokens is only $3 less than 3.1 Pro's $12.
Cost vs competitors
- 3.5x cheaper than GPT-5.2 for input tokens
- 4.6x cheaper for output tokens vs GPT-5.2
- 10x cheaper than Claude Opus 4.5 on input, 8x on output
Enterprise Adoption
Google lists impressive early adopters already using 3.5 Flash in production:
- Shopify - parallel subagents analyzing complex data for merchant growth forecasts at global scale
- Macquarie Bank - accelerating customer onboarding by reasoning over 100+ page documents
- Salesforce - integrating into Agentforce for complex multi-turn tool calling with multiple subagents
- Ramp - smarter OCR via multimodal understanding of invoices + reasoning over historical patterns
- Xero - autonomous multi-week workflows (e.G., identifying suppliers for 1099 tax forms)
- Databricks - real-time monitoring, reasoning across massive datasets to diagnose and propose fixes
Additional partners shown on the DeepMind page: JetBrains, Figma, Replit, Cursor, Warp, Harvey, Astrocade, Presentations.Ai, Latitude, Box, Workday, Salesforce, Geotab, Resemble AI.
The Big Controversy: Price vs. Total Cost
The Hacker News community (103 comments) and Reddit raised a critical point: Artificial Analysis benchmark testing showed 3.5 Flash costs MORE in total than 3.1 Pro to complete the same evaluation suite ($1,552 vs $892) while scoring slightly lower in intelligence (55 vs 57).
| Metric | Gemini 3.1 Pro | Gemini 3.5 Flash |
|---|---|---|
| Intelligence Score | 57 | 55 |
| Total Benchmark Cost | $892 | $1,552 |
| Input Token Price | $2/1M | $1.50/1M |
| Output Token Price | $12/1M | $9/1M |
In practice, 3.5 Flash uses significantly more tokens per task - likely due to longer reasoning chains and thinking tokens being counted in the output price. Lower per-token pricing does not translate to lower total cost.
Community Sentiment
Bull case
- Frontier intelligence at ~half the cost of Claude Opus / GPT-5.2
- Perfect for high-volume agentic loops and multi-step workflows
- 4x faster than other frontier models
- Now the default for 650M+ Gemini users - "Pro-level reasoning" becomes the new baseline
- Arena.Ai: "Gemini 3.5 Flash's pricing shifts the Pareto frontier in Text. 8 models from Google DeepMind dominate the Wording Arena Pareto curve."
Bear case
- "Flash family but costs like a Pro" - $9 vs $12 output compared to Pro
- Total cost can exceed Pro models due to higher token consumption
- Google API reliability described as "flaky" compared to OpenAI/Anthropic
- Hallucination rate: 91% in one test - 3 percentage points higher than 2.5 Flash
- Confusing naming/versioning (3.1 Pro exists but 3.1 Flash non-lite doesn't)
- Local models like Qwen 3.6 becoming viable alternatives
Limitations and Critiques
- No image/audio generation - unlike some competitors at this price point, output is text-only
- No Live API or Computer Use - limits real-time and agentic UI use cases
- Knowledge cutoff January 2025 - over a year old at release
- Hallucination rate remains high (91% in independent testing); "more accuracy, but when it's wrong, it's confidently wrong"
- Date confusion bugs - model sometimes insists it's 2024
- Token usage variability - complex reasoning tasks can more than double token usage vs. Simpler queries
- Google API reliability - cache misses, flaky responses compared to competitors
- The "Flash" label is misleading at this price point - it's a premium-tier model in Flash branding
Open Questions
- Will Google introduce a cheaper 3.5 Flash-Lite to fill the gap left by the price jump?
- How will 3.5 Pro (coming next month) differentiate when 3.5 Flash already costs nearly as much as Pro?
- Can Google fix its API reliability issues as it scales to trillion-token daily volumes?
- Does the thinking-token consumption make 3.5 Flash actually cost-competitive vs competitors in production?
- Is the intelligence score gap vs 3.1 Pro (55 vs 57 on Artificial Analysis) meaningful or within noise?
Practical Takeaways
- For high-volume agentic tasks: 3.5 Flash is a strong contender if you need speed + frontier intelligence and can absorb the higher per-task token costs
- For cost-sensitive workloads: Stick with 3.0 Flash ($0.50/$3.00) or 3.1 Flash-Lite ($0.25/$1.50) - they still handle 80% of tasks well
- For production reliability: Test Google's API thoroughly before committing - community reports suggest it's less stable than OpenAI/Anthropic
- For critical outputs: Always verify - the 91% hallucination rate means human review is non-negotiable
- Route by task complexity: Use Flash for volume, premium models (Claude, GPT-5.X) for critical reasoning
Sources
- Google Blog: Gemini 3.5 announcement
- Google AI Dev: Gemini 3.5 Flash docs
- Google: API Pricing
- Google DeepMind: Gemini Flash model page
- Hacker News discussion - 65 points, 103 comments
- Reddit r/singularity: Gemini 3.5 Flash looks worse than it seems on Artificial Analysis
- Thomas Wiegold: Gemini 3 Flash review
- CNBC: Google unveils AI model Gemini 3.5