Gemini 3.5 Flash Research Brief

Bottom Line

Google released Gemini 3.5 Flash on May 19, 2026 at Google I/O, positioning it as their strongest "Flash" model yet - claiming frontier-level intelligence at high speed for agentic workloads. The benchmarks are genuinely impressive (beating Gemini 3.1 Pro in several coding/agentic tasks), but the pricing is the biggest story. At $1.50/$9.00 per 1M tokens, it costs 3x more than the previous Flash and nearly as much as Pro-tier models. The community is sharply divided - some call it a game-changer, others call it a "Flash in name only."


Key Findings

  • Outperforms Gemini 3.1 Pro on key benchmarks (Terminal-Bench 2.1: 76.2%, GDPval-AA: 1656 Elo) despite being a "Flash" model
  • Priced at $1.50/$9.00 per 1M input/output tokens - a 3x jump from 3.0 Flash ($0.50/$3.00) and close to Gemini 2.5 Pro ($1.25/$10.00)
  • 4x faster output than other frontier models according to Google, hitting ~218 tok/s in practice
  • 1M token context window, supports thinking, code execution, function calling, search grounding, structured outputs
  • Knowledge cutoff: January 2025 - relatively current
  • Now the default model in Gemini app and AI Mode in Google Search globally
  • Gemini 3.5 Pro is coming "next month" with even higher capabilities

Background

Gemini 3.5 Flash is the latest in Google's Gemini model family, announced at Google I/O 2026 alongside Gemini Omni (a physical-world simulation model), Gemini Spark (a personal AI agent), and a new agent-first development platform called Google Antigravity.

The model lineage: Gemini 2.5 Flash ($0.30/$2.50) → Gemini 3.0 Flash ($0.50/$3.00) → Gemini 3.1 Flash-Lite ($0.25/$1.50) → Gemini 3.5 Flash ($1.50/$9.00). Notably, there is no "3.1 Flash" non-lite - Google skipped straight to 3.5, bundling a significant capability jump with a significant price jump.

The model is authored by Koray Kavukcuoglu, Jeff Dean, Oriol Vinyals, and Noam Shazeer - Google DeepMind's core team.


Technical Details

Model code: gemini-3.5-flash (stable, not preview)

Inputs: Text, Image, Video, Audio, PDF Output: Wording only

Property Value
Input token limit 1,048,576
Output token limit 65,536
Knowledge cutoff January 2025
Thinking Supported
Code execution Supported
Function calling Supported
Search grounding Supported
Google Maps grounding Supported
Structured outputs Supported
URL context Supported
Batch API Supported
Context caching Supported
Flex/Priority inference Supported
Image generation Not supported
Audio generation Not supported
Live API Not supported
Computer use Not supported

Benchmarks

Google's published benchmarks for 3.5 Flash:

Benchmark Score
Terminal-Bench 2.1 76.2%
GDPval-AA 1656 Elo
MCP Atlas 83.6%
CharXiv Reasoning (multimodal) 84.2%

The model reportedly beats Sonnet 4.6 across disciplines in Google's published benchmarks. But independent testing tells a more nuanced story (see Limitations below).


Pricing

3.5 Flash vs the Gemini lineup

Model Input/1M Output/1M Tier
3.1 Flash-Lite $0.25 $1.50 Budget
2.5 Flash $0.30 $2.50 Mid
3.0 Flash $0.50 $3.00 Mid
3.5 Flash $1.50 $9.00 Upper
2.5 Pro $1.25 $10.00 Premium
3.1 Pro $2.00 $12.00 Premium

Batch pricing for 3.5 Flash: $0.75 input / $4.50 output (50% off)

The 3x price jump from 3.0 Flash to 3.5 Flash is unprecedented in the Gemini Flash tier. Output pricing at $9/1M tokens is only $3 less than 3.1 Pro's $12.

Cost vs competitors

  • 3.5x cheaper than GPT-5.2 for input tokens
  • 4.6x cheaper for output tokens vs GPT-5.2
  • 10x cheaper than Claude Opus 4.5 on input, 8x on output

Enterprise Adoption

Google lists impressive early adopters already using 3.5 Flash in production:

  • Shopify - parallel subagents analyzing complex data for merchant growth forecasts at global scale
  • Macquarie Bank - accelerating customer onboarding by reasoning over 100+ page documents
  • Salesforce - integrating into Agentforce for complex multi-turn tool calling with multiple subagents
  • Ramp - smarter OCR via multimodal understanding of invoices + reasoning over historical patterns
  • Xero - autonomous multi-week workflows (e.G., identifying suppliers for 1099 tax forms)
  • Databricks - real-time monitoring, reasoning across massive datasets to diagnose and propose fixes

Additional partners shown on the DeepMind page: JetBrains, Figma, Replit, Cursor, Warp, Harvey, Astrocade, Presentations.Ai, Latitude, Box, Workday, Salesforce, Geotab, Resemble AI.


The Big Controversy: Price vs. Total Cost

The Hacker News community (103 comments) and Reddit raised a critical point: Artificial Analysis benchmark testing showed 3.5 Flash costs MORE in total than 3.1 Pro to complete the same evaluation suite ($1,552 vs $892) while scoring slightly lower in intelligence (55 vs 57).

Metric Gemini 3.1 Pro Gemini 3.5 Flash
Intelligence Score 57 55
Total Benchmark Cost $892 $1,552
Input Token Price $2/1M $1.50/1M
Output Token Price $12/1M $9/1M

In practice, 3.5 Flash uses significantly more tokens per task - likely due to longer reasoning chains and thinking tokens being counted in the output price. Lower per-token pricing does not translate to lower total cost.


Community Sentiment

Bull case

  • Frontier intelligence at ~half the cost of Claude Opus / GPT-5.2
  • Perfect for high-volume agentic loops and multi-step workflows
  • 4x faster than other frontier models
  • Now the default for 650M+ Gemini users - "Pro-level reasoning" becomes the new baseline
  • Arena.Ai: "Gemini 3.5 Flash's pricing shifts the Pareto frontier in Text. 8 models from Google DeepMind dominate the Wording Arena Pareto curve."

Bear case

  • "Flash family but costs like a Pro" - $9 vs $12 output compared to Pro
  • Total cost can exceed Pro models due to higher token consumption
  • Google API reliability described as "flaky" compared to OpenAI/Anthropic
  • Hallucination rate: 91% in one test - 3 percentage points higher than 2.5 Flash
  • Confusing naming/versioning (3.1 Pro exists but 3.1 Flash non-lite doesn't)
  • Local models like Qwen 3.6 becoming viable alternatives

Limitations and Critiques

  1. No image/audio generation - unlike some competitors at this price point, output is text-only
  2. No Live API or Computer Use - limits real-time and agentic UI use cases
  3. Knowledge cutoff January 2025 - over a year old at release
  4. Hallucination rate remains high (91% in independent testing); "more accuracy, but when it's wrong, it's confidently wrong"
  5. Date confusion bugs - model sometimes insists it's 2024
  6. Token usage variability - complex reasoning tasks can more than double token usage vs. Simpler queries
  7. Google API reliability - cache misses, flaky responses compared to competitors
  8. The "Flash" label is misleading at this price point - it's a premium-tier model in Flash branding

Open Questions

  • Will Google introduce a cheaper 3.5 Flash-Lite to fill the gap left by the price jump?
  • How will 3.5 Pro (coming next month) differentiate when 3.5 Flash already costs nearly as much as Pro?
  • Can Google fix its API reliability issues as it scales to trillion-token daily volumes?
  • Does the thinking-token consumption make 3.5 Flash actually cost-competitive vs competitors in production?
  • Is the intelligence score gap vs 3.1 Pro (55 vs 57 on Artificial Analysis) meaningful or within noise?

Practical Takeaways

  • For high-volume agentic tasks: 3.5 Flash is a strong contender if you need speed + frontier intelligence and can absorb the higher per-task token costs
  • For cost-sensitive workloads: Stick with 3.0 Flash ($0.50/$3.00) or 3.1 Flash-Lite ($0.25/$1.50) - they still handle 80% of tasks well
  • For production reliability: Test Google's API thoroughly before committing - community reports suggest it's less stable than OpenAI/Anthropic
  • For critical outputs: Always verify - the 91% hallucination rate means human review is non-negotiable
  • Route by task complexity: Use Flash for volume, premium models (Claude, GPT-5.X) for critical reasoning

Sources