Model Drop: Gemini 3.5 Flash

Google's step up is smarter and faster

May 19, 2026

Google DeepMind just shipp Gemini 3.5 Flash at Google I/O 2026, an agentic Flash-tier model. This is the first Flash release in the series that beats its predecessor’s Pro model on the benchmarks. Woof.

Model: Gemini 3.5 Flash (gemini-3.5-flash on the API, version string 3.5-flash-05-2026).

Model type: Multimodal input (text, image, video, audio, PDF), text output. No native image, audio, or video output.

Ship date: May 19, 2026

Maker: Google DeepMind

Pricing: $1.50 / $9.00 per million input / output tokens on the global tier of the Gemini API. $0.15 per million for cached input. Non-global regions priced at $1.65 / $9.90. Roughly 3x Gemini 3 Flash ($0.50 / $3.00), still ~40% cheaper than Gemini 3.1 Pro ($2.50 / $15) on both ends. Included in Google AI Plus, Pro, and Ultra subscriptions for consumer use.

Available on: Gemini app, AI Mode in Search, Google AI Studio, Google Antigravity, Android Studio, the Gemini API, Vertex AI, and Gemini Enterprise / Gemini Enterprise Agent Platform.

Headline benchmarks: Terminal-Bench 2.1 76.2% (Gemini 3.1 Pro: 70.3%, Opus 4.7: 69.4%, GPT-5.5: 82.7%). MCP Atlas 83.6% (3.1 Pro: 78.2%). Finance Agent v2 57.9% (3.1 Pro: 43.0%). CharXiv Reasoning 84.2%. MMMU-Pro 83.6% (per Artificial Analysis, the highest score recorded). GDPval-AA 1656 Elo (3 Flash: 1204). Artificial Analysis Intelligence Index 55 (up 9 from 3 Flash). Hallucination rate 61%, down 31 points from 3 Flash. Two regressions worth naming: Humanity’s Last Exam 40.2% (3.1 Pro: 44.4%) and ARC-AGI-2 72.1% (3.1 Pro: 77.1%).

Other info: 1,048,576 input token context window, 65,536 output token cap. Knowledge cutoff January 2026. Output speed ~289 tokens/sec (Pichai’s keynote number, ~4x other frontier models). Tool use includes function calling, structured output, search integration, and code execution. The 3.5 Pro variant is in internal use and “rolling out next month” per Pichai.

More details: Introducing Gemini 3.5 (Google blog)

What shipped

Google DeepMind opened Google I/O 2026 with Gemini 3.5 Flash and pitched it as the agentic model the rest of the lineup orchestrates. The framing is unusual for a Flash release. It’s a cheaper, faster sibling that outperforms its predecessor’s Pro tier on agentic and coding benchmarks while costing less than half. The model is generally available today across Google’s (way too robust) product line, and the Enterprise Agent Platform. Tulsee Doshi, the senior director who fronted the press call, sketched the two-model strategy plainly: “3.5 Pro becomes your orchestrator, your planner, and then it actually can leverage Flash to be the various sub-agents.”

Pro is coming next month.

What’s new

Flash that beats the previous Pro. Across the benchmarks Google’s marketing leans on (Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, GDPval-AA, OSWorld-Verified), 3.5 Flash posts higher numbers than the Gemini 3.1 Pro it’s deprecating. That hasn’t happened in the Gemini family before.
Built for sub-agents. Doshi’s “Pro orchestrates, Flash executes” framing is reflected in the model itself. The Finance Agent v2 score and the 1656 GDPval-AA Elo are gains on long-running, multi-turn evaluations where the model is acting, not answering.
Multimodal in across every modality Gemini ships. Text, image, video, audio, and PDF all in. Output stays text-only. This isn’t new to the Gemini family but it’s now standard at the Flash tier, which removes a reason to step up to Pro for routing work.
Hallucination drop. Artificial Analysis puts hallucination rate at 61%, down 31 points from 3 Flash.
Two regressions on reasoning. HLE drops 4.2 points, ARC-AGI-2 drops 5.0 points versus 3.1 Pro. The Flash specialization is paid for somewhere, and it’s paid for in the kinds of long-form abstract reasoning Pro tiers are usually graded on.

How and where to use it

Where it runs, what it’s actually good for, and where you’ll burn cycles regretting it.

Where it’s available

Gemini app and AI Mode in Search for consumers across Google AI Plus, Pro, and Ultra tiers.
Google AI Studio and the Gemini API for developers.
Google Antigravity for Google’s agentic dev workflows.
Android Studio for on-IDE coding.
Vertex AI and Gemini Enterprise for enterprise routing. Pricing is uniform on the global tier; non-global regions cost 10% more.

What it’s good at

Long-horizon agent workloads where you need a cheap, fast executor underneath a Pro-tier planner.
Multimodal ingestion (image, video, audio, PDF) means it can handles the document-extraction and screen-reading parts of a workflow.
289 tokens/sec on a coding loop is a quality-of-life upgrade.

What it’s bad at / shouldn’t be used for

Anything that grades on abstract reasoning where the regressions show up (HLE drop of 4.2 points, ARC-AGI-2 drop of 5.0 points versus 3.1 Pro).
Math-heavy or proof-style work, where GPT-5.5 at FrontierMath Tier 4 35.4% is still the model to reach for.
Cost-sensitive bulk workloads where Kimi K2.6 at $0.60 / $2.50 will undercut you by 60%+ before you’ve shipped your first eval.
Workloads where the lack of a published system card is a regulatory dealbreaker.
Anything where you were already paying for 3.1 Pro and assumed Flash was a downgrade.

First impressions

The positives

Artificial Analysis ran 3.5 Flash through their full benchmark suite within hours of launch and landed on the cleanest framing of the speed story:

“Gemini 3.5 Flash achieves speeds of over 280 output tokens per second, ~70% faster than Gemini 3 Flash. It scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash, driven primarily by agentic performance gains and hallucination reduction.”

The 9-point Intelligence Index jump is the largest Google has posted on a Flash release, and the Pareto chart they ship alongside their writeups now has 3.5 Flash sitting north and west of every other Flash-tier model on the market.

Sundar Pichai framed 3.5 Flash on the I/O keynote stage as a category move rather than a SKU bump:

“Our first in a series of models combining frontier intelligence with action. A very capable model, at the frontier and comparable to the best models, but it’s still very fast. Four times faster than other frontier models on output tokens per second.”

CEOs say things like this on every Flash launch and it’s usually marketing, but the benchmark deltas back this one up. The Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, and GDPval-AA wins versus 3.1 Pro aren’t cherry-picked, they’re the four benchmarks Google’s enterprise customers actually grade agents on.

VentureBeat’s read on the enterprise framing landed on the line every procurement spreadsheet is going to remember:

“Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year.”

The negatives

Artificial Analysis buried the most damaging line in the back half of their writeup:

“Gemini 3.5 Flash is over 5x more costly to run the Intelligence Index than Gemini 3 Flash, and 75% more costly than Gemini 3.1 Pro.”

Token price tripled and per-task input token usage jumped because agentic evals burn more turns. Net effect on real workloads: a 5.5x cost increase on the full Intelligence Index suite for a 9-point quality lift. That’s a worse ratio than Gemini has ever shipped at the Flash tier, and it puts the model in direct head-to-head with Kimi K2.6 at $0.60 / $2.50 on the workloads where price-per-capability is the deciding factor.

The TechCrunch coverage is the only mainstream launch-day piece that named the safety overhang:

“Google is facing scrutiny around AI safety, including a lawsuit after a man nearly committed a mass casualty event and died by suicide following weeks of chatting with Gemini last year.”

Shipping a more capable agent without a system card alongside an active wrongful-death lawsuit is a choice. Google didn’t publish a 3.5 Flash safety card, didn’t disclose Preparedness Framework classifications, and didn’t commit to a date for publishing one.

Jake’s take

3.5 Flash is the first Flash-tier model I’d actually consider using. The MCP Atlas and Finance Agent v2 gains are the ones I care about. If 3.5 Pro lands in June at the orchestrator role Doshi described and the two of them actually clip together cleanly, then I would consider trying it out (while I usually skip Google releases altogether).

However, the pricing is genuinely bad relative to where Flash tier was last quarter and Artificial Analysis isn’t wrong that the 5.5x cost increase to run a full eval suite is the worst Pareto shift Google has shipped in this family. Pair that with no system card, an active Gemini-related wrongful-death lawsuit, and two reasoning regressions Google didn’t lead with, and you’ve got a fairly dishonest launch. The deeper problem is the naming. When a Flash model beats its predecessor’s Pro, “Flash” stops meaning “the cheaper one” and starts meaning “the agentic one,” and Google hasn’t earned the goodwill yet for buyers to figure that out on their own.

Discussion about this post

Ready for more?