New Gemini 3.5 Flash features explained and how to use them

13 Jun 2026

Google delayed Gemini 3.5 Pro, but quietly shipped Gemini 3.5 Flash as the new default model. Here’s what Flash can actually do, how it compares on price and performance, and practical workflows you can start using today in the Gemini app and AI Studio.

Google pushed back the launch of Gemini 3.5 Pro, but in the meantime it quietly shipped something that might matter even more: Gemini 3.5 Flash. Flash is now the default model powering AI Mode in Google Search and much of the Gemini experience, and its API pricing sits well below that of other frontier models.

This guide walks through what Gemini 3.5 Flash can actually do, how the new “thinking levels” work, where it’s strong and where it regresses, and practical workflows you can start using today in the Gemini app and Google AI Studio.

What Gemini 3.5 Flash is and why it matters

Gemini 3.5 Flash is Google’s new fast, low-cost model that still delivers what feels like “Pro tier” performance for many real-world tasks. It ships with:

• A 1 million token context window (how much it can read at once)
• A 64,000 token output limit (how much it can respond with in one go)
• Multimodal support across text, images, audio, and video
• New controllable “thinking levels” that change how deeply it reasons

Most importantly, Flash is cheap compared to other frontier models. For API use, pricing is:

• $1.50 per million input tokens
• $9 per million output tokens

For comparison:

• Claude Opus 4.7: $5 input / $25 output per million tokens
• GPT 5.5: $5 input / $30 output per million tokens

On most production workloads, output tokens drive most of the cost. At $9 per million, Flash’s output rate is roughly a third of the $25–30 charged by the two models above, and that gap adds up quickly at scale.
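
To make the gap concrete, here is a minimal sketch that applies the listed rates to a single workload. The prices are the ones quoted above; the workload size (200M input and 50M output tokens per month) is a made-up figure for illustration only.

```python
# Per-million-token API prices quoted above, in USD: (input, output).
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT 5.5": (5.00, 30.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical workload: 200M input tokens and 50M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 200_000_000, 50_000_000):,.2f}")
# Gemini 3.5 Flash: $750.00
# Claude Opus 4.7: $2,250.00
# GPT 5.5: $2,500.00
```

Swap in your own token counts to see where your workload lands; the ratio shifts toward the output price as responses get longer.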

Using Gemini Flash in the mobile app

Flash is already live inside the Gemini app, and a lot of its value shows up in everyday workflows. Here are some of the most useful ones.

1. Multimodal vision: turning fridge photos into recipes

Flash’s vision capabilities are noticeably better than earlier Gemini Pro models. For example, you can:

1. Open the Gemini app and upload a photo of your fridge or pantry.
2. Ask: “Look at everything in this fridge and give me a step-by-step dinner recipe using what’s here. Then list anything I’d need to grab from the store to complete it.”

Flash can correctly identify partially hidden items (like jars overlapping at the back of the fridge), avoid hallucinating ingredients that aren’t there, and generate:

• A full, step-by-step recipe
• A clean shopping list with only the missing items

Compared to Gemini 2.5 Pro, the output is faster and cleaner on this kind of multimodal task.

2. Native video understanding with timestamps and charts

Flash can analyze long video files directly in the Gemini app without external tools or manual transcription. A practical workflow:

1. Drag a long video file into a Gemini chat.
2. Prompt: “Give me the top five insights from this video with exact timestamps. Then find the data table shown around the 23-minute mark and recreate it as a Python chart.”

Gemini shows an “analyzing video” block you can expand to see what it’s doing. The model then returns:

• Timestamped insights that are accurate to within roughly 20 seconds
• Python code that pulls the data from the specified section
• A rendered chart directly in the chat window

The 64K output limit means the code, chart, and analysis can all arrive in one continuous response without being cut off.

If you regularly process meeting recordings, webinars, or course videos, this workflow is worth testing immediately.
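
Because the timestamps are only accurate to within roughly 20 seconds, it helps to spot-check each insight against a window rather than a single instant. The sketch below turns a reported timestamp into a start and end point to scrub through. The function names and the 20-second default are illustrative assumptions, not part of any Gemini tooling.

```python
def to_seconds(timestamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def to_timestamp(seconds: int) -> str:
    """Format a number of seconds as 'H:MM:SS'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"

def review_window(timestamp: str, tolerance: int = 20) -> tuple[str, str]:
    """Return the (start, end) range to check around a reported timestamp."""
    center = to_seconds(timestamp)
    start = max(0, center - tolerance)  # never go before the start of the video
    return to_timestamp(start), to_timestamp(center + tolerance)

print(review_window("23:10"))  # ('0:22:50', '0:23:30')
```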

3. Turning messy voice notes into structured task lists

Flash is also strong at cleaning up unstructured audio. For example:

1. Record a 2-minute voice memo on your phone with all your scattered Monday-morning thoughts—projects, follow-ups, ideas.
2. Drop the audio file into the Gemini app.
3. Ask for a “clean task list with priorities.”

Flash will transcribe the memo, group related items, and rank them by urgency in seconds. If you usually start your day in a notes app, this can become a faster, more natural way to capture and structure your plan.

4. Explaining and fixing error messages from screenshots

Another small but high-impact use case: debugging cryptic errors.

1. Take a screenshot of an error message on your phone or computer.
2. Upload it to Gemini.
3. Ask it to explain what happened and what to do next.

Flash can read the screenshot, identify the root cause in plain language, and walk you through a step-by-step fix—without jargon or digging through forums. The same approach works for:

• App error dialogs
• Browser console errors
• Terminal and code errors

If you’ve been copy-pasting error messages into search for years, this one habit can save real time.

Long document analysis in AI Studio

In Google AI Studio, Flash’s large context window and thinking levels become especially useful for serious reading and analysis.

Analyzing contracts and long PDFs

Flash can comfortably ingest long PDFs like 40–50 page B2B contracts in a single shot. A practical prompt:

“Act as a contract lawyer. Read this entire agreement and flag every hidden fee, automatic renewal clause, and penalty that a client could miss on a first read.”

Because the full document fits into the context window, you don’t have to manually chunk it or worry about losing coherence across sections. Flash can cross-reference clauses across the entire file.
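
A quick back-of-the-envelope check shows why chunking isn’t needed here. The sketch below estimates a document’s token count from its page count and compares it with the 1 million token window. The words-per-page and tokens-per-word figures are rough rules of thumb assumed for illustration, not measured values, and the reserve for the prompt is arbitrary.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window from the spec list above

# Rough heuristics (assumptions, not measurements):
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3

def estimated_tokens(pages: int) -> int:
    """Estimate how many tokens a document of the given length uses."""
    return round(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)

def fits_in_context(pages: int, reserve_tokens: int = 10_000) -> bool:
    """True if the document plus a reserve for the prompt fits in the window."""
    return estimated_tokens(pages) + reserve_tokens <= CONTEXT_WINDOW_TOKENS

tokens = estimated_tokens(50)
print(tokens, f"{tokens / CONTEXT_WINDOW_TOKENS:.1%}")  # 32500 3.2%
print(fits_in_context(50))  # True
```

Under these assumptions a 50-page contract uses only a few percent of the window, which leaves room to load related documents such as amendments alongside it.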

Using thinking levels for higher-stakes answers

AI Studio lets you set a “thinking level” from low to high. This changes how much internal reasoning the model does before responding:

• Low: Faster, cheaper, decent for simple tasks and summaries.
• High: Slower, more expensive, but does deeper step-by-step reasoning.

When you switch a contract analysis task from low to high thinking, AI Studio shows a “chain of thought” block where you can watch the model work through clauses and cross-references before producing the final answer.

In testing, the high thinking level caught extra penalty clauses and an auto-renewal trigger that the low setting missed. For contracts, legal documents, and financial filings, the rule of thumb is:

• Use high thinking when the cost of a wrong answer is real.
• Use low thinking for quick summaries, drafts, and low-risk tasks.
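
If you route many tasks through the model, the rule of thumb above can be written down as a small helper. This is only a sketch: the task categories and the function name are hypothetical, and the returned string is a label for your own routing logic rather than a confirmed API parameter value.

```python
# Task kinds the article treats as high-stakes; extend this set for your own work.
HIGH_STAKES_KINDS = {"contract", "legal document", "financial filing"}

def pick_thinking_level(task_kind: str, wrong_answer_is_costly: bool = False) -> str:
    """Return 'high' when a wrong answer has a real cost, otherwise 'low'."""
    if wrong_answer_is_costly or task_kind.strip().lower() in HIGH_STAKES_KINDS:
        return "high"
    return "low"

print(pick_thinking_level("contract"))                             # high
print(pick_thinking_level("summary"))                              # low
print(pick_thinking_level("summary", wrong_answer_is_costly=True)) # high
```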

Note: when many users were migrated from Gemini 2.5 Pro to 3.5 Flash, the default thinking level in the Gemini app silently dropped from high to medium. If your outputs suddenly feel slightly worse, go into settings and manually set the thinking level back to high.

Vibe coding and live previews in AI Studio

Flash’s 64K output limit shines when generating large chunks of code.

One powerful workflow is “vibe coding” from a sketch:

1. Take a photo of a rough hand-drawn layout of an app.
2. In AI Studio, upload the image and prompt: “Build this as a React and Tailwind app with an Apple-inspired clean design aesthetic. Output the full component.”

Flash can generate a full React component—hundreds of lines—without truncating mid-function. Then you can paste the code into AI Studio’s built-in live preview panel to see the app render immediately, with working buttons and layout.

This all runs in the browser on a standard Pro account; no extra desktop app is required.

Structured data extraction from images (JSON output)

Flash also replaces many traditional OCR and data-extraction tools directly inside AI Studio.

Imagine you have 15 receipt photos from different countries, in different languages and formats. You want clean, structured data without writing any API code. Here’s how to do it:

1. Open AI Studio and go to the structured output panel.
2. Define your schema visually, for example:
   • merchant_name
   • date
   • total_amount
   • currency
   • line_items
3. Upload all 15 receipt images in a single request.
4. Run the extraction.

Flash will:

• Read each receipt, regardless of language
• Normalize everything into the schema you defined
• Return valid JSON for each receipt
• Let you download the JSON file with one click
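
Before loading the downloaded file into an expense system, it’s worth checking that every record really matches the schema. Here is a minimal sketch using only the standard library. The field names come from the schema above; the expected Python types, the assumption that the export is a JSON array, and the sample data are all illustrative.

```python
import json

# Field names come from the schema above; the Python types are assumptions.
RECEIPT_FIELDS = {
    "merchant_name": str,
    "date": str,
    "total_amount": (int, float),
    "currency": str,
    "line_items": list,
}

def receipt_problems(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field, expected_type in RECEIPT_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif isinstance(record[field], bool) or not isinstance(record[field], expected_type):
            problems.append(f"{field} has type {type(record[field]).__name__}")
    return problems

def check_export(raw_json: str) -> dict[int, list[str]]:
    """Map the index of each bad record to its problems (assumes a JSON array)."""
    bad = {}
    for index, record in enumerate(json.loads(raw_json)):
        problems = receipt_problems(record)
        if problems:
            bad[index] = problems
    return bad

# Made-up sample export: the second record has a string total and no currency.
sample = json.dumps([
    {"merchant_name": "Cafe Uno", "date": "2026-05-02", "total_amount": 12.5,
     "currency": "EUR", "line_items": [{"name": "Espresso", "price": 2.5}]},
    {"merchant_name": "Kiosk", "date": "2026-05-03", "total_amount": "7,90",
     "line_items": []},
])
print(check_export(sample))
```

An empty result means every record has all five fields with the expected types; anything else tells you which receipts to re-run or fix by hand.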

If you handle expense tracking, invoice processing, or any workflow where you used to pay for a separate OCR API, this can be a major simplification and cost saver. For more on using AI Studio for media workflows, you may also find this guide to generating AI voice in Google AI Studio helpful.

Agentic workflows with Google Workspace extensions

One of the most powerful shifts with Gemini is using it as an operator across your tools, not just a text generator.

In the Gemini app, you can enable Workspace extensions under Settings → Extensions. Turn on:

• Google Drive
• Google Docs
• Gmail
• Google Calendar (and others you need)

Then you can run agentic chains with a single prompt, for example:

“@Google Drive find my May sales report. @Google Docs create a new summary document from it. @Gmail draft a team update with a link to that new doc.”

Gemini will then:

1. Search Drive and open the correct file.
2. Read the content and generate a summary in a new Google Doc.
3. Draft a Gmail message with the link already inserted.

You never have to manually open Drive, Docs, or Gmail. The @-mention syntax is how you control which tools Gemini can touch, and clear prompts help keep it on track. This is the “agentic Gemini” moment Google has been building toward—moving from writing assistance to actual multi-step task execution.

If you’re interested in how these agentic capabilities fit into Google’s broader strategy, you can dive deeper in our breakdown of Gemini Deep Research and enterprise agents.

Where Gemini Flash falls short

Flash is impressive for the price, but there are a few important caveats that Google isn’t highlighting in its marketing.

1. Long-context retrieval regressed

On the MRCR v2 benchmark at a 128K token context, Gemini 3.5 Flash scores about 7.6 points lower than Gemini 3.1 Pro. MRCR measures how accurately a model can retrieve specific information from very long documents.

In practice, this means that while Flash can ingest huge contexts, it’s slightly less reliable at pinpointing precise details deep inside them compared to earlier Pro models. For high-stakes long-context tasks, you should:

• Use the high thinking level
• Manually verify key facts and citations
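
Part of the manual verification can be automated: ask the model to quote the exact passages it relied on, then confirm each quote actually appears in the source. The sketch below does that with plain string matching. The function names and sample text are hypothetical, and a quote that fails the check may simply be paraphrased, so treat misses as items to review rather than proof of an error.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unverified_quotes(source_text: str, quotes: list[str]) -> list[str]:
    """Return the quotes that do not appear verbatim in the source text."""
    haystack = normalize(source_text)
    return [quote for quote in quotes if normalize(quote) not in haystack]

# Made-up contract excerpt and model-reported quotes, for illustration only.
source = "This Agreement renews automatically for successive\n12-month terms unless cancelled."
quotes = [
    "renews automatically for successive 12-month terms",
    "a late fee of 5% applies",
]
print(unverified_quotes(source, quotes))  # ['a late fee of 5% applies']
```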

2. Outputs are more verbose

Independent benchmarks and real-world testing show that Flash tends to use roughly twice as many tokens on reasoning-heavy tasks as earlier models. That means:

• Longer answers by default
• Higher output token usage (and therefore cost) on complex tasks

You can manage this by asking explicitly for concise answers or bullet points when you don’t need full essays.
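
It’s worth checking how much of the price advantage the extra tokens eat up. The sketch below treats “roughly twice as many tokens” as an exact 2x multiplier and assumes the model you compare against produces the baseline token count; both are simplifying assumptions, and the prices are the output rates quoted earlier in this article.

```python
FLASH_OUTPUT_PRICE = 9.00  # USD per million output tokens, from the pricing section
VERBOSITY_FACTOR = 2.0     # "roughly twice as many tokens", treated as exact here

def effective_output_price(price_per_million: float, verbosity: float) -> float:
    """Cost of the tokens needed to match one million baseline output tokens."""
    return price_per_million * verbosity

def break_even_verbosity(cheap_price: float, expensive_price: float) -> float:
    """How many times more verbose the cheap model can be before costs match."""
    return expensive_price / cheap_price

print(effective_output_price(FLASH_OUTPUT_PRICE, VERBOSITY_FACTOR))  # 18.0
print(round(break_even_verbosity(FLASH_OUTPUT_PRICE, 25.00), 2))     # 2.78
print(round(break_even_verbosity(FLASH_OUTPUT_PRICE, 30.00), 2))     # 3.33
```

Under these assumptions, doubling the token count narrows the output-cost advantage over a $25 model from $9 vs $25 to an effective $18 vs $25, so asking for concise answers has a direct effect on the bill.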

3. Silent default change in thinking level

When users were migrated from Gemini 2.5 Pro to 3.5 Flash, the default thinking level in the Gemini app quietly dropped from high to medium. Google didn’t announce this, so many people are seeing slightly worse reasoning without knowing why.

If you’re on a Pro account, it’s worth taking 15 seconds to:

1. Open Gemini app settings.
2. Find the thinking level configuration.
3. Change it from medium to high.

This one tweak can noticeably improve output quality across the board, especially for complex tasks.

How Flash changes the economics of AI

With Gemini 3.5 Flash, Google has effectively made “good enough for production” AI far cheaper than what most competitors offer. For many workloads, you get:

• Pro-tier performance
• A huge context window
• Strong multimodal capabilities
• Agentic workflows across Google Workspace

All at a fraction of the cost of models like Claude Opus or GPT 5.x, especially on output-heavy tasks.

There are still scenarios where you might prefer a different model—particularly for precision long-context retrieval or very specialized reasoning—but for a wide range of day-to-day and production use cases, Flash is now a serious default choice.

Bottom line

Gemini 3.5 Flash is more than a placeholder while Pro is delayed. It’s a fast, multimodal, and surprisingly capable model that delivers Pro-level results for many workflows at a much lower price point.

Use it for:

• Multimodal tasks (images, video, audio)
• Long document analysis with high thinking
• Large code generation and live previews in AI Studio
• Structured data extraction from images into JSON
• Agentic chains across Google Workspace

Just be cautious with high-stakes long-context retrieval, keep an eye on verbosity and token usage, and make sure your thinking level is set where you actually want it. With those guardrails in place, Flash is one of the most practical upgrades in the Gemini lineup so far.