How to Generate AI Voice in Google AI Studio After the Latest Update
Google AI Studio recently rolled out a major update to its text-to-speech (TTS) tools, and the interface looks quite different. If you’ve been using it to create AI voiceovers for YouTube videos, tutorials, or short clips, you’ll now see new models, pacing options, and voice styles, plus a few quirks worth knowing about.
In this guide, you’ll learn how to generate AI voices with the new Gemini 3.1 Flash TTS model, when you should still use Gemini 2.5 Pro TTS instead, and how to set everything up for smooth, natural voiceovers.
Getting Started in Google AI Studio
If you’re new to Google AI Studio, the setup is simple:
1. Go to the Google AI Studio website.
2. Sign in with your Google account.
3. In the left sidebar, go to the Speech & Music section.
This is where you’ll find the text-to-speech models and presets you’ll use to generate AI voices.
Using Gemini 3.1 Flash TTS for Single-Speaker Voiceovers
After the update, you’ll see a new option called Gemini 3.1 Flash TTS (Preview). By default, some of the templates here are designed for multiple speakers, which isn’t ideal if you just want a single, clean narration track.
To set up a simple single-speaker voiceover:
1. Go to Speech & Music.
2. Choose a template that uses a single speaker (often labeled as an everyday assistant or similar).
3. You’ll see two main fields:
• Scene / Sample Context – This replaces the old “instruction” box. It’s where you describe how the voice should behave.
• Text Input – This is where you paste the actual script you want spoken.
You can leave the scene/context blank for simple use, or add a short description like: “YouTuber in the studio explaining a tutorial in a friendly, clear tone.”
If you already have a prompt you like (for example, one you use regularly for your YouTube videos), paste it into the context area and then put your script in the main text box.
Customizing Voice, Pace, Style, and Accent
One of the best parts of the new update is how much control you get over how the AI voice sounds.
Choosing a Voice
Under the speaker options, you’ll see a list of available voices. You can:
• Browse different speakers
• Click to preview how each one sounds
Pick a voice that matches your content: more energetic for promos, calmer for tutorials.
Adjusting Pacing
The new pacing options let you control how fast the AI speaks:
• Natural – Balanced, everyday speech. Good default for most videos.
• Fast / Rapid Fire – Great for fast-paced tutorial videos or content where you want to pack in more information.
• Staccato – Short, clipped sentences with clear pauses between words. This can be useful for very clear, step-by-step instructions, though it may sound less natural for long narrations.
Picking a Style
You can also choose from different speaking styles, such as:
• Voice Smile – Warmer, more upbeat delivery.
• Newscaster – More formal and news-like.
• Whisper – Soft, whispering tone.
• Empathetic – Great for emotional or supportive content.
• Promo Hype – High-energy, promotional feel.
For many educational or how-to videos, the Empathetic style works very well, especially when paired with Natural pacing.
Setting the Accent
Accent options include:
• American
• British
• Australian
• Translated
If you’re creating multilingual content, the Translated option can help adapt the voice for different languages or audiences. For most English YouTube content, American or British will be the most common choices.
Once you’ve set the voice, pace, style, and accent, paste your script into the text box and click Run. After a short wait, your audio will be generated and you can download it as a file.
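If you reuse the same combination across videos, it can help to write it down somewhere outside the UI. The sketch below is one way to do that in Python. It is only a note-keeping helper: `VoiceSettings` and `ExampleVoice` are hypothetical names, the option lists are copied from this guide rather than from any official API, and nothing here talks to Google AI Studio.

```python
from dataclasses import dataclass

# Option names copied from the lists in this guide; the real UI may add or rename options.
PACES = {"Natural", "Fast / Rapid Fire", "Staccato"}
STYLES = {"Voice Smile", "Newscaster", "Whisper", "Empathetic", "Promo Hype"}
ACCENTS = {"American", "British", "Australian", "Translated"}


@dataclass(frozen=True)
class VoiceSettings:
    """A note-to-self record of the settings picked in the UI (hypothetical helper)."""
    voice: str   # whichever speaker name you picked; not validated here
    pace: str
    style: str
    accent: str

    def __post_init__(self):
        # Reject typos so a saved preset always matches an option listed above.
        for label, value, allowed in (
            ("pace", self.pace, PACES),
            ("style", self.style, STYLES),
            ("accent", self.accent, ACCENTS),
        ):
            if value not in allowed:
                raise ValueError(f"unknown {label}: {value!r}")

    def summary(self) -> str:
        """One-line description you can keep next to your script."""
        return f"{self.voice} | {self.pace} | {self.style} | {self.accent}"


# "ExampleVoice" is a placeholder, not a real voice name.
tutorial_preset = VoiceSettings("ExampleVoice", "Natural", "Empathetic", "American")
print(tutorial_preset.summary())
```

Keeping the preset as a single line makes it easy to paste into a video’s notes and reapply by hand the next time you open the Speech & Music section.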
When to Use Gemini 3.1 Flash TTS vs. Gemini 2.5 Pro TTS
The new Gemini 3.1 Flash TTS model is powerful, but it has some limitations you should be aware of—especially if you create longer videos.
Best Use Cases for Gemini 3.1 Flash TTS
Gemini 3.1 Flash TTS works well for:
• Short scripts (1–4 paragraphs)
• Quick intros, outros, or short clips
• Snippets for social media or short-form content
It also supports expressive audio tags, which let you fine-tune delivery in more advanced ways. However, for most standard YouTube voiceovers, you won’t need tags—you just need a clean, stable narration.
Why Gemini 2.5 Pro TTS Is Still Better for Long Voiceovers
If you’re creating longer videos—like 6–10 minute tutorials or full-length explainers—the older Gemini 2.5 Pro Preview TTS model is still more reliable.
Users have reported that Gemini 3.1 Flash TTS can fail with errors on longer scripts (roughly 6–8 minutes of audio or more). In contrast, Gemini 2.5 Pro TTS can usually handle 6–10 minute voiceovers without issues.
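To decide which model to try before you paste a script in, you can estimate its spoken length from the word count. The sketch below assumes roughly 150 spoken words per minute, which is a common rule of thumb and not a figure from Google; the real duration depends on the voice and pacing you choose. The function names are hypothetical, and the 6-minute threshold simply mirrors the reports mentioned above.

```python
# Assumption: about 150 spoken words per minute, a common rule of thumb for
# narration. Actual speed depends on the voice and pacing option you choose.
WORDS_PER_MINUTE = 150

# Threshold taken from the user reports above (problems start around 6 minutes).
LONG_SCRIPT_MINUTES = 6.0


def estimate_minutes(script: str, words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Rough narration length in minutes, based only on word count."""
    return len(script.split()) / words_per_minute


def suggest_model(script: str) -> str:
    """Return the display name of the model this guide would suggest trying first."""
    if estimate_minutes(script) >= LONG_SCRIPT_MINUTES:
        return "Gemini 2.5 Pro Preview TTS"
    return "Gemini 3.1 Flash TTS (Preview)"


short_script = "Welcome back to the channel. " * 20       # 100 words
long_script = "Today we walk through every step. " * 200  # 1200 words

print(suggest_model(short_script))
print(suggest_model(long_script))
```

Treat the result as a starting point only. A script near the threshold may still work on either model, so it is worth a quick test run before committing to one.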
The catch: Gemini 2.5 Pro TTS no longer appears directly in the main Speech & Music dropdown. To access it:
1. Go to the Playground in Google AI Studio.
2. Open the Speech & Music section.
3. Right-click (or open the full model list) to find Gemini 2.5 Pro Preview TTS.
Once selected, choose a simple assistant template, set your voice, pace, and style, and paste in your usual prompt plus script. Even though it doesn’t support expressive tags, it’s excellent for stable, long-form narration.
If you’re exploring other ways to generate audio for your content, you might also like our guide on automatically generating music and sound effects with ACE Studio.
Tips for YouTube-Style Voiceover Prompts
To get consistent results, it helps to use a short, reusable prompt in the context/scene box. For example:
• “A pro YouTuber explaining a tutorial in a clear, friendly, and engaging tone.”
• “Speak like an experienced educator, concise and easy to follow, suitable for a step-by-step tutorial video.”
You can tweak this depending on your niche—tech reviews, news, storytelling, etc. If you’re interested in more voice-focused tools, check out how to clone voices locally with Voicebox as another option in your audio toolkit.
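If you adjust the prompt for several niches, a tiny template keeps the wording consistent. This is a minimal sketch: `build_context` is a hypothetical helper, and the sentence pattern is borrowed from the educator example above. The output is plain text that you paste into the context/scene box yourself.

```python
def build_context(role: str, tone: str, video_type: str) -> str:
    """Fill a reusable scene/context prompt (hypothetical template)."""
    return f"Speak like {role}, {tone}, suitable for a {video_type} video."


# Reproduces the educator example from the list above.
print(build_context("an experienced educator", "concise and easy to follow", "step-by-step tutorial"))

# A variation for a different niche.
print(build_context("a tech reviewer", "clear and upbeat", "product review"))
```

Changing only one argument at a time makes it easier to hear which part of the prompt actually affects the generated voice.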
Final Thoughts
After the latest update, Google AI Studio gives you more control over pacing, style, and accents, but it also splits the workload between two main TTS models:
• Use Gemini 3.1 Flash TTS for short, expressive clips and quick voiceovers.
• Use Gemini 2.5 Pro Preview TTS for longer 6–10 minute scripts where stability matters most.
With the right model and a good prompt, you can generate high-quality AI voiceovers for your videos in just a few clicks.