Google’s Omni Flash video model and the real future of AI filmmaking

05 Jun 2026 20:37

Google has launched Omni Flash, a new “nano banana for video” model built into Gemini and Google Flow. Here’s how it actually performs against Seedance, why Google’s world-model approach still matters, and why real-time engines like Odyssey’s Star Child 1 may be the real future of AI video.

AI video is moving fast, and Google is finally back in the spotlight with a new model that feels like “nano banana for video.” But while Omni Flash is grabbing headlines, the most exciting breakthroughs might actually be happening elsewhere—in real-time world models and new engines that treat AI video like a live, editable film set.

What is Google Omni Flash?

Google Omni Flash is Google’s new fast AI video model built on its Gemini/DeepMind stack. The easiest way to think about it right now is as “nano banana for video”: you can feed it different media types—images, short video clips, text prompts, or audio—and it outputs a short video.

Today, Omni Flash is limited to:

• Output: up to 10 seconds of video
• Resolution: 720p
• Access: paid Google AI Plus, Pro, or Ultra subscribers only
• Mode: “Flash” – a fast, constrained version released for safety and performance reasons

Google has hinted that the long-term goal is true “omni” I/O: input anything, output anything. For now, the output is video only, but the model is built on top of Google’s broader world-model and data infrastructure.

Hands-on: Omni Flash vs Seedance 2.0

To see where Omni Flash stands, it was tested against Seedance 2.0 (one of today’s strongest omni video models) across several real-world creative tasks. The results were clear: Omni Flash is interesting, but it’s not ready for professional work.

VFX-style edits: explosions, sky beams, and transformations

In one test, drone footage of city towers was used as a base, with a prompt for a dramatic sky beam hitting the buildings. Omni Flash produced something that looked more like clip art pasted over the footage—far from photorealistic.

Seedance 2.0 did better, but even there the results weren’t production-ready. The pattern repeated across other tests:

• A man walking in the desert with an explosion behind him: Omni Flash struggled, Seedance produced a much more cinematic shot.
• A man transforming into an alien using multiple reference images and a reference transformation video: Omni Flash cut off before the transformation and leaned into a strange paper/piñata aesthetic, with noticeable diagonal banding artifacts. Seedance completed the transformation in a single prompt with a more coherent result.

Lip-sync and talking portraits

Another test tried a common trick: upload a still image of a person, pair it with a black video plus an audio clip, and have the model generate a talking, lip-synced portrait.

Omni Flash kept the general scene but introduced heavy oversharpening and an odd texture in the background. Worse, it replaced the uploaded voice with a different one entirely.

Seedance 2.0’s version was far from perfect, but it kept the audio and delivered a more convincing talking-head result overall.

IP issues and cinematic shots

In a shot where a woman walks across a salt flat and a spaceship crashes in the background, Omni Flash produced a Star Destroyer-style crash—raising obvious IP concerns. Surprisingly, the same prompt in Seedance also produced a similar Star Wars-style crash, though the Seedance version looked smoother and more cinematic (aside from some stuttering that would need cleanup in a tool like Topaz Video AI).

Other tests—like turning asteroids into meatballs in a truck chase, or adding a giant sky creature to a handheld car shot—consistently showed Seedance 2.0 outclassing Omni Flash in realism, motion, and overall cinematic feel.

Where Omni Flash actually shines: world understanding

One of the more interesting tests involved changing the era of a scene. The base clip showed a man walking down a modern street; the prompt asked to transform the environment into the 1920s.

Several models were tested:

• Kling: added some 1920s-ish elements, but left in modern details like Lime scooters.
• Luma: dressed the man in 1920s clothing, but still kept scooters and modern cars like Mini Coopers.
• Seedance 2.0: shifted the overall vibe toward the 1920s but changed the scene heavily and still left modern-looking cars.

Omni Flash, however, produced a scene that felt much more consistently 1920s. The physics and motion weren’t production-grade, but the model clearly understood the concept of the era better than the others.

This is where Google’s advantage shows: Omni Flash is plugged into Google’s broader world-model and data infrastructure. Over time, that should mean:

• Smarter, context-aware edits (e.g., “make this look like 1920s New York” or “turn this into a 1970s sci-fi film”).
• More intelligent rotoscoping and object-aware changes.
• Better reasoning about real-world objects, styles, and time periods—beyond pure pattern-matching.

Right now, Omni Flash is clearly behind Seedance 2.0 in raw video quality. But if Google keeps layering in its world understanding and real-time data, future versions could leapfrog models that rely mostly on pattern recognition.

What Omni Flash is (and isn’t) good for today

Even Google’s own DeepMind team has framed Omni Flash as a tool for general creators and casual creativity—not for professional VFX or high-end filmmaking.

Today, Omni Flash is best seen as:

• A playground for quick, fun experiments.
• A glimpse into how Google’s world model might eventually drive smarter video editing tools.
• A first step toward more intelligent, multi-modal “input anything, output anything” workflows.

If you need polished, cinematic AI video right now, Seedance 2.0 and other top-tier models are still the better bet. For a deeper comparison of image models in this space, you might also want to look at how Google’s image model stacks up in this breakdown of Nano Banana vs GPT Image 2.0.

Genie 3 and Google Maps: interactive worlds with missing context

Omni Flash wasn’t Google’s only creative announcement. Genie 3, Google’s experimental tool for turning images or prompts into interactive 3D-style worlds, now has a Google Maps integration.

How it works:

• You select a real-world location via a new Google Maps button (for example, the Space Needle in Seattle).
• Genie 3 generates an explorable 3D world based on that location.
• You can move around as a character inside that space.

In practice, it’s still very early. In one test, Genie 3 generated a world around the Space Needle—but the actual Space Needle was missing. The result was a fun, surreal environment (including a cat avatar that was more lion-sized than housecat), but it highlighted a key limitation: Genie 3 doesn’t yet fully understand the context or landmarks of the locations it’s using.

Even so, tools like Genie 3 point toward a future where AI can turn maps, photos, and sketches into playable, interactive environments—something that overlaps with both gaming and filmmaking.

Hydream vs Nano Banana: open source isn’t there yet

An open-source image model called Hydream has been pitched as a competitor to Google’s Nano Banana. On paper, that’s exciting: a free, hackable alternative to a top-tier image generator.

In practice, Hydream is nowhere near Nano Banana’s quality yet.

Examples from testing:

• A sushi-making diagram: Hydream produced something that looked okay at first glance, but the text was gibberish and the food looked unappetizing and implausible as sushi (think bologna and mac & cheese rolls). Nano Banana, by contrast, generated a clean, polished, infographic-style image.
• A man on a dirt road: Hydream’s composition made little sense (the road direction didn’t match the scene), while Nano Banana produced a coherent, grounded shot.
• A fantasy elf vs demon scene: Hydream’s result was passable but flat; Nano Banana’s looked like high-end concept art.
• A 1985 Mac video game: Hydream again struggled with text and coherence, while Nano Banana nailed the retro aesthetic—including a perfect Goonies poster in the background.

Hydream is a reminder that open-source models are catching up, but for now, commercial models like Nano Banana still dominate on quality and consistency. If you’re exploring image generators for creative work, it’s worth comparing these models side by side, as in our look at several free Chinese image and video generators.

Why AI agents still disappoint for creative workflows

AI agents are everywhere right now. Almost every major AI tool is launching some kind of agent that promises to handle multi-step workflows for you. In theory, that should be perfect for creative projects: upload your script, references, and style, and let the agent do the rest.

In practice, most creative agents are still slow, clunky, and less effective than just using the tools directly.

Character continuity: agents vs manual iteration

One popular suggestion was to use agents to enforce character continuity—keeping the same character look across multiple shots.

In a test using Luma’s agents:

• Two images were uploaded: a portrait of a person and a coffee shop interior.
• The agent was asked to generate a wide shot of that person at the coffee counter, and to double-check that the character matched the reference.

The agent took 2 minutes and 41 seconds and produced an image it claimed was a 94.9% biometric match. In reality, the character looked more like an older cousin: similar, but not the same, and the outfit didn’t match at all.

The same task was done manually in Magnific:

• The same references were uploaded.
• Multiple generations were triggered quickly by repeatedly hitting “generate.”
• Within that same time window, several results were produced—one of which matched the character and outfit far better than the agent’s single attempt.

The key issue: most agents use a waterfall-style workflow (do step A, then B, then C) instead of embracing rapid iteration. For creative work, iteration is everything. A human curator clicking “generate” a few times and picking the best result still beats a slow, linear agent in most cases.
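The iteration argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the 2 minutes 41 seconds comes from the agent test above, but the 20-second manual generation time and the 20% chance that any single generation is a "keeper" are made-up assumptions, and the attempts are treated as independent.

```python
def attempts_in_budget(budget_s: float, per_attempt_s: float) -> int:
    """Number of complete generations that fit inside a time budget."""
    if per_attempt_s <= 0:
        raise ValueError("per_attempt_s must be positive")
    return int(budget_s // per_attempt_s)


def chance_of_a_keeper(p_single: float, attempts: int) -> float:
    """Probability that at least one of several independent attempts is usable."""
    return 1.0 - (1.0 - p_single) ** attempts


AGENT_RUN_S = 2 * 60 + 41      # 161 s, the agent's single run in the test above
ASSUMED_MANUAL_GEN_S = 20      # hypothetical time per manual generation
ASSUMED_P_KEEPER = 0.2         # hypothetical per-attempt success rate

manual_attempts = attempts_in_budget(AGENT_RUN_S, ASSUMED_MANUAL_GEN_S)

print(f"Manual attempts in the same window: {manual_attempts}")
print(f"One agent run:  {chance_of_a_keeper(ASSUMED_P_KEEPER, 1):.0%} chance of a keeper")
print(f"Manual retries: {chance_of_a_keeper(ASSUMED_P_KEEPER, manual_attempts):.0%} chance of a keeper")
```

Under these assumed numbers, a handful of quick retries gives much better odds than one slow pass, even if each individual attempt is no better than the agent's. Different assumptions will shift the figures, but not the shape of the comparison.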

Script-to-film agents: Runway and Utopi AI

Runway’s new agent promises to turn a script into a film by automatically generating characters, sets, and shots. When tested with a simple story about a man at home who gets called back to a lab and sent back in time, the agent instead produced a short trailer-style video.

Issues included:

• Random tonal shifts (e.g., sudden grayscale sections).
• Inconsistent style and character continuity.
• A focus on trailer-like narration instead of actually staging the scripted scene.

Utopi AI, another semi-agentic tool, followed a more guided, step-by-step process: upload a script, then walk through a procedural workflow to generate the film. The result was entertainingly bad—so rough that it was fun to watch, but nowhere near usable for serious work.

The takeaway: creative agents are promising, but right now they’re more of a curiosity than a productivity boost. For most filmmakers and artists, direct control plus fast iteration still wins.

Quinn’s compression breakthrough: cheaper, scalable AI video

One of the biggest bottlenecks in AI filmmaking isn’t just model quality—it’s cost. GPUs are expensive, and generating high-resolution video at scale quickly becomes unaffordable.

The team at Quinn has proposed a clever workaround: an image compression algorithm that shrinks images by up to 32x while preserving detail, including character likeness and text.

Why this matters:

• If you can compress frames 32x and then reliably reconstruct them with an AI-aware decoder, you can generate video at a fraction of the usual compute cost.
• Preserving faces and text is crucial; many current compression and upscaling methods turn fine details into mush. Quinn’s approach specifically focuses on keeping those details sharp.

In practice, this kind of compression could make large-scale AI video generation far more affordable and open up longer-form projects that are currently cost-prohibitive.
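To get a feel for the scale, here is a rough back-of-the-envelope sketch. It assumes the 32x figure refers to total data size, uncompressed 8-bit RGB frames, and 24 frames per second; none of those details are confirmed by the source. It also only counts bytes: how much compute is actually saved depends on the model and will not necessarily scale in the same proportion.

```python
def frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size of one uncompressed 8-bit frame in bytes."""
    return width * height * channels


def clip_bytes(width: int, height: int, seconds: int, fps: int, ratio: int = 1) -> int:
    """Total bytes for a clip, optionally divided by a compression ratio."""
    total = frame_bytes(width, height) * seconds * fps
    return total // ratio


ASSUMED_FPS = 24          # assumption, not stated in the article
COMPRESSION_RATIO = 32    # the "up to 32x" figure quoted above

# Example clip: 10 seconds at 720p (1280x720).
raw = clip_bytes(1280, 720, seconds=10, fps=ASSUMED_FPS)
small = clip_bytes(1280, 720, seconds=10, fps=ASSUMED_FPS, ratio=COMPRESSION_RATIO)

print(f"Raw frames:        {raw / 1e6:.1f} MB")
print(f"Compressed frames: {small / 1e6:.1f} MB")
```

In this example the clip's frame data drops from roughly 664 MB to roughly 21 MB, which is the kind of reduction that could make longer projects practical if reconstruction quality holds up.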

The real future: real-time world models and live-editable video

While Omni Flash and Seedance are battling it out in the render-and-wait world, the most disruptive work might be happening in real-time AI video.

Odyssey’s new world model, Star Child 1, generates both video and audio in real time. You don’t hit render and wait; the scene plays live, and you can steer it as it runs.

Real-time performance, not offline rendering

Examples from Star Child 1 include:

• A woman on a subway, narrating her thoughts as the world moves around her. The visuals and sound are generated on the fly.
• A man playing bongos, with the drum sounds synced in real time.
• A boat on the water, with surprisingly convincing real-time water simulation—something that’s traditionally very expensive in VFX pipelines.

You can also upload your own audio and have the model generate matching visuals in real time. The fidelity isn’t yet on par with Seedance or Google’s best offline models, but the paradigm is completely different.

Imagine this workflow:

• You’re editing a film in your NLE.
• Instead of rendering a shot, waiting, and re-rendering, you loop a section and direct the AI scene live: “Make the creature slower,” “Add more fog,” “Change the lighting to sunset,” etc.
• Once you like the performance, you lock it in and move on.

That’s the promise of real-time world models like Star Child 1. They turn AI video into something closer to a live 3D set you can direct, rather than a black box you send prompts to and hope for the best.

Agora 1: AI as a real-time game engine

Odyssey also introduced Agora 1, a real-time AI game engine that overlaps heavily with what Genie 3 is trying to do—but with more focus on actual gameplay mechanics and multiplayer interaction.

Agora 1 can:

• Generate interactive, real-time simulation environments.
• Support different gameplay systems.
• Let multiple players inhabit and interact within the same AI-generated world.

To showcase it, Odyssey built a GoldenEye-style multiplayer shooter you can play in real time against other people. Under the hood, the same kind of world-model tech that powers Star Child 1 is being used to generate and manage the environment and interactions.

For filmmakers, this is a glimpse of a future where the line between game engines and AI video tools disappears. You might block scenes, direct performances, and even test story ideas inside a live AI-powered environment before ever committing to a final render.

Seedance 2.1 and 2.0 Mini: what’s coming next

On the more traditional AI video side, Seedance is reportedly about to release two important updates:

• Seedance 2.1: rumored to be roughly 20% better than 2.0 in quality, with a possible jump to native 4K generation (something only Kling currently offers at that level).
• Seedance 2.0 Mini: a lighter, cheaper version priced around $0.07 per second, making high-quality AI video more accessible for longer projects and tighter budgets.

Given that Seedance 2.0 is already one of the strongest models on the market, even a 20% improvement plus 4K support could be a big deal for filmmakers who want top-tier visuals without complex upscaling pipelines.
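For budgeting, the reported Mini price is easy to turn into a quick estimate. The sketch below takes the "around $0.07 per second" figure at face value and adds a hypothetical "takes" multiplier, since most shots need several generations before one is usable. Actual pricing may differ once the model ships.

```python
PRICE_CENTS_PER_SECOND = 7  # the reported ~$0.07 per second, in whole cents


def cost_cents(seconds: int, takes: int = 1) -> int:
    """Estimated cost in cents for a given runtime and number of takes."""
    return seconds * takes * PRICE_CENTS_PER_SECOND


def dollars(cents: int) -> str:
    """Format a cent amount as a dollar string."""
    return f"${cents / 100:,.2f}"


ten_second_shot = cost_cents(10)
feature_one_take = cost_cents(90 * 60)             # 90 minutes of footage
feature_four_takes = cost_cents(90 * 60, takes=4)  # hypothetical 4 takes per shot

print("10-second shot:          ", dollars(ten_second_shot))
print("90-minute film, 1 take:  ", dollars(feature_one_take))
print("90-minute film, 4 takes: ", dollars(feature_four_takes))
```

Working in whole cents avoids floating-point rounding surprises when the totals get large. The takes multiplier is the number to watch: retries, not runtime, tend to dominate the real bill.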

Adobe + Google: creative tools inside Gemini

Another notable move: Adobe is partnering with Google to bring Adobe tools directly into Gemini. That means you’ll be able to do things like:

• Edit images with Photoshop-grade tools from inside Gemini.
• Trigger video editing actions from within a chat interface.
• Blend generative AI with familiar creative workflows instead of bouncing between disconnected apps.

This kind of integration suggests a future where your AI assistant isn’t just a prompt box—it’s a front-end for your entire creative toolkit.

Culture clash: when AI is better than we think

One amusing story making the rounds: an artist uploaded a photo of a real Monet painting and claimed it was an AI-generated image. The responses poured in:

• “This looks nothing like Monet.”
• “Feels like a high school attempt to copy his style.”
• “You can always tell when it’s AI.”

In reality, it was just… Monet. It’s a good reminder that a lot of the backlash against AI art is more about perception than objective quality. People often judge AI work more harshly simply because they know it’s AI—even when it’s indistinguishable from traditional media.

Where AI filmmaking is really headed

Putting it all together, here’s the big picture:

• Today’s best offline models (like Seedance 2.0) still beat Omni Flash on pure video quality.
• Google’s Omni Flash is weaker visually right now, but its world-model foundation could make it a powerhouse in future versions, especially for context-aware edits and era/style transformations.
• Open-source tools like Hydream are promising but not yet competitive with top commercial models like Nano Banana.
• AI agents for creative work are mostly overhyped today—they’re slower and less effective than direct, iterative use of the tools.
• Compression breakthroughs like Quinn’s could make large-scale AI video far cheaper and more accessible.
• Real-time world models like Odyssey’s Star Child 1 and engines like Agora 1 may be the real game-changers, turning AI video into something you direct live instead of rendering blindly.

We’re still early—but the direction is clear. The future of AI filmmaking looks less like “type a prompt, wait, and hope” and more like stepping onto a live, responsive, AI-powered set where you can shape the story in real time.