Google’s New Fabula Screenwriting Tool and Nvidia’s Image-to-City AI Are Huge for Filmmakers

15 May 2026
From Nvidia’s image-to-3D city generator to Google’s Fabula screenwriting assistant and next‑gen text‑to‑speech, AI filmmaking just took a big leap. Here’s what these new tools do, why they matter, and how they could change the way you plan, shoot, and finish your next project.

AI for filmmakers is moving fast—from generating entire 3D cities from a single image to tools that help you outline, beat out, and write your script in one place. If you’re building stories with AI, the latest wave of tools from Nvidia, Google, and others is worth paying attention to.

Nvidia’s LRA 2: Turn a Single Image into an Interactive 3D World

Nvidia has introduced a new tool called LRA 2 that can take a single input image and turn it into a full 3D world. From that world, you can move a virtual camera around, generate video, and even connect it to other Nvidia systems for interactive simulations.

Here’s what LRA 2 makes possible:

Image to 3D world: Start with a photo and LRA 2 reconstructs the 3D geometry behind it, creating a navigable environment rather than just a flat scene.

Camera moves and shots: Once the world is generated, you can explore it, frame shots, and create camera moves as if you were on a virtual set.

Integration with Isaac Sim: Nvidia’s Isaac Sim (used heavily in robotics and simulation) can plug into these worlds, allowing you to run interactive scenarios—useful for both robotics testing and cinematic previs.

This is similar in spirit to tools like Genie 3, but with richer 3D information and more robust rendering. Nvidia has also released an interactive demo on Hugging Face so you can try it yourself.

Why This Matters for Filmmakers

Most AI video workflows today either:

• Build a simple 3D scene, render images, and animate them with text-to-video, or
• Upload assets to an AI video tool and hope it maintains some consistency.

The problem is that you often lose environmental consistency from shot to shot. With tools like LRA 2, you’re effectively building a 3D set:

• Generate the world once
• Direct camera moves inside that world
• Prompt for characters and actions
• Reuse the same environment across multiple shots and sequences

This is much closer to traditional filmmaking: you have a set, you block your actors, and you move the camera. Only now, the set is AI-generated from a single image.
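
To make the contrast concrete, here's a minimal sketch of what that "virtual set" loop could look like in code. Everything here is illustrative: the generator object, `from_image`, and `render_shot` are hypothetical names standing in for whatever the tool exposes, not Nvidia's actual API.

```python
from dataclasses import dataclass

@dataclass
class CameraMove:
    start: tuple        # (x, y, z) world-space position at the first frame
    end: tuple          # position at the last frame
    duration_s: float   # length of the move in seconds

def shoot_scene(world_generator, source_image, shots):
    """Build the world once, then reuse it for every shot in the sequence."""
    world = world_generator.from_image(source_image)  # generated a single time
    clips = []
    for shot in shots:
        # Every shot renders from the same geometry, so the background
        # stays consistent across the whole sequence.
        clips.append(world.render_shot(camera=shot))
    return clips
```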

More Open-Source 3D World Generators

Nvidia isn’t alone. Tencent’s Hunyuan team (behind the Hunyuan family of models) has released its own open-source tool that also turns images into 3D worlds. You upload an image and get a navigable 3D environment in return.

The big takeaway: open-source tools are rapidly catching up to (and sometimes matching) closed, credit-based platforms. Results that would have cost studios millions just months ago are now achievable with free or low-cost tools that anyone can download and run.

Spark 2 and City-Scale 3D Environments

World Labs, the company behind much of the image-to-3D world tech used in popular aggregators, has released a rendering engine called Spark 2. It’s designed to make interacting with large 3D worlds more scalable on the web.

Two standout examples from their research:

Hobbiton interior: A detailed Hobbit-hole you can walk through, with convincing geometry and interior detail.
San Francisco at city scale: A model that lets you zoom from a wide city view down to street-level detail. It isn’t fully real-time yet, but it’s able to composite huge amounts of 3D information into a single navigable scene.

For filmmakers, this points toward a future where you can generate and explore entire cities, pick your angles, and then layer in characters and action with your favorite text-to-video model.

Wonders Zoom: “Enhance” for Real

A new research paper called Wonders Zoom tackles the classic sci-fi trope: “Zoom in. Enhance.” You feed in an image and can infinitely zoom into any region, with generative AI filling in and enhancing the details as you go.

It’s not forensic-level magic—you won’t reliably pull a legible license plate from a few pixels—but it’s powerful for storytelling. If you’re missing coverage or need closer shots from a single frame, tools like this can help generate new angles and details that feel consistent with the original image.
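
Conceptually, the loop is simple: crop, upscale, and let a generative model re-synthesize detail at each level. Here's a minimal Python sketch of that idea; `add_generative_detail` is a stub standing in for the paper's diffusion model, which isn't a public API.

```python
from PIL import Image

def add_generative_detail(img: Image.Image) -> Image.Image:
    # Placeholder for the generative pass that synthesizes plausible
    # fine detail; the real model is the whole point of the paper.
    return img

def zoom_enhance(image: Image.Image, box: tuple, steps: int = 3) -> Image.Image:
    """Repeatedly crop into `box`, upscale, and re-synthesize detail."""
    for _ in range(steps):
        region = image.crop(box)                            # zoom into the region
        region = region.resize(image.size, Image.LANCZOS)   # naive upscale
        image = add_generative_detail(region)               # model fills in detail
        w, h = image.size
        box = (w // 4, h // 4, 3 * w // 4, 3 * h // 4)      # next zoom level
    return image
```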

Seed Dance 2.0 and Runway: Removing Unwanted Characters from AI Video

Seed Dance 2.0 continues to be a strong AI video model, and there’s an interesting workflow for cleaning up shots when something unwanted appears in frame.

Workflow Overview

1. Generate your base shot: For example, a man walking down an office hallway generated with Seed Dance (via a platform like Freepik’s video tools).

2. Spot the issue: Maybe a random character walks into the background near the end of the clip.

3. Use Seed Dance again (inside Runway): Upload the original clip to Runway, select the Seed Dance 2.0-based multimodal model, and prompt something like: “Remove the woman walking out of the room at 14 seconds.”

In testing, it took around eight generations to fully remove the unwanted character. The final result worked, but there were some subtle distortions in reflections and background details.
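
As a rough sketch, that retry loop looks like the code below. `edit_fn` stands in for whichever platform call you're using (Runway's actual SDK may differ), and `review_fn` is your own quality check; in practice that step is you watching the clip.

```python
def remove_character(clip_path: str, edit_fn, review_fn, max_attempts: int = 8):
    """Re-run the edit until review passes; each attempt is one paid generation."""
    prompt = "Remove the woman walking out of the room at 14 seconds."
    for attempt in range(1, max_attempts + 1):
        candidate = edit_fn(clip_path, prompt)   # one generation on the platform
        if review_fn(candidate):                 # in practice: eyeball the result
            return candidate, attempt
    return None, max_attempts
```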

Cost and Quality Trade-Offs

Each generation cost about $4, so eight attempts came out to roughly $32. That adds up quickly, and each regeneration risks some quality loss. When possible, it’s still better to generate your base footage as close to final as you can, rather than relying on heavy post-fix passes.

That said, Runway now offers Seed Dance in an “unlimited” tier (around $95/month), which could be compelling if you’re doing a lot of experimentation and want to iterate without worrying about per-clip charges.
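
The math is worth spelling out, since it decides whether the unlimited tier makes sense for you (figures are the approximate ones above):

```python
cost_per_generation = 4.00   # USD per attempt, per the pricing above
unlimited_monthly = 95.00    # USD, Runway's unlimited tier

fix_cost = 8 * cost_per_generation                    # the 8-attempt cleanup
break_even = unlimited_monthly / cost_per_generation  # 23.75

print(f"One cleanup pass: ${fix_cost:.2f}")                          # $32.00
print(f"Unlimited pays off past ~{break_even:.0f} generations/month")  # ~24
```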

Happy Horse: A New Challenger to Seed Dance

A new AI video model with the unforgettable name “Happy Horse” has been confirmed as an Alibaba video generation model. Early rankings on LM Arena already list it as one of the top tools for AI video editing, potentially on par with Seed Dance 2.0.

The model isn’t publicly available yet, and it’s not entirely clear how it’s being benchmarked, but it’s expected to be powerful and likely open-source. Once released, it could become a serious option for filmmakers who prefer open tooling over closed platforms.

Midjourney v8.1: Big Upgrade for Cinematic Concept Art

Midjourney has rolled out version 8.1, a much-needed improvement over the earlier v8 alpha. For filmmakers who rely on AI for look development and concept frames, this update is significant.

Key improvements in Midjourney v8.1:

Native 2K HD rendering for sharper images
About 3x faster generation than v8
Better overall quality and more cinematic results

In side-by-side tests:

• A “cinematic still of a man in a sci-fi mech suit in a futuristic market” looked far more polished and filmic in v8.1 than in v8.
• A shot of a snorkeler encountering a giant squid felt more realistic and less like rough concept art.
• For a “Cosmic Sushi” poster, v8.1 produced a cleaner, more obviously AI-perfect design, while v8’s version felt more hand-drawn and imperfect—in a good way.

The takeaway: v8.1 is a strong upgrade for cinematic imagery, but if you want something that feels more illustrated or imperfect, older versions can still be useful. Either way, Midjourney remains one of the best tools for quickly exploring the visual language of a film, especially when paired with AI editing tools like those in DaVinci Resolve 21.

Google Fabula: An AI Screenwriting Tool Built Around Story Beats

One of the most exciting announcements for filmmakers is Google Fabula, a new AI-powered screenwriting tool. There’s no public demo yet, but the interface and workflow are already clear—and very promising.

How Google Fabula Works

Fabula is built around three main panels:

Left: Story input – You upload your characters, scenes, and overall story treatment. Think of this as your initial idea document.

Middle: Story beats – This is where you break the story into beats, like sticky notes on a whiteboard. You can rearrange, refine, and experiment with structure.

Right: Script output – Fabula turns your beats and treatment into an actual script, updating as you refine the beats and story details.

In other words, it mirrors the real writing process—brainstorming, structuring, then drafting—but keeps everything in a single, connected workspace.
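
Since there's no public demo yet, here's a guess at the data model those three panels imply, written as a small Python sketch. The class names and the `draft_script` step are assumptions for illustration, not Google's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Beat:
    summary: str            # one sticky note's worth of story
    characters: list[str]   # who appears in this beat

@dataclass
class Project:
    treatment: str                                    # left panel: story input
    beats: list[Beat] = field(default_factory=list)   # middle panel

    def reorder(self, order: list[int]) -> None:
        """Rearranging beats is the core structuring move on the whiteboard."""
        self.beats = [self.beats[i] for i in order]

# The right panel is then a function of the other two:
# script = draft_script(project)   # hypothetical generation step
```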

Why Fabula Could Be a Big Deal

Because it’s from Google, Fabula could eventually integrate tightly with the rest of their creative stack:

• Image generation via models like Nano Banana Pro
• Video generation via tools like Google Veo
• Smart stitching and organization using embedding models like Embedding 2

Instead of jumping between a dozen separate apps, you could have a more end-to-end pipeline: outline your film, generate visuals, and assemble assets with tools that all speak the same language.

For writers and directors, Fabula hints at a near future where story structure, character arcs, and AI-generated visuals are all developed in one ecosystem.

Google’s Gemini 3.1 Flash TTS: Context-Aware Voice Direction

Google also released a new text-to-speech model called Gemini 3.1 Flash TTS (Text-to-Speech) Preview. The name is clunky, but the capabilities are impressive—especially for filmmakers working with AI voices.

Scene-Directed Voice Acting

Most TTS tools today let you tweak tone with simple tags like “sad” or “angry.” Gemini 3.1 Flash TTS goes further by letting you describe the scene and context in natural language, similar to how you’d direct an actor.

For example, you can specify:

Scene: “A man is at a conference announcing a new tool.”
Context: “He just finished his big keynote and is now revealing Gemini 3.1 Flash TTS.”

The model then delivers the line in a way that matches that scenario—energetic, confident, and stage-ready.

Change the scene to something like: “A man is breaking up with his girlfriend. He’s sad and sitting at a dinner table,” and the same line becomes quieter, more emotional, and more vulnerable.
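
There's no published snippet for this model yet, but Google's existing google-genai SDK already exposes TTS in roughly this shape, so a request could plausibly look like the sketch below. The model identifier is taken from the article and is unverified; the voice name `Kore` is one of the SDK's existing prebuilt voices.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

prompt = (
    "Scene: A man is breaking up with his girlfriend. He's sad and sitting "
    "at a dinner table.\n"
    'Say: "I think we both already know what I\'m about to tell you."'
)

response = client.models.generate_content(
    model="gemini-3.1-flash-tts-preview",  # name per the article; unverified
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)
audio_bytes = response.candidates[0].content.parts[0].inline_data.data
```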

How It Compares to ElevenLabs

ElevenLabs has been the go-to for high-quality AI voices, and it still produces strong results. But its approach is more tag-based: you add something like “[sad]” to set the mood.

In comparison tests, both tools could sound “sad,” but Google’s model responded more deeply to the full scene description—capturing not just emotion, but performance style. This is closer to real directing, where you talk through the situation, not just the mood.
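
Put side by side as plain strings, the difference in prompt style is easy to see; the wording below is illustrative, not either vendor's required syntax.

```python
# Tag-based direction (the ElevenLabs-style approach):
tag_style = "[sad] I think we both already know what I'm about to say."

# Scene-based direction (the Gemini approach):
scene_style = (
    "Scene: A man is breaking up with his girlfriend. He's sad and "
    "sitting at a dinner table.\n"
    "Line: I think we both already know what I'm about to say."
)
```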

For filmmakers building AI-driven characters, this kind of scene-aware direction could be a game-changer, especially when combined with robust AI audio workstations like those covered in Ace Studio.

Hollywood, AI, and Real Productions

Outside the lab, AI is quietly reshaping real film projects and careers:

Jobs and AI: A recent Hollywood Reporter deep dive found that many job changes in the industry aren’t directly caused by AI. If anything, writers and artists are often more in demand, because productions need people who can creatively direct and shape AI outputs.

AI in a $70M feature: Director Doug Liman is reportedly working on a $70 million feature about the history of Bitcoin, starring Gal Gadot and Casey Affleck, that uses generative AI to build environments—similar in spirit to virtual production, but with AI-generated worlds.

Memory startup from a star: Milla Jovovich is a founding member of an AI startup called MIM Palace, which aims to create a persistent, structured “memory palace” of user data that AI systems can navigate. It’s an attempt to give AI a more stable, organized long-term memory of your interactions.

All of this points to a broader trend: AI isn’t replacing filmmakers—it’s becoming part of the toolkit, from pre-production and writing to worldbuilding, voice acting, and final polish.

Community Projects and Learning

Alongside the tech, there’s a growing ecosystem of AI film communities, meetups, and standout projects:

AI filmmaking meetups: Events are popping up in places like Palm Beach and Los Angeles, plus online workshops focused on building portfolios in the AI era.

Notable AI film projects:
– A prologue to “The Chronicles of Bone,” with strong pacing, sound design, and a striking visual style.
– A Snickers spec commercial with a clever twist and high production value.
– “The Sister,” a short film that not only looks great but also includes a process breakdown, showing exactly how it was made.

The more artists share their workflows, the faster the entire field advances. If you’re experimenting with these tools, documenting and sharing your process can be just as valuable as the finished film.

AI is quickly becoming a full-stack filmmaking partner—from generating cities and sets to helping you write scripts, direct performances, and polish final shots. The tools above are early signs of where this is all headed: more control, more consistency, and more creative leverage for filmmakers who are willing to experiment.
