How to make AI music videos with perfect lip sync using Seedance 2.0

07 Jun 2026 10:37

Learn a practical workflow for creating short AI music videos with tight, believable lip sync using Suno, Higgsfield Soul Cinema, Seedance 2.0, and CapCut. This guide shows you the one crucial step most people skip and walks through the full 24‑second video build from start to finish.

Getting AI music videos to look cinematic is one thing. Getting the lips to actually match the words is another challenge entirely. Most people upload a song, hit generate, and then wonder why the mouth movements feel off.

This guide walks through a simple, repeatable workflow for creating a 24-second AI music video with tight lip sync on every shot. You will use four tools together: Suno for the track, Higgsfield Soul Cinema for the character image, Seedance 2.0 for video generation, and CapCut for the final edit.

The secret to accurate AI lip sync

The key to good AI lip sync in Seedance 2.0 is not just the audio. The crucial step most people skip is feeding the actual lyrics into the video prompt alongside the audio slice.

Here is why that matters:

• The audio reference gives Seedance the rhythm and timing of the track.
• The lyrics in the prompt tell it exactly which words are being spoken, line by line.

Without the lyrics, Seedance mostly moves the mouth to the beat. With the lyrics, it can shape the mouth around specific words, which is what makes the sync feel believable. The same principle applies to talking-head workflows, such as AI lip-sync avatar videos.

Tools you will need

You only need four tools to follow this workflow:

• Suno – to generate the song and lyrics.
• Higgsfield Soul Cinema – to create a single, cinematic character image.
• Seedance 2.0 (inside Higgsfield) – to generate the AI video clips.
• CapCut – to slice the audio and assemble the final edit.

The whole process is built around one reference image, two audio slices, and the full lyrics.

Step 1: Generate a clear, slow track in Suno

Start by creating your song in Suno. For strong lip sync, clarity matters more than speed or complexity.

Use a prompt that asks for a smooth rap or vocal at around 95 BPM. At this slightly slower tempo each word is articulated more clearly, which gives Seedance a cleaner vocal to match mouth shapes to later.
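
To get a feel for what 95 BPM means in practice, here is a small arithmetic sketch. The function names are hypothetical, and the 12-second clip length is the one used in the slicing step of this guide.

```python
def seconds_per_beat(bpm: float) -> float:
    """Length of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def beats_in_clip(bpm: float, clip_seconds: float) -> float:
    """How many beats fit into a clip of the given length."""
    return bpm * clip_seconds / 60.0


print(round(seconds_per_beat(95), 3))  # 0.632
print(beats_in_clip(95, 12))           # 19.0
```

At 95 BPM a 12-second clip holds 19 beats, each a little over 0.6 seconds long. A faster track packs more beats, and usually more syllables, into the same clip.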

Once Suno generates the track:

1. Listen all the way through and check that every word is clear and easy to follow.
2. When you are happy, download the audio file.
3. Copy the full lyrics that Suno displays and paste them into a notes app or document.

Those lyrics will be critical when you write your Seedance prompts, so keep them open and handy.

Step 2: Slice the audio into two 12-second clips

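Seedance 2.0 currently accepts a maximum of 15 seconds of audio per clip, so a 24-second track will not fit in a single generation. Instead, you will split the track into two equal 12-second halves, each comfortably under the limit, generate a video for each, and then cut them together.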
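
The split itself is simple arithmetic. As a minimal sketch, assuming you want the fewest equal-length slices that stay under the 15-second cap (the function name and the equal-length rule are illustrative choices, not something Seedance requires):

```python
import math


def plan_slices(total_seconds: float, max_clip_seconds: float = 15.0) -> list[tuple[float, float]]:
    """Split a track into the fewest equal-length slices that fit the cap.

    Returns a list of (start, end) times in seconds.
    """
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_clip_seconds)
    length = total_seconds / count
    return [(i * length, (i + 1) * length) for i in range(count)]


print(plan_slices(24))  # [(0.0, 12.0), (12.0, 24.0)]
```

For a 24-second track this gives the two 12-second slices used in the rest of this guide.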

Open CapCut and import your full Suno track. Then:

1. Trim the first slice from 0 to 12 seconds.
2. Trim the second slice from 12 to 24 seconds.
3. Export each slice as its own audio file and label them clearly, for example:
• song_part1_0-12s.wav
• song_part2_12-24s.wav

Keep track of which slice is which. Each audio slice must match the correct lyrics and video prompt. If you swap them later, the lip sync will break.
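
If you prefer to script the slicing instead of trimming in CapCut, the sketch below does the same cut with Python's standard wave module. It is an alternative, not part of the original workflow, and it assumes your track is an uncompressed PCM WAV file. If Suno gave you an MP3, convert it first. The function name and the full_song.wav filename are hypothetical.

```python
import wave


def slice_wav(src_path: str, dst_path: str, start_s: float, end_s: float) -> int:
    """Copy the audio between start_s and end_s into a new WAV file.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start_frame = int(round(start_s * rate))
        # Clamp the end so a slightly short file does not cause an error.
        end_frame = min(int(round(end_s * rate)), src.getnframes())
        if start_frame >= end_frame:
            raise ValueError("slice is empty or starts past the end of the file")
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
        with wave.open(dst_path, "wb") as dst:
            # Keep the same channel count, sample width, and sample rate.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return end_frame - start_frame


# Example calls (filenames are placeholders):
# slice_wav("full_song.wav", "song_part1_0-12s.wav", 0, 12)
# slice_wav("full_song.wav", "song_part2_12-24s.wav", 12, 24)
```

Writing the time range into each filename, as above, makes it harder to pair a slice with the wrong lyrics later.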

Step 3: Create a single cinematic character image

Next, you will generate one high-quality reference image that Seedance will use for every shot in the video. This is what keeps your character consistent across both clips.

In Higgsfield:

1. Open Cinema Studio and choose the Image tab.
2. Select Soul Cinema at 2K resolution.
3. Set the aspect ratio to 16:9.

Write a detailed, cinematic prompt for your character. For example, you might describe:

• A three-quarter angle view of the character (not fully side-on, not straight-on).
• A specific setting, such as an Italian villa courtyard.
• Costume details like a cream suit, Wayfarer sunglasses in one hand, and any accessories.
• Background elements like a marble fountain and cypress trees.
• Lighting and mood.

The three-quarter angle is important: the face is still fully visible for lip reading, but the slight turn makes the shot feel more like a real music video than a flat passport-style photo.

The more specific and cinematic your description, the more Seedance can carry those details into every generated shot. A well-styled character in a realistic environment almost always produces better results than a plain figure on a blank background.

When you are happy with the generated image, save it. This is your single reference image for both video clips.

Step 4: Prepare everything for Seedance 2.0

Before you move into video generation, make sure you have:

• The full Suno song (for final editing).
• Both 12-second audio slices from CapCut.
• The complete lyrics in a document.
• Your Soul Cinema character image (image one).

Seedance will use the same reference image for both clips, plus one audio slice and the matching lyrics for each.
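
One way to keep these pairings straight is to write them down as data before you generate anything. The sketch below is a hypothetical checklist, not a Seedance feature: the ClipJob class, the check_jobs function, and the character.png filename are all made up for illustration, and the lyric strings are placeholders for your own lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipJob:
    """One planned generation: shared image, one audio slice, its lyrics."""
    image: str
    audio: str
    start_s: float
    end_s: float
    lyrics: tuple[str, ...]


def check_jobs(jobs: list[ClipJob], max_clip_seconds: float = 15.0) -> list[str]:
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    if len({job.image for job in jobs}) > 1:
        problems.append("clips use different reference images")
    for i, job in enumerate(jobs, start=1):
        if job.end_s - job.start_s > max_clip_seconds:
            problems.append(f"clip {i}: audio is longer than {max_clip_seconds} s")
        if not any(line.strip() for line in job.lyrics):
            problems.append(f"clip {i}: no lyrics attached")
    # Consecutive slices should meet exactly, with no gap or overlap.
    for i, (a, b) in enumerate(zip(jobs, jobs[1:]), start=1):
        if a.end_s != b.start_s:
            problems.append(f"clips {i} and {i + 1}: audio slices do not meet")
    return problems


jobs = [
    ClipJob("character.png", "song_part1_0-12s.wav", 0, 12, ("<lyric lines for 0-12 s>",)),
    ClipJob("character.png", "song_part2_12-24s.wav", 12, 24, ("<lyric lines for 12-24 s>",)),
]
print(check_jobs(jobs))  # []
```

An empty list means both clips share one image, each slice fits the 15-second limit, and every slice has lyrics attached.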

Step 5: Generate video one in Seedance 2.0

Now it is time to create the first 12-second video.

In Higgsfield, go to the Video section and select Seedance 2.0. Then:

1. Upload your Soul Cinema character image as the reference image.
2. Upload the first audio slice (0–12 seconds) as the audio reference.
3. Go back to your lyrics document and copy the lines that cover the first 12 seconds of the song.
4. Paste those exact lines into the video prompt, before or alongside your shot descriptions.

This is the crucial lip-sync step: the lyrics in the prompt must match the audio slice you uploaded.

After the lyrics, describe the shots you want for this first clip. For example, you might specify three different angles in the courtyard, each lasting a few seconds. Seedance reads these descriptions in order, so the sequence of shots in your prompt becomes the sequence in the video.
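
The prompt structure described above (lyrics first, then shots in playback order) can be assembled with a few lines of code. This is only a sketch of one possible layout: the section labels, the even split of time across shots, and the function name are assumptions, and Seedance does not require this exact wording.

```python
def build_video_prompt(lyrics: list[str], shots: list[str], duration_s: float = 12.0) -> str:
    """Assemble a prompt with the lyrics first, then the shots in order.

    The clip duration is divided evenly across the shots.
    """
    if not lyrics or not shots:
        raise ValueError("need at least one lyric line and one shot")
    per_shot = duration_s / len(shots)
    lines = ["Lyrics (sung in this order):"]
    lines += [f'"{line.strip()}"' for line in lyrics]
    lines.append("")
    lines.append("Shots (in order):")
    for i, shot in enumerate(shots):
        start = i * per_shot
        end = start + per_shot
        lines.append(f"Shot {i + 1} ({start:g}-{end:g}s): {shot.strip()}")
    return "\n".join(lines)


prompt = build_video_prompt(
    lyrics=["<first lyric line>", "<second lyric line>"],
    shots=["wide shot of the courtyard", "medium shot by the fountain", "close-up on the face"],
)
print(prompt)
```

With three shots over 12 seconds, each shot is labelled with a 4-second window. Replace the placeholder lyric lines with the exact lines from your Suno track for this slice.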

Set your generation settings:

• Duration: 12 seconds
• Aspect ratio: 16:9
• Resolution: 1080p

Then click generate. When the clip is ready, play it through and check that the mouth movements match the words throughout all shots. If the lyrics and audio slice are aligned correctly, the lip sync should hold very well.

Step 6: Generate video two with a different shot style

For the second half of the song, you can change the camera style to keep the video visually interesting while still maintaining character and lip-sync consistency.

Stay in Seedance 2.0 and:

1. Keep the same reference image (your Soul Cinema character).
2. Upload the second audio slice (12–24 seconds) as the new audio reference.
3. Copy the lyrics that cover the second 12 seconds of the track and paste them into the new video prompt.
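
Working out which lines belong to which slice is easier if you note the time at which each line starts while listening to the track. The sketch below assumes you have written those start times down by hand. The function name is hypothetical and the lyric lines are placeholders.

```python
def lyrics_for_slice(timed_lyrics: list[tuple[float, str]], start_s: float, end_s: float) -> list[str]:
    """Return the lyric lines whose start time falls in [start_s, end_s)."""
    return [text for t, text in sorted(timed_lyrics) if start_s <= t < end_s]


# (start time in seconds, lyric line), noted by ear from the full track
timed = [
    (0.5, "line A"),
    (3.2, "line B"),
    (6.1, "line C"),
    (9.0, "line D"),
    (12.4, "line E"),
    (15.3, "line F"),
]

print(lyrics_for_slice(timed, 0, 12))   # ['line A', 'line B', 'line C', 'line D']
print(lyrics_for_slice(timed, 12, 24))  # ['line E', 'line F']
```

A line that starts just before the 12-second mark but finishes after it will land in the first slice here, so check lines near the boundary by ear and adjust if a word is cut in half.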

Now describe a different shot list. For example, you might choose four handheld shots for a more dynamic feel:

• A handheld shot of the character rapping straight to camera.
• A low-angle handheld shot looking up at the character.
• A close-up of the character's hand trailing through the marble fountain water.
• A return to the low-angle shot to close the verse.

Use the same generation settings as before:

• Duration: 12 seconds
• Aspect ratio: 16:9
• Resolution: 1080p

Generate the clip, then play it next to video one. Check that:

• The character looks consistent in both clips (same face, outfit, setting).
• The lip sync holds across the entire second verse.
• The two clips feel like parts of the same music video, not two unrelated scenes.

If you want a broader cinematic approach, other Seedance 2.0 and Suno workflows for cinematic AI music videos are also worth a look.

Step 7: Assemble the final 24-second video in CapCut

With both Seedance clips ready, you can now build the final edit.

Open CapCut and:

1. Import video one, video two, and the full Suno song (not the slices).
2. Place video one on the timeline, then drop video two directly after it. Together, they should run for 24 seconds.
3. Add the full Suno track to the audio track underneath the videos.

To lock in the lip sync:

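1. Find the exact frame where the first word is sung in the full track.
2. Align that moment with the frame in video one where the character's mouth first forms that word. Because the first slice was cut from 0 seconds, this normally puts the start of the track on the first frame of video one.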

Because each Seedance clip was generated with its matching audio slice and lyrics, once you line up the start correctly, the lip sync should stay locked all the way through both clips.
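
If you like to check the timeline by numbers, the positions come down to frame arithmetic. The sketch below assumes a 24 fps timeline, which is an assumption rather than a documented Seedance output setting, so use whatever frame rate your CapCut project reports. The function names are hypothetical.

```python
def to_frame(seconds: float, fps: float) -> int:
    """Convert a timestamp in seconds to the nearest frame index."""
    return round(seconds * fps)


def clip_start_frames(clip_seconds: list[float], fps: float) -> list[int]:
    """Frame index at which each clip begins when clips are placed back to back."""
    starts = []
    elapsed = 0.0
    for length in clip_seconds:
        starts.append(to_frame(elapsed, fps))
        elapsed += length
    return starts


print(clip_start_frames([12, 12], fps=24))  # [0, 288]
print(to_frame(24, 24))                     # 576
```

At 24 fps, video two should begin on frame 288 and the whole edit should end on frame 576. If the second clip starts anywhere else, the lip sync in the second half will drift by that many frames.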

Play the full 24-second video back. If everything looks and sounds right, export the final video from CapCut.

Recap: What makes this workflow work

This workflow is simple, but each step matters:

• Use a clear, slightly slower track so the words are easy to read.
• Slice the audio into 12-second chunks to fit Seedance’s limits.
• Generate a single, detailed Soul Cinema image to keep the character consistent.
• Always pair each audio slice with the exact matching lyrics in your Seedance prompt.
• Plan your shots in order so Seedance can build a cinematic sequence, not just a static talking head.

Once you understand the importance of combining audio timing with explicit lyrics, you can reuse this approach for different genres, characters, and locations, and build AI music videos where the lip sync finally feels right.