How to Run ACE 1.5 XL: The Best Free Local AI Music Generator
AI music generation just took a big leap forward. ACE 1.5 XL is an open-source model that can generate full songs with vocals, instruments, and even multiple languages—while running locally on a consumer GPU. It’s fast, free, and according to its own benchmarks, it even outperforms leading closed models like Suno v5 and Udio.
If you’ve been waiting for a serious, local alternative to cloud music generators, this is it.
What ACE 1.5 XL Can Do
ACE 1.5 XL is the upgraded version of the earlier ACE 1.5 model, with noticeably better audio quality, more consistent songs, and more natural vocals. It’s a text-to-music model: you describe the style and provide lyrics, and it generates a full track.
The model can handle a wide range of genres and use cases:
Vocal songs with clear, dynamic singing
Compared to older open models, vocals are cleaner, more expressive, and sit better in the mix. It can follow structured lyrics with sections like verse, pre-chorus, chorus, and bridge, and keep the song coherent over time.
Multiple languages and styles
ACE 1.5 XL isn’t limited to English. It can generate:
• Italian-style opera with powerful, sustained vocals
• Latin trap in Spanish
• J-pop with Eurobeat and trance influences
• Chinese bossa nova with nylon guitar and soft percussion
This makes it useful for global creators who want to experiment across genres and languages.
Children’s songs and lighter styles
The model can also generate cheerful, kid-friendly tracks—like bright ukulele songs with simple, catchy lyrics. This is great for educational content, kids’ apps, or background music for family-friendly videos.
Jazz and more complex arrangements
Earlier open music models often struggled with jazz, but ACE 1.5 XL can produce convincing jazz-style tracks with more realistic harmony and instrumentation.
Instrumentals and cinematic pieces
You’re not limited to vocal music. By tagging a prompt as “instrumental,” you can generate:
• Pure instrumentals (e.g., tango pieces)
• Hybrid tracks with instruments plus choir
• Orchestrated pieces where you specify when instruments enter (e.g., “flute continues, harp enters, cello enters”)
The model can follow these cues surprisingly well, bringing in instruments at the right moments.
How ACE 1.5 XL Compares to Other Models
According to its benchmarks, ACE 1.5 XL doesn’t just beat previous open-source models—it also scores higher than top closed systems like Udio and Suno v5 on metrics like:
• Song coherence
• Musicality
• Naturalness of vocals and instruments
On top of that, it’s extremely fast. For a 4-minute song, it can be up to 120x faster than some older models, especially when configured correctly.
If you’re interested in pushing other generative media tools locally, you might also like guides such as how to run LTX 2.3 for free for local text-to-video, which pairs well with ACE for full audio-visual workflows.
Hardware Requirements and Performance
ACE 1.5 XL is designed to run on consumer hardware, but you still need a reasonably capable GPU.
VRAM requirements
For the official XL models:
• Minimum (with offloading + quantization): 12 GB VRAM
• Recommended (no offloading): 20 GB VRAM
• With an additional language model ("thinking mode"): around 24 GB total is safer
If you only have 12 GB, you can still use it by enabling:
• CPU offload (part of the model runs on your CPU)
• int8 quantization (compresses the model to use less VRAM)
In theory, this slightly reduces quality, but in practice the difference is often hard to hear.
Quantized community builds
Because ACE 1.5 XL is open-source, the community has already started releasing smaller, quantized versions on platforms like Hugging Face. Some are under 10 GB in size, and more optimized formats (like GGUF) are likely to appear, further lowering the VRAM barrier over time.
Supported hardware
ACE 1.5 XL supports:
• NVIDIA GPUs
• AMD GPUs
• Apple Silicon (via Metal / MPS)
This makes it one of the most accessible high-end music generators for local use.
Step-by-Step: Installing ACE 1.5 XL Locally
The project provides a web-based interface that runs locally in your browser. Here’s the high-level setup process.
1. Install UV (Python environment manager)
UV is used to manage dependencies and create a virtual environment for ACE, so it doesn’t interfere with other tools on your system.
On Windows, you:
• Open PowerShell as Administrator
• Paste the install command from the project’s GitHub page
• Wait until the installation completes without errors
Once done, UV is ready to handle the rest of the setup.
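For reference, the standard UV install command for Windows looks like the one below; copy the current version from the project's GitHub page in case it has changed since this was written.

```shell
# Official UV installer for Windows -- run in an elevated PowerShell.
# Verify against the command shown on the UV / ACE GitHub page.
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```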
2. Install Git
If you don’t already have Git installed:
• Download the installer from the official Git website
• Run the .exe
• Accept the default options and complete the installation
Git is needed to clone the ACE 1.5 repository.
3. Clone the ACE 1.5 repository
Choose where you want ACE installed (for example, your Desktop), then:
• Open that folder in File Explorer
• Type cmd in the address bar to open a Command Prompt in that folder
• Run the git clone command from the ACE GitHub page: https://github.com/ace-step/ACE-Step-1.5
This creates an ACE-Step-1.5 folder (named after the repository) containing all the project files.
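The clone step boils down to a single command, using the repository URL from the ACE GitHub page:

```shell
# Clone the ACE-Step 1.5 repository into the current folder
git clone https://github.com/ace-step/ACE-Step-1.5
```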
4. Set up the environment with UV
Next, change into the new folder and let UV handle dependencies:
• In Command Prompt, use cd to change into the cloned ACE folder
• Run uv sync
UV will:
• Create a virtual environment
• Install all required Python packages
• Download PyTorch (around 3 GB)
When it finishes without errors, ACE’s backend is installed.
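The two commands from this step look like this (run from the folder where you cloned the repository):

```shell
# Change into the cloned repository...
cd ACE-Step-1.5
# ...and let UV create the virtual environment and install all
# Python dependencies, including the ~3 GB PyTorch download
uv sync
```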
5. Download the XL model
The XL release includes three variants:
• Base: for training and fine-tuning only
• SFT: higher quality, more steps, slower
• Turbo: fewer steps, much faster, slightly lower quality
For most users, the Turbo XL model is the best starting point.
To download it:
• Go to the model’s page on Hugging Face
• Copy the Hugging Face CLI download command for the Turbo XL checkpoint
• Open Command Prompt in your ACE 1.5 folder (type cmd in the folder path bar)
• Paste and run the command
The Turbo model is about 20 GB, so the download can take a while. When it completes without errors, the model files will appear in the checkpoints folder.
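As a sketch, the download command has the general shape below. The repository ID here is a placeholder, not the real name — copy the exact command from the model's Hugging Face page:

```shell
# <org>/<turbo-xl-repo> is a placeholder -- use the exact repo ID
# and command shown on the Hugging Face model page
huggingface-cli download <org>/<turbo-xl-repo> --local-dir checkpoints
```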
Running the ACE 1.5 XL Interface
1. Start the web UI
To launch the interface:
• Open the ACE 1.5 folder
• Type cmd in the address bar to open Command Prompt there
• Run: uv run ace
On first launch, it may take a few minutes to load. The app will automatically detect your GPU and, if needed, enable CPU offloading when your VRAM is lower than the model size.
Once it’s ready, you’ll see a local URL (like http://127.0.0.1:xxxx). Ctrl+click it to open the web UI in your browser.
2. Configure core settings
Before generating music, you’ll configure a few options and click Initialize service.
Key settings include:
Language
• Set the UI language (e.g., English).
GPU and device
• The interface usually auto-detects your GPU and tier.
• Leave device on auto unless you need to force a specific backend.
Checkpoint file
• Point this to the checkpoints folder where the XL Turbo model was downloaded.
Language model ("thinking mode")
• Optional, but can improve lyrics, structure, and overall quality.
• Uses extra VRAM and slows things down slightly.
• You can disable it to save memory and speed up generation.
Performance options
• Flash attention: If installed, this can speed up generation by 20–30% and reduce memory usage.
• CPU offload: Enable this if your GPU can’t fit the full model (e.g., 12–16 GB VRAM).
• Compile model: Uses PyTorch compilation. The first generation is slower, but later ones are 10–20% faster.
• int8 quantization: Compresses the model to use less VRAM—very useful for GPUs under 16 GB.
Once you’ve set these, click Initialize service and wait for the model to load and (if enabled) compile/quantize.
3. Diffusion and generation settings
After initialization, move to the diffusion and generation sections.
Number of steps
• For the Turbo model, 4–8 steps are usually enough (e.g., 6 steps).
• The SFT model needs more (around 30–50 steps), which takes longer but can improve quality.
Inference method and sampler
• These control the underlying diffusion algorithm.
• For most users, the default settings work well.
If you enabled a language model, you’ll also see extra options to fine-tune how it influences lyrics and structure.
Generating Your First Song
1. Write your prompt and lyrics
In the Generation tab, you’ll see two main text boxes:
• Prompt: Describe the style and feel of the song.
• Lyrics: Provide the words, optionally structured with tags.
Example prompt:
“Euro-pop, catchy EDM, upbeat, rhythmic”
Example lyrics structure:
• Use tags like [verse], [pre-chorus], [chorus], [bridge], [intro], [outro] to guide the song’s layout.
• The model will follow these sections and try to keep musical and lyrical coherence.
For instrumentals, you can add an instrumental tag and skip lyrics entirely.
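As an illustration, a structured lyrics input might look like the following; the section tags are the ones listed above, and the lyric lines are just placeholder text:

```
[intro]
[verse]
City lights are calling out my name tonight
[pre-chorus]
Hold on, hold on, the beat is getting louder
[chorus]
We run, we run, until the morning light
[outro]
```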
2. Optional parameters
Click Optional parameters to set:
• BPM (beats per minute)
• Key (e.g., C major, A minor)
• Time signature (e.g., 4/4, 3/4)
These controls don’t always behave perfectly, but they’re worth experimenting with if you want more musical precision.
You can also set:
• Batch size: How many variations to generate at once (e.g., 2 songs per run).
• Audio duration: Leave at -1 for automatic length, or specify a duration.
3. Generate and refine
Click Generate music to start. After a short wait, you’ll get one or more audio outputs you can preview directly in the interface.
If you’re not happy with the result, you can:
• Tweak the prompt (genre, mood, instrumentation).
• Adjust steps or sampler settings.
• Change lyrics or structure tags.
• Enable or disable the language model to see how it affects coherence.
Advanced Features to Explore
Beyond basic text-to-music, the ACE 1.5 XL interface includes more advanced tools:
• Reference audio: Upload an existing track to guide style and arrangement.
• Inpainting / editing: Regenerate or fix specific sections of a song without changing the whole track.
• Remixing: Use an existing song as a base and transform it into a new version.
These features make ACE 1.5 XL useful not just for generating from scratch, but also for iterative music production and creative sound design. If you’re building full AI-powered content workflows, combining tools like this with modern video generators (for example, platforms that support Seed Dance 2.0 or LTX-style models, as covered in our guide on bulk-creating Ghibli-style nostalgia videos) can give you end-to-end AI production pipelines.
Why ACE 1.5 XL Matters
ACE 1.5 XL is a milestone for open-source AI music:
• It rivals or beats top closed models in benchmarks.
• It runs locally on consumer hardware, including AMD and Apple Silicon.
• It’s fast, flexible, and highly configurable.
• It supports vocals, multiple languages, complex genres, and detailed instrument control.
If you care about owning your workflow, avoiding usage limits, and experimenting freely with AI music, this is one of the most powerful tools you can run on your own machine today.
Install it, play with prompts and genres, and see how far you can push local AI music generation.