How to Run ACE 1.5 XL: The Best Free Local AI Music Generator
AI music generation just took a big leap forward. ACE 1.5 XL is an open-source model that can generate full songs with vocals, instruments, and even multiple languages—while running locally on a consumer GPU. It’s fast, free, and according to its own benchmarks, it even outperforms leading closed models like Suno v5 and Udio.
If you’ve been waiting for a serious, local alternative to cloud music generators, this is it.
What ACE 1.5 XL Can Do
ACE 1.5 XL is the upgraded version of the earlier ACE 1.5 model, with noticeably better audio quality, more consistent songs, and more natural vocals. It’s a text-to-music model: you describe the style and provide lyrics, and it generates a full track.
The model can handle a wide range of genres and use cases:
Vocal songs with clear, dynamic singing
Compared to older open models, vocals are cleaner, more expressive, and sit better in the mix. It can follow structured lyrics with sections like verse, pre-chorus, chorus, and bridge, and keep the song coherent over time.
Multiple languages and styles
ACE 1.5 XL isn’t limited to English. It can generate:
• Italian-style opera with powerful, sustained vocals
• Latin trap in Spanish
• J-pop with Eurobeat and trance influences
• Chinese bossa nova with nylon guitar and soft percussion
This makes it useful for global creators who want to experiment across genres and languages.
Children’s songs and lighter styles
The model can also generate cheerful, kid-friendly tracks—like bright ukulele songs with simple, catchy lyrics. This is great for educational content, kids’ apps, or background music for family-friendly videos.
Jazz and more complex arrangements
Earlier open music models often struggled with jazz, but ACE 1.5 XL can produce convincing jazz-style tracks with more realistic harmony and instrumentation.
Instrumentals and cinematic pieces
You’re not limited to vocal music. By tagging a prompt as “instrumental,” you can generate:
• Pure instrumentals (e.g., tango pieces)
• Hybrid tracks with instruments plus choir
• Orchestrated pieces where you specify when instruments enter (e.g., “flute continues, harp enters, cello enters”)
The model can follow these cues surprisingly well, bringing in instruments at the right moments.
How ACE 1.5 XL Compares to Other Models
According to its benchmarks, ACE 1.5 XL doesn’t just beat previous open-source models—it also scores higher than top closed systems like Udio and Suno v5 on metrics like:
• Song coherence
• Musicality
• Naturalness of vocals and instruments
On top of that, it’s extremely fast. For a 4-minute song, it can be up to 120x faster than some older models, especially when configured correctly.
If you’re interested in pushing other generative media tools locally, you might also like guides such as how to run LTX 2.3 for free for local text-to-video, which pairs well with ACE for full audio-visual workflows.
Hardware Requirements and Performance
ACE 1.5 XL is designed to run on consumer hardware, but you still need a reasonably capable GPU.
VRAM requirements
For the official XL models:
• Minimum (with offloading + quantization): 12 GB VRAM
• Recommended (no offloading): 20 GB VRAM
• With an additional language model ("thinking mode"): around 24 GB total is safer
If you only have 12 GB, you can still use it by enabling:
• CPU offload (part of the model runs on your CPU)
• int8 quantization (compresses the model to use less VRAM)
In theory, this slightly reduces quality, but in practice the difference is often hard to hear.
Quantized community builds
Because ACE 1.5 XL is open-source, the community has already started releasing smaller, quantized versions on platforms like Hugging Face. Some are under 10 GB in size, and more optimized formats (like GGUF) are likely to appear, further lowering the VRAM barrier over time.
Supported hardware
ACE 1.5 XL supports:
• NVIDIA GPUs
• AMD GPUs
• Apple Silicon (via Metal / MPS)
This makes it one of the most accessible high-end music generators for local use.
Step-by-Step: Installing ACE 1.5 XL Locally
The project provides a web-based interface that runs locally in your browser. Here’s the high-level setup process.
1. Install UV (Python environment manager)
UV is used to manage dependencies and create a virtual environment for ACE, so it doesn’t interfere with other tools on your system.
On Windows, you:
• Open PowerShell as Administrator
• Paste the install command from the project’s GitHub page
• Wait until the installation completes without errors
Once done, UV is ready to handle the rest of the setup.
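For reference, the standard UV install command for Windows looks like the one below; copy the current version from the project's GitHub page in case it has changed since this was written.

```shell
# Official UV installer for Windows -- run in an elevated PowerShell.
# Verify against the command shown on the UV / ACE GitHub page.
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```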
2. Install Git
If you don’t already have Git installed:
• Download the installer from the official Git website
• Run the .exe
• Accept the default options and complete the installation
Git is needed to clone the ACE 1.5 repository.
3. Clone the ACE 1.5 repository
Choose where you want ACE installed (for example, your Desktop), then:
• Open that folder in File Explorer
• Type cmd in the address bar to open a Command Prompt in that folder
• Run the git clone command from the ACE GitHub page: https://github.com/ace-step/ACE-Step-1.5
This creates an ACE-Step-1.5 folder (named after the repository) containing all the project files.
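The clone step boils down to a single command, using the repository URL from the ACE GitHub page:

```shell
# Clone the ACE-Step 1.5 repository into the current folder
git clone https://github.com/ace-step/ACE-Step-1.5
```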
4. Set up the environment with UV
Next, change into the new folder and let UV handle dependencies:
• In Command Prompt, use cd to change into the cloned ACE folder
• Run uv sync
UV will:
• Create a virtual environment
• Install all required Python packages
• Download PyTorch (around 3 GB)
When it finishes without errors, ACE’s backend is installed.
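The two commands from this step look like this (run from the folder where you cloned the repository):

```shell
# Change into the cloned repository...
cd ACE-Step-1.5
# ...and let UV create the virtual environment and install all
# Python dependencies, including the ~3 GB PyTorch download
uv sync
```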
5. Download the XL model
The XL release includes three variants:
• Base: for training and fine-tuning only
• SFT: higher quality, more steps, slower
• Turbo: fewer steps, much faster, slightly lower quality
For most users, the Turbo XL model is the best starting point.
To download it:
• Go to the model’s page on Hugging Face
• Copy the Hugging Face CLI download command for the Turbo XL checkpoint
• Open Command Prompt in your ACE 1.5 folder (type cmd in the folder path bar)
• Paste and run the command
The Turbo model is about 20 GB, so the download can take a while. When it completes without errors, the model files will appear in the checkpoints folder.
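As a sketch, the download command has the general shape below. The repository ID here is a placeholder, not the real name — copy the exact command from the model's Hugging Face page:

```shell
# <org>/<turbo-xl-repo> is a placeholder -- use the exact repo ID
# and command shown on the Hugging Face model page
huggingface-cli download <org>/<turbo-xl-repo> --local-dir checkpoints
```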
Running the ACE 1.5 XL Interface
1. Start the web UI
To launch the interface:
• Open the ACE 1.5 folder
• Type cmd in the address bar to open Command Prompt there
• Run: uv run ace
On first launch, it may take a few minutes to load. The app will automatically detect your GPU and, if needed, enable CPU offloading when your VRAM is lower than the model size.
Once it’s ready, you’ll see a local URL (like http://127.0.0.1:xxxx). Ctrl+click it to open the web UI in your browser.
2. Configure core settings
Before generating music, you’ll configure a few options and click Initialize service.
Key settings include:
Language
• Set the UI language (e.g., English).
GPU and device
• The interface usually auto-detects your GPU and tier.
• Leave device on auto unless you need to force a specific backend.
Checkpoint file
• Point this to the checkpoints folder where the XL Turbo model was downloaded.
Language model ("thinking mode")
• Optional, but can improve lyrics, structure, and overall quality.
• Uses extra VRAM and slows things down slightly.
• You can disable it to save memory and speed up generation.
Performance options
• Flash attention: If installed, this can speed up generation by 20–30% and reduce memory usage.
• CPU offload: Enable this if your GPU can’t fit the full model (e.g., 12–16 GB VRAM).
• Compile model: Uses PyTorch compilation. The first generation is slower, but later ones are 10–20% faster.
• int8 quantization: Compresses the model to use less VRAM—very useful for GPUs under 16 GB.
Once you’ve set these, click Initialize service and wait for the model to load and (if enabled) compile/quantize.
3. Diffusion and generation settings
After initialization, move to the diffusion and generation sections.
Number of steps
• For the Turbo model, 4–8 steps are usually enough (e.g., 6 steps).
• The SFT model needs more (around 30–50 steps), which takes longer but can improve quality.
Inference method and sampler
• These control the underlying diffusion algorithm.
• For most users, the default settings work well.
If you enabled a language model, you’ll also see extra options to fine-tune how it influences lyrics and structure.
Generating Your First Song
1. Write your prompt and lyrics
In the Generation tab, you’ll see two main text boxes:
• Prompt: Describe the style and feel of the song.
• Lyrics: Provide the words, optionally structured with tags.
Example prompt:
“Euro-pop, catchy EDM, upbeat, rhythmic”
Example lyrics structure:
• Use tags like [verse], [pre-chorus], [chorus], [bridge], [intro], [outro] to guide the song’s layout.
• The model will follow these sections and try to keep musical and lyrical coherence.
For instrumentals, you can add an instrumental tag and skip lyrics entirely.
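As an illustration, a structured lyrics input might look like the following; the section tags are the ones listed above, and the lyric lines are just placeholder text:

```
[intro]
[verse]
City lights are calling out my name tonight
[pre-chorus]
Hold on, hold on, the beat is getting louder
[chorus]
We run, we run, until the morning light
[outro]
```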
2. Optional parameters
Click Optional parameters to set:
• BPM (beats per minute)
• Key (e.g., C major, A minor)
• Time signature (e.g., 4/4, 3/4)
These controls don’t always behave perfectly, but they’re worth experimenting with if you want more musical precision.
You can also set:
• Batch size: How many variations to generate at once (e.g., 2 songs per run).
• Audio duration: Leave at -1 for automatic length, or specify a duration.
3. Generate and refine
Click Generate music to start. After a short wait, you’ll get one or more audio outputs you can preview directly in the interface.
If you’re not happy with the result, you can:
• Tweak the prompt (genre, mood, instrumentation).
• Adjust steps or sampler settings.
• Change lyrics or structure tags.
• Enable or disable the language model to see how it affects coherence.
Advanced Features to Explore
Beyond basic text-to-music, the ACE 1.5 XL interface includes more advanced tools:
• Reference audio: Upload an existing track to guide style and arrangement.
• Inpainting / editing: Regenerate or fix specific sections of a song without changing the whole track.
• Remixing: Use an existing song as a base and transform it into a new version.
These features make ACE 1.5 XL useful not just for generating from scratch, but also for iterative music production and creative sound design. If you’re building full AI-powered content workflows, combining tools like this with modern video generators (for example, platforms that support Seed Dance 2.0 or LTX-style models, as covered in our guide on bulk-creating Ghibli-style nostalgia videos) can give you end-to-end AI production pipelines.
Why ACE 1.5 XL Matters
ACE 1.5 XL is a milestone for open-source AI music:
• It rivals or beats top closed models in benchmarks.
• It runs locally on consumer hardware, including AMD and Apple Silicon.
• It’s fast, flexible, and highly configurable.
• It supports vocals, multiple languages, complex genres, and detailed instrument control.
If you care about owning your workflow, avoiding usage limits, and experimenting freely with AI music, this is one of the most powerful tools you can run on your own machine today.
Install it, play with prompts and genres, and see how far you can push local AI music generation.