How to run Bonsai Image, a 1‑bit local image generation model
Bonsai Image is a new ultra-compressed image generation model from Prism ML that brings fast, local image generation to more modest hardware. Instead of chasing state-of-the-art quality, it focuses on squeezing a Flux-based model down to 1–2 bits while still producing surprisingly usable images.
What is Bonsai Image?
Bonsai Image is a family of quantized image generation models derived from the Flux Klein 4B model. The key idea is aggressive compression: instead of using standard 16-bit floating point weights, Bonsai pushes the model down to 1-bit (binary) or 2-bit (ternary) weights.
This follows Prism ML’s earlier work on Bonsai 8B, a one-bit 8-billion-parameter language model that showed how far you can compress a model while keeping it functional. Bonsai Image applies similar tricks to diffusion-style image generation.
The result is a model that’s much smaller on disk, faster to run, and more accessible to people with consumer GPUs or Apple Silicon laptops, even if it can’t match the absolute best image generators like GPT Image 2 for photorealism and text rendering.
Binary vs ternary: 1‑bit and 2‑bit models explained
Bonsai Image comes in two main flavors:
Binary (1‑bit) – uses only two possible weight values, typically represented as -1 and +1. This is the smallest and most aggressively compressed option.
Ternary (2‑bit) – uses three possible weight values: -1, 0, and +1. Three values do not fit in a single bit, so each weight takes 2 bits. It’s still heavily compressed, but the extra zero level keeps more nuance and tends to produce higher-quality images.
Both versions rely on FP16 group-wise scaling to keep the model numerically stable despite the low bit-width. From a user perspective, the main differences are:
File size: the original FP16 Flux Klein 4B transformer is about 7.75 GB. The 2‑bit ternary Bonsai transformer drops to around 1.21 GB, and the 1‑bit binary version shrinks further to roughly 0.93 GB.
Quality vs compression: the ternary model generally produces more detailed, coherent images, especially for complex scenes or text. The binary model is more of a technical showcase: very small and fast, but with a noticeable quality drop.
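To put those file sizes in perspective, the short sketch below computes the compression ratio and the average number of bits per weight implied by each file. It uses only the three sizes quoted above and assumes the FP16 file consists entirely of 16-bit weights, so treat the results as rough estimates.

```python
# Transformer file sizes quoted in the article, in GB.
FP16_GB = 7.75
SIZES_GB = {"ternary (2-bit)": 1.21, "binary (1-bit)": 0.93}


def compression_ratio(fp16_gb: float, quant_gb: float) -> float:
    """How many times smaller the quantized file is than the FP16 file."""
    return fp16_gb / quant_gb


def effective_bits_per_weight(fp16_gb: float, quant_gb: float) -> float:
    """Average bits per weight, assuming the FP16 file is all 16-bit weights."""
    return 16.0 * quant_gb / fp16_gb


for name, size in SIZES_GB.items():
    ratio = compression_ratio(FP16_GB, size)
    bits = effective_bits_per_weight(FP16_GB, size)
    print(f"{name}: {ratio:.1f}x smaller, ~{bits:.2f} bits per weight on average")
```

The averages come out at roughly 2.5 and 1.9 bits per weight, above the nominal 2 and 1 bits. That is what you would expect if the files also store scaling factors and possibly some parts of the network at higher precision, although the article does not break the sizes down.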
Even with these tiny transformer files, the total VRAM usage is higher than the raw model size suggests because Bonsai Image still needs extra components like text encoders and VAEs. Expect several gigabytes of VRAM usage even with the smallest model.
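To make the binary and ternary formats with group-wise scaling more concrete, here is a minimal sketch in plain Python. It is not Prism ML’s actual algorithm: the mean-absolute-value scale, the 0.5 threshold for the ternary zero level, and the group size of 4 are all assumptions chosen to keep the example small.

```python
import struct
from typing import Callable, List, Tuple

Group = Tuple[List[int], float]  # (integer codes, one scale for the group)


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value, as a stored scale would be."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_group_binary(group: List[float]) -> Group:
    """Map each weight to -1 or +1 and keep one FP16 scale for the group."""
    scale = to_fp16(sum(abs(w) for w in group) / len(group))
    codes = [1 if w >= 0 else -1 for w in group]
    return codes, scale


def quantize_group_ternary(group: List[float]) -> Group:
    """Map each weight to -1, 0 or +1 using a simple magnitude threshold."""
    mean_abs = sum(abs(w) for w in group) / len(group)
    threshold = 0.5 * mean_abs  # assumption: illustrative cut-off only
    codes = [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in group]
    kept = [abs(w) for w, c in zip(group, codes) if c != 0]
    scale = to_fp16(sum(kept) / len(kept)) if kept else 0.0
    return codes, scale


def quantize(weights: List[float], group_size: int,
             quantize_group: Callable[[List[float]], Group]) -> List[Group]:
    """Split the weights into consecutive groups and quantize each one."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


def dequantize(groups: List[Group]) -> List[float]:
    """Rebuild approximate weights as code * scale."""
    return [code * scale for codes, scale in groups for code in codes]


def mean_squared_error(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


weights = [0.9, -0.1, 0.4, -0.8, 0.05, 0.3, -0.6, 0.2]
for name, fn in (("binary", quantize_group_binary),
                 ("ternary", quantize_group_ternary)):
    restored = dequantize(quantize(weights, 4, fn))
    print(f"{name}: mse = {mean_squared_error(weights, restored):.4f}")
```

On this toy input the ternary version reconstructs the weights with lower error, because small weights can snap to 0 instead of being forced to a full-magnitude -1 or +1. That matches the quality gap between the two variants described above.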
Platform support and model variants
Prism ML provides multiple Bonsai Image variants, including builds specifically labeled MLX for Apple Silicon. Those are optimized for Apple’s MLX stack, making it easier to run the model on modern MacBooks and desktops.
On other platforms, you can run the standard PyTorch-based variants on Windows, macOS, or Linux with a compatible GPU. The demo repo includes scripts to simplify setup across operating systems.
Preparing your Windows system
The Windows setup is a bit more hands-on than a typical one-click installer, but it’s manageable if you follow the steps carefully. The demo repository includes a windows.md (or similar) guide that lists all prerequisites your system needs.
These prerequisites cover things like:
GPU drivers and CUDA support
Git and PowerShell usage
Python, Node.js, and npm for the web UI
If any of these steps are unclear, one practical approach is to copy the raw markdown file and ask your favorite LLM to walk you through each requirement on your specific Windows setup.
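Before running any setup script, it can help to check which of these tools are already on your PATH. The sketch below is a small standalone helper, not part of the demo repo. The tool list is inferred from the prerequisites above, and using nvidia-smi as a stand-in for "GPU drivers are installed" is an assumption that only applies to NVIDIA hardware.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Command-line tools implied by the prerequisite list.
# "nvidia-smi" is an assumption: it ships with NVIDIA drivers and is a
# common way to confirm that the driver is visible from the terminal.
REQUIRED_TOOLS = ["git", "python", "node", "npm", "nvidia-smi"]


def find_tools(tools: List[str],
               which: Callable[[str], Optional[str]] = shutil.which
               ) -> Dict[str, Optional[str]]:
    """Return the resolved path of each tool, or None if it is not on PATH."""
    return {tool: which(tool) for tool in tools}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Names of the tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


found = find_tools(REQUIRED_TOOLS)
for name, path in found.items():
    print(f"{name:12} {path or 'NOT FOUND'}")

missing = missing_tools(found)
if missing:
    print("Install these before running setup:", ", ".join(missing))
```

A tool being found does not guarantee the version is recent enough, so the repo’s own prerequisite guide remains the authority.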
Cloning the Bonsai Image demo repo
Once your prerequisites are in place, you can start from a PowerShell window:
Open the Windows search bar, type PowerShell, and launch it.
Go to the Bonsai Image model page on Hugging Face and find the link to the demo repository in the resources section.
Click the green Code button, copy the repository URL, then in PowerShell run:
git clone <repo-url>
After cloning, change into the new directory, for example:
cd bonsai-image-demo
(You can start typing the folder name and press Tab to auto-complete.)
Running the setup script safely
The demo repo includes a setup script that installs dependencies and configures the environment. On Windows, you’ll typically copy a PowerShell command block from the quick start section and paste it into your terminal.
One important detail: the docs may show a command that sets a fake Hugging Face token. The models are public, so no token is needed, but an invalid token is worse than none: the placeholder will cause the download script to fail. Make sure you skip any command that sets an invalid or dummy token.
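If a dummy token has already been set in your session, one way to neutralize it is to launch the download from an environment with the token variables removed. The sketch below assumes the download ultimately goes through the huggingface_hub library, which reads HF_TOKEN and the older HUGGING_FACE_HUB_TOKEN. Whether the demo scripts use those variables is not confirmed by the docs quoted here.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables that huggingface_hub consults for an access token.
TOKEN_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def env_without_hf_token(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with any Hugging Face token removed."""
    cleaned = dict(os.environ if env is None else env)
    for name in TOKEN_VARS:
        cleaned.pop(name, None)
    return cleaned


example = {"HF_TOKEN": "placeholder-token", "PATH": "/usr/bin"}
print(env_without_hf_token(example))
```

The returned mapping can be passed as the env argument of subprocess.run when launching the download script, which leaves your current shell untouched.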
When the setup script runs correctly, you should see colored output (green and blue text) as it installs packages and configures the environment. The script also includes a safeguard that avoids installing package versions that were published too recently, which reduces the risk of pulling in a freshly compromised release through a supply-chain attack.
If you see some red error text early on, pay attention: in one common failure mode, the setup script aborts before it finishes, skipping later steps such as the npm setup, which then breaks the web UI. You may need to rerun the setup with specific environment flags to skip the part that failed, then let the rest complete.
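The idea behind such a "not too new" safeguard can be sketched as a minimum-age filter over release dates. This is an illustration of the general technique, not the setup script’s actual logic. The function name, the 7-day window, and the version data are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def newest_release_older_than(releases: Dict[str, datetime],
                              min_age_days: int,
                              now: Optional[datetime] = None) -> Optional[str]:
    """Pick the most recently published version that is at least min_age_days old.

    `releases` maps a version string to its timezone-aware publish time.
    Returns None when every release is still inside the quarantine window.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    eligible = {v: t for v, t in releases.items() if t <= cutoff}
    if not eligible:
        return None
    # Chooses by publish time, not by comparing version numbers.
    return max(eligible, key=eligible.get)


# Hypothetical release history for an imaginary package.
utc = timezone.utc
history = {
    "1.0.0": datetime(2024, 12, 1, tzinfo=utc),
    "1.1.0": datetime(2025, 1, 10, tzinfo=utc),
    "1.2.0": datetime(2025, 1, 30, tzinfo=utc),
}
today = datetime(2025, 1, 31, tzinfo=utc)
print(newest_release_older_than(history, min_age_days=7, now=today))
```

With a 7-day window, the one-day-old 1.2.0 is skipped in favor of 1.1.0. The trade-off is that genuine security fixes also arrive a little later.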
Downloading the model weights
Before you can generate images, you need to download the actual model weights. The demo repo provides a PowerShell script for this. From the project root, run something like:
./scripts/download_model.ps1
By default, this will fetch the ternary (2‑bit) model. You can also specify a particular model name if you want a different variant, but be careful not to copy Linux-style commands directly into PowerShell, as paths and syntax differ.
Once the download finishes, the script will confirm where it saved the model files. At this point, you should have everything needed to run a simple sample generation.
Fixing common path and environment issues
After downloading the model, the sample generation command might still complain that no models are available or that it can’t validate certain paths. One common issue is that the script tries to validate both binary and ternary model paths, even if you’ve only downloaded one of them.
A practical workaround is to set an environment variable that points the missing path to the model you do have. For example, you can temporarily set the “binary” path to the same directory as the ternary model. If you later download the binary model and want to use it properly, you’ll need to undo or adjust that variable.
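The logic of that workaround can be sketched as follows. The real environment variable names are defined by the demo repo and are not reproduced here, so this example only works on a plain mapping from variant name to directory. The directory names in the usage example are hypothetical.

```python
import os
from typing import Callable, Dict


def alias_missing_model_paths(paths: Dict[str, str],
                              exists: Callable[[str], bool] = os.path.isdir
                              ) -> Dict[str, str]:
    """Point every variant whose directory is missing at one that exists.

    `paths` maps a variant name (for example "binary", "ternary") to the
    directory where its weights are expected.
    """
    available = [p for p in paths.values() if exists(p)]
    if not available:
        raise FileNotFoundError("no model directory found; run the download step first")
    fallback = available[0]
    return {variant: (p if exists(p) else fallback) for variant, p in paths.items()}


# Hypothetical layout: only the ternary weights have been downloaded.
expected = {"binary": "models/binary", "ternary": "models/ternary"}
downloaded = {"models/ternary"}
print(alias_missing_model_paths(expected, exists=lambda p: p in downloaded))
```

You would then export the resulting paths under whatever variable names the repo documents. Keep in mind that selecting the "binary" option while it is aliased this way actually runs the ternary weights.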
Another frequent problem is the web UI failing with an error like “npm is not found” or a message asking if setup ran successfully. This usually means the earlier red error stopped the setup script before it installed Node.js dependencies. The fix is to:
Set environment variables to skip the GPU setup portion that failed.
Skip the model download step (since you already did it manually).
Rerun the setup script so it can complete the remaining steps, including npm-related setup.
After that, the web UI should start correctly.
Launching the web UI on Windows
Once the environment is fully configured, you can move beyond command-line samples and use the browser-based interface.
The repo includes a script like scripts/serve.sh for Linux/macOS, but that won’t work directly on Windows. Instead, use the Windows-specific script or command provided in the docs (for example, a PowerShell script under scripts) to launch the server.
When it’s running, you can open the printed URL in your browser and you’ll see a simple interface with options for:
Entering prompts
Choosing resolution (quick vs poster, square vs wide, etc.)
Adjusting the number of diffusion steps
Switching between light and dark themes
Performance: speed, VRAM, and resolutions
One of the most impressive aspects of Bonsai Image is how fast it runs once the model is loaded. On a high-end laptop GPU (for example, an RTX 5090 mobile), 1024×1024 images are generated almost instantly at the default step count.
Even though the transformer files are under 1.5 GB, real-world VRAM usage is higher. With the ternary model loaded, you might see around 5–8.5 GB of VRAM in use during generation, depending on resolution and configuration. The binary model uses slightly less, but the difference is not as dramatic as the transformer file sizes alone might suggest, because the rest of the pipeline still needs memory.
Increasing resolution (for example, switching from a “quick” low-res setting to a “poster” size) increases memory usage and generation time. However, even at larger resolutions, Bonsai Image remains very responsive compared to many full-precision diffusion models.
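As a rough guide to why resolution matters so much, the sketch below compares a few sizes by pixel count and by an estimated number of image tokens. The 16-pixels-per-token figure is an assumption based on how Flux-style models typically work (an 8x VAE downsample followed by 2x2 patching) and has not been confirmed for Bonsai Image. The resolutions are illustrative, since the exact sizes behind the "quick" and "poster" presets are not given here.

```python
from typing import Dict, Tuple


def pixel_count(width: int, height: int) -> int:
    return width * height


def token_count(width: int, height: int, pixels_per_token_side: int = 16) -> int:
    """Estimated image tokens; the default of 16 is an assumption, see above."""
    return (width // pixels_per_token_side) * (height // pixels_per_token_side)


def relative_cost(width: int, height: int,
                  base: Tuple[int, int] = (512, 512)) -> Dict[str, float]:
    """Compare a resolution with a baseline on three rough cost measures."""
    token_ratio = token_count(width, height) / token_count(*base)
    return {
        "pixels": pixel_count(width, height) / pixel_count(*base),
        "tokens": token_ratio,
        # Self-attention work grows with the square of the token count.
        "attention": token_ratio ** 2,
    }


for size in [(512, 512), (1024, 1024), (1536, 1024)]:
    print(size, relative_cost(*size))
```

Doubling both sides quadruples the pixels and tokens, while the attention term grows sixteenfold. Memory for activations therefore climbs faster than the image dimensions alone suggest.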
Image quality: what the model does well (and not so well)
Bonsai Image is not trying to compete directly with top-tier models like GPT Image 2 on photorealism or perfect text. Instead, it aims to show how far you can push compression while keeping the model useful.
In testing, the ternary model in particular does a surprisingly good job with:
Stylized prompts – for example, “ink wash style, gentle shadows” produces charming, consistent images of animals, characters, and objects.
Simple scenes – cute dogs, corgis, stylized portraits, and basic objects come out very well, especially in artistic styles.
Poster-style compositions – vintage or modern movie poster prompts generate layouts with characters, backgrounds, and action elements that feel coherent, even if the text is still mostly gibberish.
Where it struggles more:
Legible text – titles and labels are often close but not correct. You might get something that looks like a movie logo, but the letters won’t spell the exact words.
Complex infographics – prompts like “clean modern infographic explaining how GPUs work” produce the right overall layout (boxes, arrows, charts) but the content and text are garbled.
Highly detailed multi-character scenes – busy environments like retro LAN parties can look a bit uncanny, especially faces and hands, though the overall vibe still comes through.
Compared to the ternary model, the binary (1‑bit) version is clearly lower quality: details are softer, text is worse, and complex prompts degrade faster. But given the extreme compression, it’s impressive that it works at all.
Improving results with more steps
Bonsai Image exposes the number of diffusion steps as a simple slider or numeric field. Increasing this value often yields noticeably better results, especially on the ternary model.
For example:
Doubling steps for an infographic-style prompt can make shapes more defined and text slightly more legible.
Movie poster prompts with more steps produce sharper characters, more realistic buildings, and clearer props.
Portrait prompts at higher steps show better fabric wrinkles, facial details, and overall realism.
There are diminishing returns, and generation times increase with more steps, but on a decent GPU the trade-off is often worth it for higher-quality outputs.
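To budget for higher step counts, you can time two runs and extrapolate. The sketch below assumes generation time is roughly a fixed overhead plus a constant cost per step, which is typical when each step is one pass through the model but is not guaranteed. The timings in the example are placeholders, not measurements of Bonsai Image.

```python
from typing import Tuple


def fit_linear_cost(steps_a: int, seconds_a: float,
                    steps_b: int, seconds_b: float) -> Tuple[float, float]:
    """Fit time = overhead + per_step * steps through two timed runs."""
    if steps_a == steps_b:
        raise ValueError("the two runs must use different step counts")
    per_step = (seconds_b - seconds_a) / (steps_b - steps_a)
    overhead = seconds_a - per_step * steps_a
    return overhead, per_step


def estimate_seconds(steps: int, overhead: float, per_step: float) -> float:
    """Predicted generation time for a given number of steps."""
    return overhead + per_step * steps


# Placeholder timings for illustration only: 4 steps in 1.0 s, 8 steps in 1.8 s.
overhead, per_step = fit_linear_cost(4, 1.0, 8, 1.8)
for steps in (4, 8, 16, 32):
    print(f"{steps:2d} steps -> ~{estimate_seconds(steps, overhead, per_step):.1f} s")
```

Time yourself on your own hardware with two step counts and substitute those numbers. The estimate says nothing about quality, which levels off as steps increase.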
Why Bonsai Image matters
Bonsai Image is less about beating the latest flagship models and more about pushing the boundaries of efficiency. By demonstrating that a heavily compressed 1–2 bit model can still generate compelling images, Prism ML is pointing toward a future where:
More people can run capable image generators locally on mid-range hardware.
Edge devices and low-power systems can host creative models without massive GPUs.
Developers can experiment with new compression and quantization techniques that balance quality and size.
Combined with their earlier one-bit language model, Bonsai Image shows a consistent direction: making AI models dramatically smaller and more accessible without completely sacrificing usefulness.
Who should try Bonsai Image?
Bonsai Image is a great fit if you:
Enjoy tinkering with local AI models and want something fast and lightweight.
Have a gaming PC or Apple Silicon Mac and want to experiment with compressed diffusion models.
Care more about speed, offline use, and technical novelty than absolute cutting-edge image quality.
If you’re building a production system that needs flawless text, perfect faces, and photorealism, you’ll still want to look at larger, higher-precision models. But as a glimpse into what’s possible with extreme quantization, Bonsai Image is both fun and genuinely impressive.
As compression techniques improve, we can expect more models like this (small, fast, and good enough for many creative tasks) to become standard tools in the local AI toolbox.