The best free AI for fast 4K image upscaling on your PC

16 Jun 2026 03:07

Nvidia’s new Pixel Diffusion (PID) model can turn low-res photos into ultra-sharp 4K images in just a few seconds. Here’s how it works, why it beats older upscalers, and how to run it locally with ComfyUI for free.

High-resolution images used to mean long render times, heavy models, and a lot of GPU memory. Nvidia’s new open-source Pixel Diffusion (PID) model changes that. It can upscale images to 4K with sharp, realistic detail in just a few seconds, and you can run it locally for free.

What is Pixel Diffusion (PID)?

Pixel Diffusion, or PID, is Nvidia’s new state-of-the-art image upscaling model. It’s designed to take a lower-resolution image (like 512px or 1024px) and turn it into a crisp 2K or 4K image with far more detail and fewer artifacts than traditional upscalers.

Unlike many older methods, PID is both lightweight and fast. On a modern Nvidia GPU, it can upscale an image in under a second, and even on more modest hardware it typically finishes in just a few seconds.

How PID is different from traditional upscalers

Most modern image generators work in a compressed “latent space.” They generate a low-dimensional representation of the image, then decode that latent into pixels you can see. Upscalers often work by taking that latent or the final image and applying another model to increase resolution.

PID is called Pixel Diffusion because it operates directly in pixel space during the denoising step. Instead of decoding to pixels and then trying to fix things afterward, PID uses the high-resolution pixel grid as part of the generation process itself, guided by the latent from a base model like Flux, Z-image, or Stable Diffusion 3.

The result is sharper 4K images with fewer strange artifacts, better edges, and more consistent textures compared to many traditional super-resolution methods.
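To put rough numbers on the gap between latent space and pixel space, here is a small sketch. It assumes a Flux 1-style VAE with 8x spatial downsampling and 16 latent channels; both values are assumptions for illustration and are not stated in this article.

```python
def latent_shape(width, height, downsample=8, channels=16):
    """Shape (channels, h, w) of the latent for a width x height image.

    downsample=8 and channels=16 are assumed values for a Flux 1-style VAE.
    """
    return (channels, height // downsample, width // downsample)


def count(shape):
    """Total number of values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


latent = latent_shape(1024, 1024)   # latent of a 1K base image
pixels_4k = (3, 4096, 4096)         # RGB pixel grid of the 4K output
ratio = count(pixels_4k) // count(latent)

print(latent, count(latent))        # (16, 128, 128) 262144
print(pixels_4k, count(pixels_4k))  # (3, 4096, 4096) 50331648
print(ratio)                        # 192
```

Under these assumptions the 4K pixel grid holds 192 times as many values as the 1K latent, which gives a sense of how much detail the upscaler has to produce directly in pixel space.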

Quality comparison: PID vs other upscalers

PID has been benchmarked against SeedVR2, ByteDance’s diffusion-based restoration model and one of the previous leading upscalers. In side-by-side comparisons:

Sharpness and detail: PID produces noticeably crisper edges on objects like buildings, windows, and character outlines. Fine textures such as fur, hair, and fabric patterns look more natural and less smudged.
Consistency: When you zoom into different areas of a PID-upscaled image, details remain coherent across the whole frame. SeedVR2 often becomes blurry or inconsistent in small regions.
Faithfulness: PID tends to respect the original image more closely. For example, plush toy textures and materials look realistic, while SeedVR2 sometimes “hallucinates” new details that don’t quite match the source.

On top of that, PID is significantly faster: up to about 5.9x faster than SeedVR2 in latency tests, while still delivering better or comparable visual quality in the vast majority of cases.

What you can do with PID

PID is especially useful if you care about high-resolution, production-ready images. Some practical use cases include:

Sharpening photos: Take a slightly soft or low-res photo (like a tiger, a cityscape, or a portrait) and turn it into a detailed 4K image with clear fur, facial features, and building edges.
Upscaling AI art: Generate a 1K image using a top-tier model such as Z-image, Flux 2, or Ernie Image, then run it through PID to get a clean 4K version suitable for prints or large displays. If you’re exploring local image generation, you may also want to look at how to run Ernie Image locally.
Improving night scenes and landscapes: PID handles fine details in stars, trees, and distant buildings very well, making it great for scenic shots and astrophotography-style images.

Core PID model variants explained

The PID model zoo can look confusing at first, but it’s organized around two main ideas: which base model it’s compatible with, and what resolution jump it performs.

Latent types (base model compatibility)

PID offers different variants depending on the base image generator you’re using:

Flux 1-based PID: Use this if your base image comes from a model that produces Flux 1 latents (for example, Flux 1 itself or Z-image, which uses the Flux 1 VAE).
Flux 2-based PID: Use this if your base images come from Flux 2 or models derived from it.
Stable Diffusion 3-based PID: Use this when your base images are generated by SD3.

If you’re simply uploading an existing image (not generated by one of these models), the latent type is less critical, but it’s still recommended to pick one that matches the model you plan to pair PID with in your workflows.

Resolution jumps and precision formats

Each latent type has variants for different input and output resolutions:

512 → 2K: For smaller inputs that you want to upscale to around 2048px on the long side.
1K → 4K: For 1024px inputs that you want to turn into full 4K (4096px) images.

On top of that, each model comes in different precision formats:

BF16 / FP16: Full-quality models, typically around 2.6–2.7 GB. These run on most modern Nvidia GPUs and are a safe default.
MXFP8: Highly compressed (around 1.5 GB), designed for Nvidia’s latest 50-series GPUs, which are built on the Blackwell architecture. Faster and lighter, but with some quality trade-offs and stricter hardware requirements.

If you’re unsure, choose the BF16/FP16 1K → 4K model that matches your base generator (for example, the Flux 1 1K → 4K BF16 model for Z-image).
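The selection rules above can be sketched as a small helper. This is only an illustration: the function name, the dictionary keys, and the 512px threshold for choosing between the two resolution jumps are assumptions, not part of any PID tooling.

```python
from dataclasses import dataclass

# Base generator -> PID latent type, following the pairings described above.
# The lowercase keys are hypothetical identifiers chosen for this sketch.
LATENT_TYPES = {
    "flux1": "Flux 1",
    "z-image": "Flux 1",
    "flux2": "Flux 2",
    "sd3": "Stable Diffusion 3",
}


@dataclass(frozen=True)
class PidVariant:
    latent_type: str
    jump: str
    precision: str


def pick_variant(base_model, input_long_side=1024, prefer_mxfp8=False):
    """Suggest a PID variant for a base generator and input size.

    prefer_mxfp8 should only be True on a 50-series (Blackwell) GPU;
    BF16 is the safe default.
    """
    key = base_model.lower()
    if key not in LATENT_TYPES:
        raise ValueError(f"no known PID latent type for {base_model!r}")
    # Assumed threshold: inputs up to 512px use the 512 -> 2K model.
    jump = "512 -> 2K" if input_long_side <= 512 else "1K -> 4K"
    precision = "MXFP8" if prefer_mxfp8 else "BF16"
    return PidVariant(LATENT_TYPES[key], jump, precision)


print(pick_variant("z-image"))
```

For Z-image with a 1024px input this suggests the Flux 1, 1K → 4K, BF16 variant, which matches the default recommended above.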

Running PID locally with ComfyUI

PID is designed to run well in ComfyUI, one of the most popular node-based interfaces for local image and video generation. ComfyUI lets you build visual workflows where you plug models and operations together like blocks.

To use PID, you’ll need a working ComfyUI setup on your machine. If you’re interested in broader workflows like video enhancement, you might also find it useful to explore tools that can upscale video to 4K, such as the one covered in this AI video enhancer guide.

Step 1: Update ComfyUI

Before loading any PID workflows, make sure ComfyUI is up to date. In the portable version, you can usually do this by running the update_comfyui.bat file in the update folder. Once it finishes and prompts you to press any key, restart ComfyUI.

Step 2: Load the PID upscaling workflow

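There are three main workflows you can use with PID. They are listed here from most to least generally useful, not in numeric order, and the walkthroughs below follow the same order: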

Workflow 02 – Image to 2K/4K: Upload an existing image and upscale it directly with PID. This is the most generally useful workflow.
Workflow 03 – Generate then upscale: Use a top-tier model like Z-image or Flux 2 to generate a 1K image, then feed it into PID for 4K upscaling.
Workflow 01 – PID text-to-image: Use PID itself as a text-to-image model at 1K resolution. It’s very fast but not as strong in quality as the best dedicated generators.

After downloading a workflow JSON file, simply drag and drop it onto the ComfyUI canvas to load it.
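If you want to see what a workflow contains before loading it, a short script can summarize the file. This sketch assumes the JSON is a standard ComfyUI export with a top-level "nodes" list whose entries each carry a "type" field; the sample data below is a stand-in, not the actual PID workflow.

```python
import json
from collections import Counter


def summarize_workflow(workflow):
    """Count node types in a ComfyUI workflow dict (UI export format assumed)."""
    nodes = workflow.get("nodes", [])
    return Counter(node.get("type", "unknown") for node in nodes)


# Tiny stand-in for a real workflow file; the node list is a placeholder.
sample_json = """
{"nodes": [
    {"id": 1, "type": "LoadImage"},
    {"id": 2, "type": "UNETLoader"},
    {"id": 3, "type": "VAELoader"},
    {"id": 4, "type": "SaveImage"}
]}
"""

summary = summarize_workflow(json.loads(sample_json))
for node_type, n in sorted(summary.items()):
    print(f"{node_type}: {n}")
```

For a downloaded file, read it with json.load on the opened file instead of the inline sample. Node types you do not recognize usually point to a custom node pack that needs to be installed first.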

Workflow 1: Upscale an existing image to 4K

This is the most straightforward way to use PID if you already have an image you like and just want it sharper and bigger.

Required models

For the image-to-4K workflow, you’ll typically need:

Gemma 2 2B text encoder: A text encoder used for conditioning the upscaling process. You can choose the full FP16 (~5.2 GB) or the smaller FP8 (~2.6 GB) version. Place it in ComfyUI/models/text_encoders.
PID upscaler model: For example, the Flux 1 1K → 4K BF16 model (~2.7 GB). Place it in ComfyUI/models/diffusion_models.
VAE (AE.safetensors): A VAE commonly used by Z-image, around 335 MB. Place it in ComfyUI/models/vae.

After copying each file into the correct folder, press R in ComfyUI to refresh the model lists and select them from the dropdowns in the workflow nodes.
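A quick way to confirm that everything landed in the right place is to check the folders from a script. This is a minimal sketch: the file names are placeholders (assumptions) that you should replace with the names of the files you actually downloaded, and the VAE folder is written in lowercase, which is ComfyUI's default.

```python
import tempfile
from pathlib import Path

# Folder -> file name. The file names are hypothetical placeholders.
EXPECTED_MODELS = {
    "text_encoders": "gemma_text_encoder.safetensors",
    "diffusion_models": "pid_flux1_1k_to_4k_bf16.safetensors",
    "vae": "ae.safetensors",
}


def missing_models(comfy_root, expected=EXPECTED_MODELS):
    """Return 'folder/file' entries not found under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [
        f"{folder}/{name}"
        for folder, name in expected.items()
        if not (models_dir / folder / name).is_file()
    ]


# Demo against a throwaway directory that only contains the VAE.
with tempfile.TemporaryDirectory() as root:
    vae_dir = Path(root) / "models" / "vae"
    vae_dir.mkdir(parents=True)
    (vae_dir / "ae.safetensors").touch()
    demo_missing = missing_models(root)

print(demo_missing)
```

On a real install, call missing_models with the path to your ComfyUI folder; an empty list means all three files are where the workflow expects them.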

Configuring the upscaler

Once the models are loaded and no nodes are outlined in red:

Upload your image: Use an image that’s around 1024px on its longest side if you’re using a 1K → 4K PID model.
Set output resolution: For a 1024px-wide input, set width = 4096 (a 4x increase). To keep the same aspect ratio, multiply the original height by the same factor of 4 (for example, 683 → 2732).
Prompt: Add a short descriptive prompt that matches your image (for example, “a scenic lakeside town at sunset”). This helps guide the upscaler to enhance the right details.

Then hit Run. On a decent GPU, PID should upscale your image in under 10 seconds, often in just a few seconds.
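The output-size arithmetic can be sketched as a small helper. It generalizes the rule slightly by scaling the longest side to 4096, so it also covers portrait images; the function name is hypothetical.

```python
def upscaled_size(width, height, target_long_side=4096):
    """Scale (width, height) so the longest side equals target_long_side."""
    scale = target_long_side / max(width, height)
    return round(width * scale), round(height * scale)


print(upscaled_size(1024, 683))  # (4096, 2732)
print(upscaled_size(683, 1024))  # (2732, 4096)
```

Enter the two resulting numbers as the width and height in the workflow so the aspect ratio of the original is preserved.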

Optional: Side-by-side comparison

If you want to visually compare before and after:

Double-click the canvas and search for an “image comparer” node (for example, the Image Comparer node from the rgthree custom node pack).
Connect the original input image to one side and the PID output to the other.

This makes it easy to see how much extra detail PID adds when you zoom into areas like buildings, trees, hair, or stars in the night sky.

Workflow 2: Generate with Z-image or Flux, then upscale with PID

If you don’t already have an image, you can use a top-tier generator first and then let PID handle the high-resolution step. This gives you the best of both worlds: strong composition and style from the base model, plus ultra-sharp 4K detail from PID.

Models needed for the generation + upscaling workflow

For a Z-image + PID workflow, you’ll typically need:

Z-image Turbo diffusion model: Available in BF16 (~12 GB) or a more compressed FP4 version. Place it in ComfyUI/models/diffusion_models.
Qwen 3 4B text encoder: Used as the CLIP/text encoder for Z-image. Available in full, FP4, or FP8 variants. Place it in ComfyUI/models/text_encoders.
Gemma 2 2B text encoder: For PID, as before.
PID Flux 1 1K → 4K model: Since Z-image uses Flux 1 latents, pick the Flux 1-compatible PID variant.
AE.safetensors VAE: For decoding and previewing the intermediate image.

After placing all models in their respective folders, press R in ComfyUI and select them in the workflow nodes.

How the workflow is structured

This combined workflow is usually split into two sections:

Top section – Base generation: Z-image (or Flux 2 / SD3) generates a 1K image from your text prompt.
Bottom section – PID upscaling: The generated 1K image is passed into PID, which outputs a 4K version.

Key settings include:

Prompt: Describe your scene clearly (for example, “a red panda standing on its hind legs in a misty forest with a waterfall behind it and a black bird on its head”).
Base resolution: Set the longest side to 1024 so it matches the 1K → 4K PID model.
Batch size: How many images you want to generate at once.
Sampler settings: For Z-image Turbo, 7–9 steps is usually a good balance. CFG around 1 works well, and you can leave sampler/scheduler at their default recommended values.

When you run the workflow, you’ll see the 1K image preview first (if you’ve enabled those nodes), followed by the 4K PID result. The difference in fur, foliage, and background detail is usually dramatic.
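To pick a base resolution for a given aspect ratio, a helper like the one below can do the arithmetic. It is a sketch under two assumptions that are not from this article: the function name is made up, and dimensions are snapped to a multiple of 16, a common convention for latent diffusion models. The settings dictionary simply restates the values suggested above.

```python
def base_resolution(aspect_w, aspect_h, long_side=1024, multiple=16):
    """Width and height with the longest side at long_side.

    Both values are snapped to the nearest multiple (assumed to be 16).
    """
    if aspect_w >= aspect_h:
        width, height = long_side, long_side * aspect_h / aspect_w
    else:
        width, height = long_side * aspect_w / aspect_h, long_side

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


# Settings suggested in the text for Z-image Turbo (7-9 steps, CFG around 1).
settings = {"steps": 8, "cfg": 1.0, "batch_size": 1}

width, height = base_resolution(16, 9)
print(width, height)          # 1024 576
print(width * 4, height * 4)  # 4096 2304
```

A 16:9 base image at 1024 x 576 becomes 4096 x 2304 after the 1K → 4K step.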

Workflow 3: Using PID as a standalone text-to-image model

PID also includes a tiny text-to-image model that can generate 1K images directly. It’s extremely fast and lightweight (around 2.6 GB for the BF16 version), making it easy to run even on more modest GPUs.

Setup and usage

To use PID’s text-to-image mode, you’ll need:

PID text-to-image diffusion model: The dedicated 1K text-to-image model, in BF16 or MXFP8. Place it in ComfyUI/models/diffusion_models.
Gemma 2 2B text encoder: As in the other workflows.

Load the workflow JSON, refresh models with R, and select the correct diffusion model and Gemma encoder in the dropdowns.

Then you can:

Enter a prompt (for example, “a leopard hiding in the jungle”).
Set width and height up to 1024px.
Choose batch size and sampling steps.

PID will usually generate a 1K image in under 10 seconds, often much faster. However, the visual quality isn’t yet on par with top models like Z-image, Flux 2, or Ernie Image. It can look a bit plasticky and lacks some of the nuance and realism of those dedicated generators.

When to use each PID workflow

To get the most out of PID, it helps to pick the right workflow for your goal:

You already have an image you like: Use the image-to-2K/4K workflow. This is ideal for sharpening photos, artwork, or AI images from any source.
You want the best possible AI art at 4K: Use a strong base model (Z-image, Flux 2, Ernie Image, etc.) to generate a 1K image, then feed it into PID with the matching latent type. This gives you top-tier composition plus ultra-sharp detail.
You need something fast and light: Use PID’s own 1K text-to-image model when speed and low VRAM usage matter more than absolute quality.
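The decision above can be written as a few lines of logic. This is purely illustrative; the function name and the returned labels are made up for this sketch.

```python
def choose_workflow(has_source_image, prioritize_speed=False):
    """Suggest a PID workflow based on the three cases described above."""
    if has_source_image:
        return "image-to-2K/4K"
    if prioritize_speed:
        return "PID text-to-image (1K)"
    return "generate at 1K, then upscale with PID"


print(choose_workflow(has_source_image=True))
print(choose_workflow(has_source_image=False))
print(choose_workflow(has_source_image=False, prioritize_speed=True))
```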

Why PID is worth adding to your toolkit

Nvidia’s Pixel Diffusion is currently one of the best options for local, high-resolution upscaling:

Fast: Upscales in seconds, often under a second on strong GPUs.
Lightweight: Models are relatively small (roughly 1.5–3 GB, depending on precision) compared to many large diffusion models.
High quality: Sharper, more consistent, and more faithful than many older upscalers like SeedVR2.
Flexible: Works as a drop-in upscaler for popular generators like Flux, Z-image, and Stable Diffusion 3, and can also be used directly from a prompt.

If you care about print-ready images, detailed concept art, or just want your AI creations to look great on 4K screens, PID is an excellent tool to integrate into your ComfyUI workflows.