Free GPU + your own AI image API in 15 minutes (no credit card needed)

26 May 2026 14:37

Learn how to run Stable Diffusion XL on a free Google Colab GPU and expose it as a real REST API using FastAPI and ngrok. No paid APIs, no local GPU, just your browser.

Most AI tutorials show you how to call someone else’s API. This guide shows you how to run the model yourself — on a free GPU — and turn it into your own REST API that any app can call.

In about 15 minutes, you can have Stable Diffusion XL running on Google’s hardware, with a browser UI and a public endpoint you control. No paid API keys, no subscription, and no GPU of your own; the only credentials you need are free Hugging Face and ngrok tokens.

What You’ll Build

By the end, you’ll have:

• Stable Diffusion XL (SDXL) running on a free Nvidia T4 GPU in Google Colab
• A simple browser interface to generate 1024×1024 images from text prompts
• A production-style REST API built with FastAPI
• A public URL (via ngrok) that any frontend or backend can call

This pattern works not just for image generation, but for almost any model on Hugging Face that fits in the GPU’s memory: text, speech, OCR, translation, and more.

Key Tools: Hugging Face, Colab, and ngrok

To make this work, you’ll combine three main pieces: Hugging Face, Google Colab, and ngrok.

Hugging Face: The “GitHub of AI Models”

Hugging Face hosts millions of AI models: image generators, language models, speech recognition, and more. Most are free to download and run.

Two parts matter here:

• Models: For this guide, you’ll use Stable Diffusion XL Base 1.0 by Stability AI (stabilityai/stable-diffusion-xl-base-1.0 on the Hub), a powerful text-to-image model that fits in the free Colab GPU’s memory.
• Spaces: Hosted demo apps built on top of models, usually using Gradio or Streamlit. They’re great for quick tests in the browser, but not ideal as a backend API.

Why not just use Spaces as your API?

• Free Spaces run on shared CPU, so they’re slow and can take over a minute per request.
• They can sleep or restart at any time, so you get no uptime guarantees.
• Many Spaces are just UIs wrapping someone else’s model, not the model itself.

That’s why you’ll instead pull the model into your own Colab runtime and expose it yourself.

Google Colab: Free GPU in Your Browser

Google Colab is a hosted Jupyter notebook environment where you write Python in your browser and run it on Google’s servers.

The important part: the free tier can give you access to an Nvidia T4 GPU with 16 GB of VRAM (about 15 GB of it usable in Colab), which is enough to run SDXL.

In Colab you will:

• Authenticate with Hugging Face to download models faster
• Load the SDXL pipeline into GPU memory
• Test image generation from a prompt
• Spin up a FastAPI server inside the notebook

ngrok: Make Your Local API Public

ngrok creates a secure tunnel from your Colab machine to the internet and gives you a public HTTPS URL.

That means your FastAPI server running inside Colab becomes callable from anywhere with a single POST request. You can hit it from a React app, mobile app, backend service — anything that can make HTTP calls.

Step-by-Step: Run SDXL on a Free Colab GPU

The heavy lifting is already done in a prepared Colab notebook. You just plug in your tokens, choose the right runtime, and run the cells.

1. Get Your Tokens (Hugging Face + ngrok)

Hugging Face token

• Go to your Hugging Face settings and create a new access token.
• In Colab, open the Secrets panel (the key icon in the left sidebar), add the token as a secret (e.g., HF_TOKEN), and enable notebook access for it.
• The token stays hidden and won’t be exposed if you share the notebook.

ngrok token

• Sign up for a free ngrok account.
• Copy your auth token from the ngrok dashboard.
• Add it as another secret in the Colab notebook (e.g., NGROK_TOKEN).
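Here is a minimal sketch of how a notebook cell might read those two secrets, assuming the example names HF_TOKEN and NGROK_TOKEN from above. google.colab.userdata is Colab's secrets API; the get_secret helper and its environment-variable fallback are illustrative additions, not necessarily what the prepared notebook does.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's Secrets panel, falling back to an env var."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing / access not granted.
        pass
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab Secrets or environment")
    return value


# Typical use in a notebook cell (secret names are this guide's examples):
# hf_token = get_secret("HF_TOKEN")
# ngrok_token = get_secret("NGROK_TOKEN")
```

The fallback means the same cell also works when you run the code outside Colab, for example on your own machine with the tokens exported as environment variables.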

2. Switch Colab to a T4 GPU

By default, Colab often starts on CPU, which is too slow for SDXL.

• In Colab, go to Runtime → Change runtime type.
• Under Hardware accelerator, select T4 GPU. Free-tier GPUs are not guaranteed, so it may occasionally be unavailable.
• Save the settings.
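To confirm the runtime switch actually took effect, you can ask nvidia-smi which GPU is attached. The sketch below uses only the standard library; detect_gpu and parse_gpu_line are hypothetical helper names, and the exact memory figure reported will depend on the runtime you are given.

```python
import shutil
import subprocess


def parse_gpu_line(line: str) -> tuple[str, int]:
    """Parse a line like 'Tesla T4, 15360 MiB' into (name, memory_mib)."""
    name, memory = [part.strip() for part in line.split(",", 1)]
    return name, int(memory.split()[0])


def detect_gpu():
    """Return (name, memory_mib) for the first GPU, or None on a CPU runtime."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return parse_gpu_line(lines[0])


gpu = detect_gpu()
print(gpu if gpu else "No GPU found: check Runtime > Change runtime type")
```

If this prints the "No GPU found" message, the notebook is still on a CPU runtime and SDXL will be far too slow to be usable.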

3. Run the Notebook and Load the Model

Once your runtime is set and tokens are configured:

• Click Run all or execute the cells one by one.
• The notebook will authenticate with Hugging Face and start downloading SDXL (roughly 7 GB or more).
• You can monitor RAM, disk, and GPU usage from Colab’s Resources panel.

After the download, the notebook loads the model weights into GPU memory and builds the diffusion pipeline. This step takes a bit of time but only needs to happen once per runtime.
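The notebook's own loading cell isn't reproduced in this article, so treat the following as a rough sketch of what that step usually looks like with the diffusers library. It assumes the stabilityai/stable-diffusion-xl-base-1.0 checkpoint and half-precision (fp16) weights; load_sdxl and fp16_weights_gb are hypothetical helper names. The small arithmetic helper shows why a model of about 3.5 billion parameters is a download of around 7 GB and fits on a T4.

```python
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def fp16_weights_gb(num_params: float) -> float:
    """Approximate size of half-precision weights: 2 bytes per parameter."""
    return num_params * 2 / 1e9


def load_sdxl(model_id: str = MODEL_ID):
    """Download (on first call) and load SDXL onto the GPU in half precision."""
    # Imported here so the helper above runs even before the libraries are installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision halves the memory footprint
        variant="fp16",             # fetch the fp16 weight files
        use_safetensors=True,
    )
    return pipe.to("cuda")


print(f"~{fp16_weights_gb(3.5e9):.1f} GB of fp16 weights")
# pipe = load_sdxl()  # run once per runtime; later cells reuse `pipe`
```

Keeping the pipeline in a single variable that later cells reuse is what makes the slow load a one-time cost per runtime.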

4. Test Image Generation in the Browser UI

Once the model is loaded, the notebook spins up a Gradio UI and prints a public URL.

Open that URL and you’ll see a simple interface where you can:

• Enter a text prompt
• Adjust parameters like guidance scale
• Generate 1024×1024 images
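If you're curious how a UI like this is wired up, here is a hedged sketch using Gradio's Interface class. The labels, slider range, and the generate_fn(prompt, seed, guidance_scale) callback are assumptions for illustration; the notebook's actual UI code may differ.

```python
def make_ui_handler(generate_fn):
    """Adapt generate_fn(prompt, seed, guidance_scale) to the UI's input order."""

    def handler(prompt, guidance_scale, seed):
        if not prompt or not prompt.strip():
            raise ValueError("Prompt must not be empty")
        return generate_fn(prompt.strip(), int(seed), float(guidance_scale))

    return handler


def build_demo(generate_fn):
    """Create a Gradio interface with a prompt box, guidance slider, and seed field."""
    import gradio as gr  # imported lazily; needs `pip install gradio`

    return gr.Interface(
        fn=make_ui_handler(generate_fn),
        inputs=[
            gr.Textbox(label="Prompt"),
            gr.Slider(minimum=1.0, maximum=15.0, value=7.5, step=0.5, label="Guidance scale"),
            gr.Number(value=-1, precision=0, label="Seed (-1 = random)"),
        ],
        outputs=gr.Image(type="pil", label="Result"),
        title="SDXL on Colab (sketch)",
    )


# demo = build_demo(my_generate_fn)  # my_generate_fn returns a PIL image
# demo.launch(share=True)            # share=True prints a public gradio.live URL
```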

A few tips for working with the results:

• Copy the seed from a result to reproduce or slightly tweak an image by changing the prompt.
• Set the seed to -1 (or clear it) to get a completely new random image.
• Increase the guidance scale to make the image follow your prompt more closely (e.g., from 7.5 to 8.5).

All of this runs entirely on Google’s GPU, not your machine, and doesn’t cost you anything.
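Under the hood, seed and guidance scale are just arguments to the pipeline call. The sketch below shows one way to handle them, assuming a diffusers SDXL pipeline object named pipe; resolve_seed and generate are hypothetical helpers, and treating any negative seed as "random" is a choice made here for illustration.

```python
import random


def resolve_seed(seed=None) -> int:
    """Return the given seed, or a fresh random one when seed is None or negative."""
    if seed is None or seed < 0:
        return random.randrange(2**32)
    return int(seed)


def generate(pipe, prompt: str, seed=None, guidance_scale: float = 7.5):
    """Run one 1024x1024 generation and return (image, seed_used)."""
    import torch  # imported lazily; only needed once a pipeline is loaded

    seed_used = resolve_seed(seed)
    # A seeded generator makes the initial noise, and so the image, repeatable.
    generator = torch.Generator(device="cuda").manual_seed(seed_used)
    result = pipe(
        prompt=prompt,
        guidance_scale=guidance_scale,
        height=1024,
        width=1024,
        generator=generator,
    )
    return result.images[0], seed_used
```

Returning the seed alongside the image is what lets you copy it back in later: the same prompt, seed, and settings should reproduce the same picture, while a small prompt change with the same seed gives a close variation.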

Turn the Model into a REST API

The real power for developers is exposing this model as a proper REST API. The notebook includes a section that does exactly that using FastAPI and ngrok.

1. Start a FastAPI Server in Colab

Inside the notebook, a FastAPI app is defined that:

• Accepts a JSON payload with your prompt and optional parameters
• Runs inference through the SDXL pipeline
• Returns the generated image as a base64-encoded PNG

When you run this cell, FastAPI starts listening on a local port inside the Colab environment.
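A server like the one described could look roughly like this. It's a sketch under assumptions, not the notebook's exact code: the /generate route, the image_base64 response field, port 8000, and the default parameter values are all hypothetical, and pipe is assumed to be an already-loaded diffusers pipeline.

```python
import base64
import io
import threading


def image_to_base64_png(image) -> str:
    """Serialize a PIL-style image (anything with .save) to a base64 PNG string."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def create_app(pipe):
    """Build a FastAPI app around an already-loaded diffusers pipeline."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class GenerateRequest(BaseModel):
        prompt: str
        guidance_scale: float = 7.5
        num_inference_steps: int = 30

    app = FastAPI(title="SDXL API (sketch)")

    @app.post("/generate")  # hypothetical route name
    def generate(req: GenerateRequest):
        image = pipe(
            prompt=req.prompt,
            guidance_scale=req.guidance_scale,
            num_inference_steps=req.num_inference_steps,
        ).images[0]
        return {"image_base64": image_to_base64_png(image)}

    return app


def serve_in_background(app, port: int = 8000) -> threading.Thread:
    """Run uvicorn in a daemon thread so the notebook cell returns immediately."""
    import uvicorn

    thread = threading.Thread(
        target=uvicorn.run,
        args=(app,),
        kwargs={"host": "0.0.0.0", "port": port, "log_level": "warning"},
        daemon=True,
    )
    thread.start()
    return thread


# app = create_app(pipe)
# serve_in_background(app, port=8000)
```

Running uvicorn in a background thread is one common way to keep the notebook usable while the server is up; running it in a blocking cell also works if you don't need to execute anything else.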

2. Expose It with ngrok

Next, ngrok is launched and pointed at the FastAPI port. It gives you a public HTTPS URL, like:

https://<random-subdomain>.ngrok-free.app

The notebook also exposes a Swagger UI (FastAPI’s interactive OpenAPI docs, served at /docs by default) where you can:

• See the available endpoints
• Click Try it out
• Paste your prompt and send a request
• Get back a base64 string representing the generated image
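The tunnel step can be sketched with the pyngrok package, which wraps the ngrok agent for Python. The port (8000) and the /generate route are assumptions carried over for illustration; /docs is FastAPI's default docs path. open_tunnel and join_url are hypothetical helper names.

```python
def join_url(base_url: str, path: str) -> str:
    """Join a tunnel URL and a route without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def open_tunnel(auth_token: str, port: int = 8000) -> str:
    """Start an ngrok tunnel to the local port and return its public URL."""
    from pyngrok import ngrok  # imported lazily; needs `pip install pyngrok`

    ngrok.set_auth_token(auth_token)
    tunnel = ngrok.connect(str(port))  # HTTP tunnel to localhost:<port>
    return tunnel.public_url


# public_url = open_tunnel(ngrok_token)
# print("API:    ", join_url(public_url, "/generate"))
# print("Swagger:", join_url(public_url, "/docs"))
```

On the free plan the subdomain typically changes whenever the tunnel is restarted, so any client you build should read the URL from configuration rather than hard-coding it.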

3. Call the API from Your Own Project

From here, integrating the model into your app is straightforward:

• Make a POST request to the ngrok URL from your React, Vue, or mobile app.
• Receive a JSON response with a base64-encoded PNG.
• Convert that base64 string to an image and display it in an <img> tag or save it on the server.

The notebook even includes a minimal React example showing how to call the endpoint in about ten lines of code.
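The same call works from any language. As a sketch, here is a standard-library Python client; the /generate route and the image_base64 response field are assumptions, so check the Swagger UI of your own server for the real names.

```python
import base64
import json
import urllib.request


def build_request(api_url: str, prompt: str, guidance_scale: float = 7.5):
    """Create a JSON POST request for the (hypothetical) generation route."""
    body = json.dumps({"prompt": prompt, "guidance_scale": guidance_scale})
    return urllib.request.Request(
        api_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_image(payload: dict, field: str = "image_base64") -> bytes:
    """Extract the raw PNG bytes from the JSON response."""
    return base64.b64decode(payload[field])


def generate_to_file(api_url: str, prompt: str, out_path: str = "out.png") -> str:
    """Call the API and save the returned PNG to disk."""
    # Generation can take a while on a T4, so allow a generous timeout.
    with urllib.request.urlopen(build_request(api_url, prompt), timeout=300) as resp:
        payload = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(decode_image(payload))
    return out_path


# generate_to_file("https://<random-subdomain>.ngrok-free.app/generate", "a lighthouse at dusk")
```

In a browser you can skip the decoding step and set the image source to a data URL of the form data:image/png;base64, followed by the returned string.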

Benefits of this approach:

• No per-image cost — you’re not paying an external API.
• No API key rotation — you control the endpoint.
• Customizable — you can change parameters, add authentication, logging, rate limits, or swap out the model entirely.

Apply the Same Pattern to Any Hugging Face Model

What you’ve built is a reusable pattern:

1. Load a Hugging Face model in Colab.
2. Wrap it with FastAPI.
3. Expose it via ngrok.
4. Call it from any app.

This works for:

• Text generation (LLMs)
• Speech synthesis and recognition
• OCR and document parsing
• Translation
• Object detection and more
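To make the pattern concrete, here is a small sketch that separates "run the model" from "handle the request", so the model can be swapped freely. The input/output field names and the make_predict_endpoint helper are hypothetical; the text-generation example uses the transformers pipeline function with the small gpt2 checkpoint purely as a stand-in.

```python
from typing import Any, Callable


def make_predict_endpoint(run_model: Callable[[str], Any]) -> Callable[[dict], dict]:
    """Turn any 'text in, result out' callable into a JSON-style handler."""

    def handler(payload: dict) -> dict:
        text = payload.get("input", "")
        if not text:
            return {"error": "missing 'input'"}
        return {"output": run_model(text)}

    return handler


def load_text_generator(model_id: str = "gpt2"):
    """Example model loader: a small text-generation pipeline from the Hub."""
    from transformers import pipeline  # imported lazily; needs `pip install transformers`

    generator = pipeline("text-generation", model=model_id)
    return lambda text: generator(text, max_new_tokens=40)[0]["generated_text"]


# handler = make_predict_endpoint(load_text_generator())
# handler({"input": "Once upon a time"})
```

The handler is what you would register as a FastAPI POST route; swapping in speech, OCR, or translation only changes the loader function.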

If you enjoy building your own AI backends, you might also like guides such as building local AI agents on your own hardware or running your own local AI video generator.

Next Steps and Custom Experiments

You now have a 3.5B-parameter image model running on a free GPU, with both a browser UI and a public REST API. From here, you can:

• Swap SDXL for another image model on Hugging Face.
• Replace it with a text, audio, or multimodal model.
• Add authentication, logging, or quotas to your FastAPI server.
• Build a full frontend product on top of your own AI backend.
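Since the ngrok URL is public, adding authentication is a sensible first experiment. Below is a minimal sketch of a shared-secret check using a FastAPI HTTP middleware; the x-api-key header name and both helper names are assumptions, and a real deployment would likely want something stronger.

```python
import hmac


def is_authorized(provided, expected: str) -> bool:
    """Compare the client's key with the expected key in constant time."""
    if not provided or not expected:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


def add_api_key_check(app, expected_key: str, header_name: str = "x-api-key"):
    """Reject requests that lack the right key in the given header."""
    from fastapi import Request
    from fastapi.responses import JSONResponse

    @app.middleware("http")
    async def check_key(request: Request, call_next):
        # Leave the docs pages open; protect every other route.
        if request.url.path not in ("/docs", "/openapi.json"):
            if not is_authorized(request.headers.get(header_name), expected_key):
                return JSONResponse(
                    {"detail": "Invalid or missing API key"}, status_code=401
                )
        return await call_next(request)

    return app


# Call this before starting the server:
# app = add_api_key_check(app, expected_key="choose-a-long-random-string")
```

Clients then send the key in the chosen header with every request, and anyone who stumbles on the tunnel URL without it gets a 401.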

As a more advanced step, you can fine-tune SDXL on a custom dataset so it learns a specific style or character, then expose that fine-tuned model through the exact same API pattern.

The core idea stays the same: don’t just use AI — run it yourself, control it, and integrate it directly into your projects without paying per request.

Tags: AI API, Stable Diffusion, Google Colab

Comments

David Kim Jul 13, 2026

I tried running this on my own GPU (RTX 3060) instead of Colab. The same code works locally with some modifications (no ngrok). For those who have a personal GPU, you can skip Colab entirely. But for those without, this is a great alternative.

William Nelson Jul 10, 2026

I'm a frontend developer and this is a dream come true. I can now prototype AI features without backend dependencies. The React example was spot on. I modified it to show a loading spinner while the image is being generated. One thing: make sure to handle errors when the model is not loaded yet (returns 503).

Emily Clark Jul 8, 2026

As a mobile developer, this is a game changer for me. I can now test image generation features without waiting for backend team support. The React example was helpful even though I use Flutter – the HTTP call is similar. One thing: make sure to handle the case where Colab disconnects; I added retry logic in my app.

James Miller Jun 28, 2026

I'm a backend developer and I love this pattern. It's like having your own mini GPU server. I'm planning to use it for batch processing images: I send multiple prompts in a single request and loop through them. However, the free Colab GPU has memory limits – I could only run one generation at a time. Might need to optimize.