How to run Gemma 4 locally with Ollama and connect it to OpenClaw

25 May 2026

Local AI models have improved dramatically, and you can now run powerful models like Gemma 4 directly on your own hardware. This guide walks you through choosing the right hardware, installing Ollama, downloading a local model, and wiring it into OpenClaw so you can cut cloud costs while keeping your data private.

The shift is recent. In the last couple of months, we’ve seen new open-source models that are fast, capable, and surprisingly easy to run on mid-range hardware, which is what makes running something like Gemma 4 on your own machine and plugging it into OpenClaw practical today.

When Local Models Make Sense (and When They Don’t)

Running models locally isn’t for everyone. Whether it’s a good fit depends on a few key factors:

Use case: Local models are great for everyday coding help, writing, note-taking, research, and many automation tasks. For extremely complex reasoning or cutting-edge performance, top cloud models (like the latest GPT or Claude versions) are still ahead.

Cost: If you’re making a lot of API calls every month, local models can save you hundreds or even thousands of dollars by eliminating per-token fees.

Privacy & security: If you work with sensitive data and don’t want it to leave your machine or server, local models are a strong option.

Hardware: Local models are demanding. If your hardware is too weak, the experience will be painfully slow and you’ll end up back on cloud APIs anyway. The rest of this guide focuses on getting this part right.
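To make the cost point concrete, here is a back-of-the-envelope sketch. The token volume, the per-million-token price, and the hardware price are placeholder assumptions, not quotes from any provider, and the estimate ignores electricity.

```python
def monthly_api_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Estimate monthly API spend from token volume and a flat per-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def months_to_break_even(hardware_usd: float, monthly_cost_usd: float) -> float:
    """Months of avoided API fees needed to pay for a hardware purchase."""
    if monthly_cost_usd <= 0:
        raise ValueError("monthly_cost_usd must be positive")
    return hardware_usd / monthly_cost_usd


if __name__ == "__main__":
    # Hypothetical workload: 200M tokens per month at $3 per million tokens.
    cost = monthly_api_cost(200_000_000, 3.0)
    print(f"Estimated API spend: ${cost:,.2f}/month")
    # Hypothetical hardware price of $1,800.
    print(f"Break-even: {months_to_break_even(1800, cost):.1f} months")
```

Plug in your own usage and your provider's current pricing. If the break-even comes out at several years, the cost argument alone probably does not justify new hardware.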

Understand Your Hardware First

Your experience with local models is mostly determined by two things: which model you choose and what hardware you run it on. Before installing anything, you need to know what resources you have.

On Mac: Check Your Unified Memory

If you’re on a relatively recent Mac (M-series, within the last ~5–6 years), the key number is your total memory:

1. Click the Apple icon in the top-left.

2. Choose “About This Mac”.

3. Look for the “Memory” value (e.g., 16 GB, 32 GB).

This unified memory is shared between CPU, GPU, and everything else. Your local model will use a big chunk of it, but not all. As a rough rule of thumb, you typically want the model size to be comfortably below your total memory. For example, with 32 GB of RAM, running a 20 GB model is reasonable; going much higher will slow everything down.
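The rule of thumb above can be written as a small check. This is a sketch only: the 30% headroom reserved for the operating system and other apps is an assumed value, not an official figure, so adjust it to how heavily you use the machine.

```python
def fits_in_memory(model_gb: float, total_gb: float, headroom_fraction: float = 0.3) -> bool:
    """Return True if the model leaves at least `headroom_fraction` of memory free."""
    usable_gb = total_gb * (1.0 - headroom_fraction)
    return model_gb <= usable_gb


if __name__ == "__main__":
    # The example from the text: a 20 GB model on a 32 GB Mac.
    print(fits_in_memory(20, 32))
    # The same model on a 16 GB Mac.
    print(fits_in_memory(20, 16))
```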

If your Mac is more than 6–7 years old or has very low memory, local models will likely feel sluggish, and cloud models may still be the better experience.

On Windows or Linux: Check Your GPU VRAM

On Windows or Linux (especially desktops), the most important spec is your GPU VRAM, not your system RAM:

VRAM is the dedicated memory on your graphics card (e.g., 8 GB, 12 GB, 24 GB). Local models will try to use as much of this as possible for speed.

For example:

• 24 GB VRAM (e.g., RTX 4090): You can run larger, higher-quality models comfortably.

• 8 GB VRAM: You’ll want smaller models, but they can still be very usable.

In general, the model size you choose should be less than your available VRAM. The closer you get to the limit, the slower and more unstable things can become.
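If you have an NVIDIA card and are not sure how much VRAM it has, the `nvidia-smi` tool that ships with the driver can report it. The sketch below assumes `nvidia-smi` is on your PATH. It will not work for AMD or Intel GPUs, where you would check the vendor's own tools instead.

```python
import subprocess


def parse_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; requires NVIDIA drivers."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)


if __name__ == "__main__":
    try:
        for index, mib in enumerate(query_vram_mib()):
            print(f"GPU {index}: {mib / 1024:.1f} GiB VRAM")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not available; check VRAM in your GPU vendor's tools instead.")
```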

Install Ollama to Run Local Models

Once you understand your hardware limits, the next step is installing Ollama. Ollama is a lightweight tool that lets you download and run local LLMs with a simple command-line interface.

Key points about Ollama:

• It’s free to use.

• It runs models locally on your machine or server.

• It exposes an API that tools like OpenClaw can connect to.

Installing Ollama

1. Go to ollama.com.

2. Copy the install command for your operating system.

3. Open your terminal or command prompt.

4. Paste the command and press Enter.

Even if you already have Ollama installed, it’s a good idea to run the latest install command to update to the newest version—some newer models require it.

After installation, run:

ollama

If you see output (help text, version info, etc.), it’s working. If the command isn’t recognized, close and reopen your terminal and try again.
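If you would rather script this check, for example in a setup script, a minimal Python sketch follows. It only looks for the `ollama` executable on your PATH and prints its version. The helper name `find_executable` is made up for this example.

```python
import shutil
import subprocess
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("ollama")
    if path is None:
        print("ollama not found on PATH; reopen the terminal or reinstall.")
    else:
        # `ollama --version` prints the installed version.
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        print(f"Found {path}: {result.stdout.strip()}")
```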

Running the Ollama Service

OpenClaw needs Ollama to be running as a background service. If OpenClaw can’t reach Ollama, do this:

1. Open a terminal.

2. Run:

ollama serve

Leave this terminal window open; it keeps the Ollama server running. On Mac, you can also launch the Ollama app from Spotlight and look for the llama icon in the menu bar to confirm it’s active.
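To confirm the server is reachable before blaming OpenClaw, you can probe the address yourself. The sketch below assumes the default address http://localhost:11434 and treats any HTTP 200 reply as "up". The function name `is_ollama_up` is illustrative.

```python
import http.client
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error, or a malformed reply.
        return False


if __name__ == "__main__":
    if is_ollama_up():
        print("Ollama is reachable.")
    else:
        print("Nothing answering on port 11434; try `ollama serve`.")
```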

Choose and Download the Right Gemma 4 Model

With Ollama installed, the next decision is which model to run. The recommendation here is Gemma 4, one of the strongest small local models available right now.

When choosing a variant, you’ll see options like:

• gemma-4-2b

• gemma-4-4b

• gemma-4-26b

• gemma-4-31b

The “b” stands for billions of parameters. More parameters generally mean better performance but also a larger model file and higher hardware requirements.
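The link between parameter count and file size is simple arithmetic: each parameter is stored with some number of bits. The sketch below estimates the weights alone. The bit widths are assumptions, because this guide does not say which quantization each download uses. Real files and runtime memory come out somewhat larger, since some layers are kept at higher precision and the context window needs memory too.

```python
def approx_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billions * 1e9 parameters * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for params in (2, 4, 26, 31):
        four_bit = approx_weights_gb(params, 4)
        sixteen_bit = approx_weights_gb(params, 16)
        print(f"{params}B parameters: ~{four_bit:.1f} GB at 4-bit, ~{sixteen_bit:.1f} GB at 16-bit")
```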

Match Model Size to Your Hardware

Alongside each model, Ollama shows an approximate size (e.g., 7 GB, 9 GB, 18 GB, 20 GB). Use these as your guide:

• On a 16 GB Mac: stick to the smaller Gemma 4 variants (e.g., 2B or 4B).

• On a 32 GB Mac: you can usually handle the larger variants (e.g., 26B or 31B), though they’ll be slower.

• On an 8 GB VRAM GPU: smaller models are safer.

• On a 24 GB VRAM GPU: you can comfortably run the larger Gemma 4 variants.

Always leave some headroom; if a model is too close to your total memory/VRAM, it may run but feel unusably slow.
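The matching process above amounts to picking the largest variant that still leaves headroom. Here is a sketch of that logic. The sizes in `EXAMPLE_SIZES_GB` are placeholders loosely based on the example figures above, and pairing them with specific variants is an assumption, so read the real sizes from each model's page. The headroom fractions are assumptions too: more is reserved on a Mac, where memory is shared with everything else, than on a dedicated GPU.

```python
from typing import Optional

# Hypothetical download sizes in GB; read the real ones from the model's page.
EXAMPLE_SIZES_GB = {"2b": 7.0, "4b": 9.0, "26b": 18.0, "31b": 20.0}


def pick_largest_fitting(sizes_gb: dict[str, float], memory_gb: float,
                         headroom_fraction: float) -> Optional[str]:
    """Return the largest variant that fits in memory after reserving headroom."""
    budget_gb = memory_gb * (1.0 - headroom_fraction)
    fitting = {name: size for name, size in sizes_gb.items() if size <= budget_gb}
    if not fitting:
        return None  # Nothing fits; look for a smaller model.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print("16 GB Mac:", pick_largest_fitting(EXAMPLE_SIZES_GB, 16, 0.3))
    print("32 GB Mac:", pick_largest_fitting(EXAMPLE_SIZES_GB, 32, 0.3))
    print("8 GB VRAM:", pick_largest_fitting(EXAMPLE_SIZES_GB, 8, 0.1))
    print("24 GB VRAM:", pick_largest_fitting(EXAMPLE_SIZES_GB, 24, 0.1))
```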

Download Gemma 4 with Ollama

Once you’ve picked a size, download it with:

ollama pull gemma-4-4b

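(Replace gemma-4-4b with the exact model name you want, such as gemma-4-31b. Ollama identifiers usually take the form name:tag, so copy the exact string shown on the model’s page on ollama.com rather than typing it from memory.)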

Ollama will download the full model file (e.g., 9 GB, 20 GB). This can take a few minutes depending on your internet speed.
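How long "a few minutes" really is depends on file size and line speed. The arithmetic can be sketched as below, using decimal units and assuming you get the full advertised speed, which is optimistic.

```python
def download_minutes(size_gb: float, speed_mbps: float) -> float:
    """Estimate download time in minutes for size_gb gigabytes at speed_mbps megabits/s."""
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    return megabits / speed_mbps / 60


if __name__ == "__main__":
    for size_gb in (9, 20):
        for speed_mbps in (100, 1000):
            minutes = download_minutes(size_gb, speed_mbps)
            print(f"{size_gb} GB at {speed_mbps} Mbps: about {minutes:.1f} minutes")
```

On a 100 Mbps connection, a 9 GB model works out to roughly 12 minutes and a 20 GB model to a bit under half an hour.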

To see which models you’ve installed, run:

ollama list
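The same information is available over Ollama's HTTP API, which is what tools like OpenClaw talk to. The sketch below calls the `/api/tags` endpoint on the default address and extracts the model names. The helper names are made up for this example.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def list_installed(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch the installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return model_names(json.load(response))


if __name__ == "__main__":
    try:
        print(list_installed())
    except OSError as error:
        print(f"Could not reach Ollama: {error}")
```

The names printed here are the exact identifiers to use in later commands.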

Quickly Test the Model Locally

Before wiring it into OpenClaw, make sure the model runs:

ollama run gemma-4-4b

Then type a message like:

Hello, how are you?

If you get a quick, coherent response, your setup is working. Type /exit (or /bye) to leave the interactive session.
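You can run the same smoke test over the HTTP API, which is closer to how OpenClaw will use the model. This minimal sketch posts to Ollama's `/api/generate` endpoint with streaming turned off. The model name is copied from this guide and may need to be replaced with whatever `ollama list` shows on your machine.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the model's reply text."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=120) as response:
        return json.load(response)["response"]


if __name__ == "__main__":
    try:
        # Model name taken from the guide; substitute the name `ollama list` reports.
        print(generate("gemma-4-4b", "Hello, how are you?"))
    except OSError as error:
        print(f"Request failed: {error}")
```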

Connect Ollama to OpenClaw

Now that Gemma 4 is running locally, you can connect it to OpenClaw and use it as the brain for your agents and tools.

If you haven’t explored OpenClaw deeply yet, you may find it helpful to also read this breakdown of the latest OpenClaw update to understand how it orchestrates tools and agents.

Configure Ollama Inside OpenClaw

1. Make sure OpenClaw is installed on the same machine (or VPS) where Ollama is running.

2. In your terminal, run:

openclaw configure

3. In the interactive menu, navigate to the model section using the arrow keys and press Enter.

4. Scroll down until you see Ollama as a provider.

5. Choose the local only option (ignore the cloud Ollama option for this setup).

6. Leave the base URL as the default (usually http://localhost:11434); that’s where Ollama exposes its API.

7. When prompted to select models, choose the Gemma 4 variants you installed (e.g., gemma-4, gemma-4-latest, or the specific size you pulled).

Confirm your selection to enable these models inside OpenClaw.

Restart the OpenClaw Gateway

To apply the changes, restart the OpenClaw gateway:

openclaw gateway restart

After this, OpenClaw should be able to call your local Gemma 4 model through Ollama.

Test the Local Model in OpenClaw

Open the OpenClaw control interface and send a simple message, such as:

Hello, how are you doing? Can you tell me the meaning of life?

If everything is configured correctly, OpenClaw will route the request to your local Gemma 4 model and you should see a response appear quickly.

If you have multiple models connected (cloud + local), you can switch between them using slash commands:

• /models – list available models.

• /model gemma-4 – switch to a specific model.

You can also configure the default model in your OpenClaw configuration (for example, in openclaw.json) so that the local model is used by default, and you only switch to cloud models when needed.

Mix Local and Cloud Models for the Best of Both Worlds

Even with the latest local models, top-tier cloud models like GPT-4.x or Claude Opus still have an edge on very complex reasoning and some niche tasks. A practical setup is:

• Local model (Gemma 4) as default: Used for most everyday tasks, automations, and tool calls. It responds quickly on adequate hardware and carries no per-token fees once running.

• Cloud models as backups: Configure an OpenAI or Anthropic model in OpenClaw for cases where you need extra intelligence or reliability.

You can even define in your OpenClaw configuration (e.g., in agents.md or tools settings) when to use the local model versus when to escalate to a cloud model, based on task type or importance.
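The escalation rule itself is simple, whatever form it takes in your configuration. The sketch below shows only the decision logic in Python. It is not OpenClaw's configuration syntax, and the model identifiers and task labels are hypothetical.

```python
# Hypothetical model identifiers and task labels, for illustration only.
LOCAL_MODEL = "ollama/gemma-4"
CLOUD_MODEL = "cloud/frontier-model"
ESCALATE_TASKS = {"complex-reasoning", "legal-review"}


def choose_model(task_type: str, high_stakes: bool = False) -> str:
    """Default to the local model; escalate for hard or important tasks."""
    if high_stakes or task_type in ESCALATE_TASKS:
        return CLOUD_MODEL
    return LOCAL_MODEL


if __name__ == "__main__":
    print(choose_model("note-taking"))
    print(choose_model("complex-reasoning"))
    print(choose_model("note-taking", high_stakes=True))
```

Keeping the escalation list short means the cloud model is the exception, which is where the savings come from.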

If you want to go deeper into turning local models into full-blown agents, including tool calling and MCP integration, check out this practical guide to building powerful local LLM agents with Ollama.

Going Further: Multiple Models and Advanced Configs

Once you’re comfortable with Gemma 4, you can expand your setup:

• Install additional local models (e.g., Qwen, specialized code models, or image/video models) via ollama pull.

• Expose them all to OpenClaw and assign each one to different tasks.

• Tune your OpenClaw configuration files (agents.md, soul.md, etc.) so certain agents prefer certain models.

The key is always the same: balance model size and quality against your hardware limits. Pick the largest model that runs comfortably on your machine without grinding everything to a halt.

With the current generation of local models and tools like Ollama and OpenClaw, you can now build serious, production-grade automations that run entirely on your own hardware—no monthly API bill required.