How to Use Gemma 4 and Ollama for Local AI Coding in VS Code

12 May 2026
Learn how to turn Visual Studio Code into an AI-powered coding environment using Gemma 4 running locally through Ollama, plus the Continue extension for chat and code edits.

Want AI help with your code, but don’t want to rely only on cloud models or switch to a different editor? With a simple setup, you can use Gemma 4 locally through Ollama and get an AI coding assistant directly inside Visual Studio Code.

This guide walks through installing everything, wiring it up with the Continue extension, and using it to build a small to‑do app in the browser.

Setting Up VS Code for AI Coding

If you don’t already have Visual Studio Code installed, grab it from the official website for your operating system and launch it with the default settings.

You can customize the editor’s look right away—change the color theme, tweak sidebar colors, and adjust borders so the main code area stands out. Themes like Everforest or Solarized Dark work well and make it easier to see pop‑ups and panels from AI extensions.

VS Code has a built-in "agent" panel that’s designed for GitHub Copilot, but it doesn’t work with local models. Instead, you’ll add your own AI integration using an extension.

Running Gemma 4 Locally with Ollama

To use Gemma 4 as your coding assistant, you first need it running locally. That’s where an LLM runner comes in. Popular options include LM Studio, llama.cpp, and Ollama. This guide uses Ollama.

Choosing the Right Gemma 4 Model

On the Ollama website, you’ll find several Gemma 4 variants with different parameter sizes. Larger models (like 26B or 31B) are more capable but require more RAM and GPU power. On a MacBook with an M4 Pro and 24 GB of RAM, the 8B version is a good balance between performance and resource usage.

To download Gemma 4 8B with Ollama, run:

ollama pull gemma4:8b

After the download finishes, check that it’s available:

ollama list

Then quickly test it in chat mode to confirm everything works:

ollama run gemma4:8b

If the model responds normally, you’re ready to connect it to VS Code.
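
Continue will talk to Ollama over its local HTTP API rather than the interactive CLI, so it can be worth confirming the server answers on its default port (11434) before moving on. A quick sketch, assuming a default Ollama install:

# List the models the local Ollama server is serving (should include gemma4:8b)
curl http://localhost:11434/api/tags

# Optional: request a one-off, non-streaming completion to confirm generation works
curl http://localhost:11434/api/generate -d '{"model": "gemma4:8b", "prompt": "Say hello", "stream": false}'

If both calls return JSON, the endpoint that VS Code extensions will use is up and reachable.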

Connecting Gemma 4 to VS Code with Continue

To integrate local models into VS Code, install an extension that can talk to your LLM runner. One of the most popular options is Continue, an open-source AI coding assistant.

In VS Code, open the Extensions tab and search for "Continue". Before installing, pay attention to the trust prompt—extensions can access your code, so it’s worth a moment to confirm you’re comfortable with it. Once installed, Continue adds an AI panel and tools for chat, code edits, and agent-like workflows.

If Ollama is running, Continue should automatically detect your local models. You can compare the models listed in Continue with the output of ollama list to confirm they match.

If your models don’t appear automatically, you can configure them manually in Continue’s model settings (see the example config after this list). There, you can:

  • Select a provider (e.g., Ollama, LM Studio, llama.cpp, or cloud providers like Gemini and Anthropic).
  • Add API keys for paid cloud models if you want to mix local and hosted LLMs.
  • Point Continue to your local Ollama instance and specify the model name (for example, gemma4:8b).
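
For manual configuration, here is a minimal sketch of a model entry, assuming the older config.json format (newer Continue releases use a config.yaml with equivalent name/provider/model fields). The title is only a display label, and apiBase can be omitted if Ollama runs on its default port:

{
  "models": [
    {
      "title": "Gemma 4 8B (local)",
      "provider": "ollama",
      "model": "gemma4:8b",
      "apiBase": "http://localhost:11434"
    }
  ]
}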

If you’re interested in going further with local models and tools, you might also like this guide on turning local LLMs into powerful AI agents with Ollama and MCP.

Configuring Tools and Permissions in Continue

Once Gemma 4 is visible in Continue, you can start using it as an AI assistant on your project. In the example project, the goal is to build a simple web-based to‑do app from a markdown file describing a set of tasks.

However, if you just send the first task to the model and nothing happens—no new files, no edits—it’s often due to tool permissions.

Key Tool Settings to Check

Open Continue’s tool settings and review how it’s allowed to interact with your codebase:

  • Read files: Set to automatic so the model can inspect your project without asking each time.
  • Create new files: By default, this may require confirmation. If you’re working in a safe test repo, you can switch it to automatic so the model can generate files like index.html on its own.
  • Read current file: Also safe to set to automatic.
  • Edit current file / Find and replace: You can allow these automatically if you’re comfortable with the model modifying code directly. Otherwise, keep them on confirmation for more control.
  • Run terminal commands: This should usually stay on ask for permission to avoid accidental or unsafe commands.

Even with these settings, Continue may still ask you to confirm its "plan" before executing a sequence of actions. If the model seems to stall, check for a pending plan approval in the UI and confirm it.

Building a Simple To‑Do App with Gemma 4

With everything configured, you can now use Gemma 4 inside VS Code to work through the tasks in your markdown file.

Task 1: Generate the Initial HTML Page

The first task is to create a basic HTML page that displays a list of tasks. You can paste the task description into the Continue chat and ask the model to implement it.

After approving its plan, Gemma 4 will create a new HTML file in your project. Open it in the browser to verify that the task list is rendered as expected.
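
Assuming the model named the file index.html (yours may differ), you can open it straight from the VS Code terminal; on macOS that is:

open index.html

On Linux, xdg-open index.html does the same, or you can simply drag the file into a browser tab.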

Task 2: Add an Input Form

The next task is to add a form so you can create new to‑do items directly from the page. Again, send the task description to the model.

Continue will likely ask you to confirm its plan and then request permission to modify the existing HTML file. Once approved, Gemma 4 updates the page to include an input field and a button for adding items.

Reload the page in your browser and test the form. You should be able to type a new task, submit it, and see it appear in the list without errors.

At this point, it’s clear that the setup works: Gemma 4 can read files, create new ones, and edit existing code directly from within VS Code.

Performance and When to Use Local Models

Running Gemma 4 8B locally on a modern laptop delivers surprisingly solid performance for focused, well-defined coding tasks. It’s fast enough to generate HTML, wire up simple JavaScript, and iterate on small features without feeling sluggish.
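
If you want a feel for what that performance costs, recent Ollama versions can report what is currently loaded; run this in a terminal while Continue is using the model:

# Shows loaded models, their memory footprint, and whether they run on GPU or CPU
ollama ps

The output makes it easy to see whether the 8B model fits comfortably in memory or whether a smaller variant would suit your machine better.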

However, local models still shouldn’t be seen as a complete replacement for top-tier paid models. They’re best used as a complement:

  • Great for privacy-sensitive projects where you don’t want code leaving your machine.
  • Ideal for quick experiments, small utilities, or offline work.
  • Less suited for very large, complex codebases or highly nuanced reasoning compared to the strongest cloud models.

If you’re curious how local tools compare with cloud-based coding assistants, you might find this deep dive on Claude Code vs Google Antigravity helpful.

With VS Code, Ollama, Gemma 4, and the Continue extension, you get a flexible setup: a familiar editor, a local LLM you control, and an AI assistant that can read, write, and refactor your code directly in your project.
