How to run Claude Code locally with Qwen 3.5 and Ollama

30 May 2026 08:37

Learn how to connect Claude Code to a fully local AI model using Qwen 3.5 and Ollama. This guide walks through installation, configuration, and boosting context length so you can use Claude-style coding agents without burning through cloud credits.

You can run Claude Code on top of a fully local open-source model using Ollama and Qwen 3.5. That gives you an AI coding agent that can clone repos, read your project, and help you build features, all powered by a model running on your own machine.

Why Run Claude Code on a Local Model?

Claude Code is powerful, but its usage can get expensive fast, especially if you’re on a limited plan and working with large projects. Every long session, repo analysis, or refactor eats into your monthly quota.

By pairing Claude Code with a local model like Qwen 3.5 via Ollama, you can:

Offload many coding tasks to a free, local LLM
Keep your cloud credits for the hardest problems
Work offline or on machines where you don’t want to send code to the cloud
Experiment with different open-source models as your AI agent’s “brain”

If you like this kind of setup, you may also want to check out how to turn local LLMs into powerful AI agents with Ollama and MCP.

What You Need Before You Start

Here’s the basic setup used in this walkthrough:

Windows machine with around 16 GB RAM
VS Code installed
Claude Code CLI installed
Ollama installed
Qwen 3.5 model downloaded via Ollama

You can adapt the same approach to other platforms and models, but we’ll focus on Qwen 3.5 9B as a good balance between performance and resource usage for a 16 GB RAM system.

Step 1: Install and Configure Claude Code

Start by opening the folder of the project you want to work on in VS Code, then open a terminal (PowerShell works well on Windows).

From there:

Install Claude Code using the command provided in the docs.ollama.com Claude integration page.
After installation, note the path where claude.exe is placed, typically something like C:\Users\<username>\.local\bin.
Add this path to your system’s Environment Variables (User PATH):
- Open Environment Variables
- Edit the PATH for your user
- Add the ...\.local\bin path
- Move it near the top so it’s picked up reliably

If this step is skipped, Claude Code may fail to start because it can’t find the executable.
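To double-check the PATH change from a fresh terminal, a small Python sketch like the one below can help. It assumes Python 3.9+ is available (this guide does not otherwise require it) and that the executable is named claude, as above. The helper name missing_tools is made up for illustration.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tool names that cannot be resolved on the current PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in names if which(name) is None]


def report(names: Iterable[str]) -> str:
    """Build a one-line, human-readable summary for the given tool names."""
    missing = missing_tools(names)
    if missing:
        return "Not found on PATH: " + ", ".join(missing)
    return "All tools found on PATH"
```

Calling print(report(["claude"])) after Step 1 should report that everything was found. Once Ollama is installed in Step 2, you can pass ["claude", "ollama"] instead. Open a new terminal first, because an already-open terminal usually keeps the old PATH.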

Step 2: Install Ollama and Download Qwen 3.5

Next, you need Ollama to run local models.

Install Ollama using the official command from docs.ollama.com (run it in PowerShell).
Once installed, run ollama in the terminal to confirm it’s working.
List available models with ollama list. At this point, you may see some cloud-only Claude models if you’ve used Ollama Cloud before, but those require a paid plan.

Pull the Qwen 3.5 9B model locally:

ollama pull qwen-3.5:9b  (or the exact Qwen 3.5 9B tag from the Ollama model page)

This will download a ~6–7 GB model file.

Confirm it’s available with ollama list. You should see something like qwen-3.5:9b (exact name depends on the model page).
Test the model with:
```
ollama run qwen-3.5:9b
```
Ask a simple question (e.g., 1 + 1?) to make sure it responds.

If you have duplicate or older variants of the model, you can remove them with ollama rm <model-name> to save disk space.
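Because later commands need the exact tag, it can help to pull the names out of ollama list programmatically instead of retyping them. The sketch below assumes the usual table layout, with a header row and the model name in the first column. The function names and the sample rows in the usage note are illustrative, not taken from a real machine.

```python
import subprocess


def parse_model_names(list_output: str) -> list[str]:
    """Extract the first column (model name) from `ollama list` text output."""
    lines = [line for line in list_output.splitlines() if line.strip()]
    # The first non-empty line is assumed to be the header row, so skip it.
    return [line.split()[0] for line in lines[1:]]


def find_tags(list_output: str, needle: str) -> list[str]:
    """Return installed model names containing `needle`, ignoring case."""
    wanted = needle.lower()
    return [name for name in parse_model_names(list_output) if wanted in name.lower()]


def installed_tags(needle: str) -> list[str]:
    """Run `ollama list` and return the matching model names."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return find_tags(result.stdout, needle)
```

For example, installed_tags("qwen") returns every installed name containing "qwen". Copy one of those strings verbatim into the commands in the next steps to avoid "model not found" errors caused by a slightly different tag.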

Step 3: Launch Claude Code with Your Local Qwen Model

Now you can wire Claude Code to use Qwen 3.5 as its backend model via Ollama.

In your project folder terminal, run:
```
ollama launch claude --model qwen-3.5:9b
```
Replace qwen-3.5:9b with the exact name shown in ollama list.
Claude Code will start and ask for a quick safety check: confirm that you trust the current project folder.
Once running, Claude Code will show the available models. You’ll see entries like custom Sonnet, Opus, and Haiku models, all mapped to your Qwen 3.5 9B backend by default.
Send a simple message like “Hi, how are you?” and wait for the response. On a 16 GB machine, the first reply might take tens of seconds, especially if screen recording or other heavy apps are running.

At this point, you effectively have a Claude-style coding agent powered entirely by a local Qwen model through Ollama.
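If replies feel slow, you can time the model on its own, without Claude Code in the loop. The sketch below sends one prompt straight to Ollama's local REST API and measures how long the reply takes. It assumes Ollama is listening on its default address, http://localhost:11434, and that the model tag matches what ollama list shows. The function names are illustrative.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(
    model: str, prompt: str, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_reply(model: str, prompt: str, timeout: float = 300.0) -> tuple[str, float]:
    """Send one prompt and return (reply text, seconds elapsed)."""
    start = time.perf_counter()
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"], time.perf_counter() - start
```

Running timed_reply("qwen-3.5:9b", "Hi, how are you?") twice is a quick way to separate the one-time model load from steady-state speed: the second call is normally faster because the model is already in memory.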

Step 4: Check Resource Usage and Context Limits

Ollama lets you inspect what’s running under the hood.

Run ollama ps to see active models. You should see your Qwen 3.5 9B instance with details like:
- Model size (around 9–10 GB)
- CPU and GPU usage
- Context length (e.g., 16384 tokens)

Even though Qwen 3.5 supports very large context windows (up to 256K tokens), Ollama’s default for the model on your machine may be much lower (like 16K). For serious repo work, that can be limiting.
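The same information is available as JSON from Ollama's /api/ps endpoint, which is easier to script against than the table output. The sketch below condenses that response into the fields discussed above. It assumes the default local address; the context_length field is treated as optional because older Ollama versions may not report it. The function names are illustrative.

```python
import json
import urllib.request


def summarize_running(ps_payload: dict) -> list[dict]:
    """Condense an /api/ps response into name, size, GPU share, and context."""
    rows = []
    for model in ps_payload.get("models", []):
        size = model.get("size", 0)
        vram = model.get("size_vram", 0)
        rows.append({
            "name": model.get("name"),
            "size_gb": round(size / 1e9, 1),
            # Share of the loaded model held in GPU memory (0.0 means CPU only).
            "gpu_share": round(vram / size, 2) if size else 0.0,
            # Assumption: this key may be missing on older Ollama versions.
            "context_length": model.get("context_length"),
        })
    return rows


def fetch_running(base_url: str = "http://localhost:11434") -> list[dict]:
    """Query the local Ollama server and summarize the loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return summarize_running(json.load(resp))
```

A gpu_share well below 1.0 means part of the model runs on the CPU, which is one common reason for slow replies on a 16 GB machine.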

Step 5: Increase Qwen’s Context Window with a Custom Model File

To give Claude Code more room to think about large projects, you can create a custom variant of the Qwen model with a bigger context window.

Create a Model File

Create a new text file named something like ModelFile (no extension).
Inside it, define the base model and parameters, for example:
```
FROM qwen-3.5:9b

PARAMETER num_ctx 65536
```
This tells Ollama: “Create a new model based on Qwen 3.5 9B, but with a 65,536-token context window.”
Use the Ollama model file docs as a reference if you want to tweak more parameters.
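If you plan to try several context sizes, generating the file from a script avoids typos. The sketch below is one way to do it, with illustrative function names. The keywords are written in uppercase, which is the usual convention; Ollama treats them case-insensitively.

```python
from pathlib import Path


def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that sets a custom context window on a base model."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive number of tokens")
    return f"FROM {base_model}\n\nPARAMETER num_ctx {num_ctx}\n"


def write_modelfile(path: str, base_model: str, num_ctx: int) -> Path:
    """Write the rendered Modelfile to `path` and return the resulting Path."""
    target = Path(path)
    target.write_text(render_modelfile(base_model, num_ctx), encoding="utf-8")
    return target
```

For the 64K variant in this guide, write_modelfile("ModelFile", "qwen-3.5:9b", 64 * 1024) produces the same content as the hand-written file. Swap in the tag that ollama list reports on your machine.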

Create the New 64K Model

In the same folder as your ModelFile, run:
```
ollama create qwen-3.5-9b-64k -f ModelFile
```
This creates a new model named qwen-3.5-9b-64k that reuses the base model's weights but defaults to the larger context window.
Check it with ollama list. You should now see both the original Qwen 3.5 9B and the new 64K variant.
Because the variant shares the base model's weight files, keeping both costs little extra disk space. You can still remove the original after confirming the new one works, but keeping both gives you flexibility.
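To confirm the new model really carries the larger setting, you can ask Ollama for its stored parameters. The sketch below assumes the /api/show endpoint on the default local address and assumes its "parameters" field is plain text with one name and value per line. Treat that layout as an assumption to check against your Ollama version. The function names are illustrative.

```python
import json
import urllib.request
from typing import Optional


def parse_parameters(parameters_text: str) -> dict[str, str]:
    """Turn 'name   value' lines into a dict (a repeated name keeps its last value)."""
    parsed = {}
    for line in parameters_text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            parsed[parts[0]] = parts[1].strip()
    return parsed


def configured_num_ctx(
    model: str, base_url: str = "http://localhost:11434"
) -> Optional[int]:
    """Return the num_ctx stored for `model`, or None if it is not set."""
    request = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        info = json.load(resp)
    value = parse_parameters(info.get("parameters", "")).get("num_ctx")
    return int(value) if value is not None else None
```

With the model from this step, configured_num_ctx("qwen-3.5-9b-64k") should return 65536. A result of None suggests the parameter line did not make it into the Modelfile.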

Run Claude Code with the 64K Context Model

Now point Claude Code to the new model:

ollama launch claude --model qwen-3.5-9b-64k

Claude Code will now use the 64K context variant, giving it far more room to:

Read large repos
Track long conversations
Handle multi-file refactors and deep analysis

Using the Local Agent on Real Projects

With everything wired up, you can treat this setup like a full AI coding agent.

For example, you can ask it to:

Clone a GitHub repo into your project folder
Summarize the architecture and key components
List system requirements (RAM, VRAM, GPU type) to run the project
Suggest whether your local machine is sufficient or if you should use a GPU VPS

In one test, the agent cloned a video object removal model repo, analyzed it, and correctly concluded that the local 8 GB GPU wasn’t enough and that something like a 40 GB VRAM GPU (e.g., A100) would be required for inference. It also outlined installation steps (pip install requirements, download weights, run notebooks, etc.).

This is exactly the kind of deep, repo-level assistance that benefits from a larger context window and a persistent local model.

Other Models and Cloud Options

Qwen 3.5 9B is just one good choice. Ollama’s docs also recommend other models that work well as coding agents, such as:

Qwen 3.5 (Cloud) – a hosted version on Ollama Cloud (requires a paid subscription for sustained use)
GLM 4.7 Flash – another capable open-source model you can run locally or on a GPU VPS

You can also deploy these models on GPU VPS providers like RunPod, run Ollama there, and then connect Claude Code to that remote Ollama instance instead of your local machine.
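Before pointing anything at a remote instance, it is worth checking that the server is reachable and has the model you expect. The sketch below lists the models on a remote Ollama server through its /api/tags endpoint. The host name in the usage note is hypothetical, and the function names are illustrative. The sketch only verifies reachability; it does not configure Claude Code.

```python
import json
import urllib.request


def tags_url(host: str) -> str:
    """Build the /api/tags URL, adding a scheme and trimming trailing slashes."""
    base = host if "://" in host else f"http://{host}"
    return base.rstrip("/") + "/api/tags"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response."""
    return [model["name"] for model in tags_payload.get("models", [])]


def remote_models(host: str, timeout: float = 10.0) -> list[str]:
    """Fetch the list of models available on a remote Ollama server."""
    with urllib.request.urlopen(tags_url(host), timeout=timeout) as resp:
        return model_names(json.load(resp))
```

For example, remote_models("gpu-box.example:11434") returns the tags installed on that machine, or raises an error if the port is closed or the server is not running.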

If you’re interested in more local coding setups, take a look at this guide to a local AI coding setup with VS Code and Gemma as another option.

Wrapping Up

By combining Claude Code with Ollama and Qwen 3.5, you can turn your machine into a powerful, mostly free AI coding environment:

Claude Code provides the agent-like workflow (repo cloning, file edits, explanations)
Ollama hosts the local LLM backend
Qwen 3.5 (with a boosted context window) gives you enough capacity to handle real-world projects

You’ll still need at least a Claude Pro plan to use Claude Code itself, but once that’s in place, most of the heavy lifting can be done by your local model instead of expensive cloud tokens.

Comments

Gregory Morris 12h ago

I'm concerned about the legality of using Claude Code with a local model. The Claude Code license might prohibit using it with non-Anthropic models. Anyone looked into this?

Nicole Adams 2d ago

Does this work with other Claude Code features like the diff view and interactive edit? I find those really useful but I'm not sure if they rely on cloud.

Thomas Brown Jul 11, 2026

Actually, tool use is handled locally too. Claude Code sends the request to the local model and the model decides which tool to call. The actual execution of tool commands happens on your machine. I confirmed by checking the Ollama logs.

Lori Hill Jul 10, 2026

I tested 128K on Qwen 3.5 and it works surprisingly well. The model maintains coherence for most of the window, though responses near the end of the context can be slightly less accurate. 64K is safer for high quality.

Scott Reed Jun 30, 2026

I wish the article included troubleshooting for the 'model not found' error. I had to pull the model with the exact tag from the Ollama library, not just the generic name.