How to run Claude Code locally with Qwen 3.5 and Ollama
Want Claude-style coding assistance without burning through expensive API credits? You can actually run Claude Code on top of a fully local open-source model using Ollama and Qwen 3.5. That means you get an AI coding agent that can clone repos, read your project, and help you build features, all powered by a model running on your own machine.
Why Run Claude Code on a Local Model?
Claude Code is powerful, but its usage can get expensive fast, especially if you’re on a limited plan and working with large projects. Every long session, repo analysis, or refactor eats into your monthly quota.
By pairing Claude Code with a local model like Qwen 3.5 via Ollama, you can:
- Offload many coding tasks to a free, local LLM
- Keep your cloud credits for the hardest problems
- Work offline or on machines where you don’t want to send code to the cloud
- Experiment with different open-source models as your AI agent’s “brain”
If you like this kind of setup, you may also want to check out how to turn local LLMs into powerful AI agents with Ollama and MCP.
What You Need Before You Start
Here’s the basic setup used in this walkthrough:
- Windows machine with around 16 GB RAM
- VS Code installed
- Claude Code CLI installed
- Ollama installed
- Qwen 3.5 model downloaded via Ollama
You can adapt the same approach to other platforms and models, but we’ll focus on Qwen 3.5 9B as a good balance between performance and resource usage for a 16 GB RAM system.
Step 1: Install and Configure Claude Code
Start by opening the folder of the project you want to work on in VS Code, then open a terminal (PowerShell works well on Windows).
From there:
- Install Claude Code using the command provided in the docs.ollama.com Claude integration page.
- After installation, note the path where claude.exe is placed, typically something like C:\Users\<username>\.local\bin.
- Add this path to your system’s Environment Variables (User PATH):
  - Open Environment Variables
  - Edit the PATH for your user
  - Add the ...\.local\bin path
  - Move it near the top so it’s picked up reliably
If this step is skipped, Claude Code may fail to start because it can’t find the executable.
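As a quick sanity check for the PATH step, a minimal Python sketch along these lines can report whether a directory appears in a PATH string and whether the claude executable resolves. The helper name dir_on_path is hypothetical, and the directory used in the demo is only the typical location mentioned above, not a guaranteed one.

```python
import ntpath
import os
import shutil


def dir_on_path(directory: str, path_value: str, sep: str = ";") -> bool:
    """Return True if `directory` is one of the entries in a Windows-style PATH string."""
    def norm(p: str) -> str:
        # Compare case-insensitively and ignore trailing slashes and quotes.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    return any(norm(entry) == target for entry in path_value.split(sep) if entry.strip())


if __name__ == "__main__":
    # Assumed install location; adjust to the path the installer actually reported.
    candidate = os.path.join(os.path.expanduser("~"), ".local", "bin")
    print("on PATH:", dir_on_path(candidate, os.environ.get("PATH", ""), os.pathsep))
    print("claude resolves to:", shutil.which("claude"))
```

If the second line prints None, the terminal was probably opened before PATH was edited; opening a new terminal usually picks up the change.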
Step 2: Install Ollama and Download Qwen 3.5
Next, you need Ollama to run local models.
- Install Ollama using the official command from docs.ollama.com (run it in PowerShell).
- Once installed, run ollama in the terminal to confirm it’s working.
- List available models with ollama list. At this point, you may see some cloud-only models if you’ve used Ollama Cloud before, but those require a paid plan.
- Pull the Qwen 3.5 9B model locally:
    ollama pull qwen-3.5:9b
  Use the exact Qwen 3.5 9B name from the docs if it differs. This will download a ~6–7 GB model file.
- Confirm it’s available with ollama list. You should see something like qwen-3.5:9b (exact name depends on the model page).
- Test the model with:
    ollama run qwen-3.5:9b
  Ask a simple question (e.g., 1 + 1?) to make sure it responds.
If you have duplicate or older variants of the model, you can remove them with ollama rm <model-name> to save disk space.
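The same check can be done programmatically. The sketch below assumes the Ollama server is running on its default local address and uses its GET /api/tags endpoint, which lists installed models. The model name qwen-3.5:9b is the placeholder used in this article, and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    wanted = "qwen-3.5:9b"  # placeholder; use the name that `ollama list` shows
    try:
        names = installed_models(fetch_tags())
    except OSError as exc:
        print("Could not reach Ollama:", exc)
    else:
        print(wanted, "installed:", wanted in names)
```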
Step 3: Launch Claude Code with Your Local Qwen Model
Now you can wire Claude Code to use Qwen 3.5 as its backend model via Ollama.
- In your project folder terminal, run:
    ollama launch claude --model qwen-3.5:9b
  Replace qwen-3.5:9b with the exact name shown in ollama list.
- Claude Code will start and ask for a quick safety check: confirm that you trust the current project folder.
- Once running, Claude Code will show the available models. You’ll see entries like custom Sonnet, Opus, and Haiku models, all mapped to your Qwen 3.5 9B backend by default.
- Send a simple message like “Hi, how are you?” and wait for the response. On a 16 GB machine, the first reply might take tens of seconds, especially if screen recording or other heavy apps are running.
At this point, you effectively have a Claude-style coding agent powered entirely by a local Qwen model through Ollama.
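To see how much of a slow first reply is model loading versus generation, one option is to send the same greeting straight to Ollama's /api/generate endpoint and read the timing fields it returns (reported in nanoseconds). This is only a sketch: it bypasses Claude Code entirely, the model name is the article's placeholder, and the helper names are hypothetical.

```python
import json
import urllib.request


def tokens_per_second(generate_response: dict) -> float:
    """Compute generation speed from the timing fields of a non-streamed reply."""
    eval_ns = generate_response.get("eval_duration", 0)
    if eval_ns <= 0:
        return 0.0
    return generate_response.get("eval_count", 0) / (eval_ns / 1e9)


def ask(model: str, prompt: str, base_url: str = "http://localhost:11434") -> dict:
    """Send one prompt to the local Ollama server and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        reply = ask("qwen-3.5:9b", "Hi, how are you?")  # placeholder model name
    except OSError as exc:
        print("Could not reach Ollama:", exc)
    else:
        print(reply.get("response", ""))
        print(f"load: {reply.get('load_duration', 0) / 1e9:.1f}s, "
              f"speed: {tokens_per_second(reply):.1f} tokens/s")
```

A large load time with a reasonable tokens-per-second figure suggests the delay is mostly the model being read into memory, which should only affect the first message.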
Step 4: Check Resource Usage and Context Limits
Ollama lets you inspect what’s running under the hood.
- Run ollama ps to see active models. You should see your Qwen 3.5 9B instance with details like:
  - Model size (around 9–10 GB)
  - CPU and GPU usage
  - Context length (e.g., 16384 tokens)
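The information behind ollama ps is also exposed by the server's GET /api/ps endpoint, whose entries include size and size_vram in bytes. A rough sketch of deriving the CPU/GPU split from those two fields follows; the helper names are hypothetical, and other fields in the response may vary by Ollama version.

```python
import json
import urllib.request


def gpu_share(model_entry: dict) -> float:
    """Fraction (0.0 to 1.0) of a loaded model's memory that sits in GPU memory."""
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return model_entry.get("size_vram", 0) / size


def running_models(base_url: str = "http://localhost:11434") -> list[dict]:
    """Return the entries reported by GET /api/ps (models currently loaded)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])


if __name__ == "__main__":
    try:
        for entry in running_models():
            gib = entry.get("size", 0) / 1024**3
            print(f"{entry.get('name')}: {gib:.1f} GiB, {gpu_share(entry):.0%} on GPU")
    except OSError as exc:
        print("Could not reach Ollama:", exc)
```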
Even though Qwen 3.5 supports very large context windows (up to 256K tokens), Ollama’s default for the model on your machine may be much lower (like 16K). For serious repo work, that can be limiting.
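To get a feel for whether a repo fits in a given window, a back-of-the-envelope estimate can help. The sketch below assumes roughly 4 characters per token, which is a common rule of thumb and not Qwen's actual tokenizer, and reserves some room for the prompt and the reply. The file suffixes, the reserve size, and the function names are all illustrative assumptions.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, NOT the model's real tokenizer


def estimate_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".md", ".js", ".ts")) -> int:
    """Roughly estimate how many tokens the matching files under `root` would occupy."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            try:
                total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
            except OSError:
                continue  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN


def fits(tokens: int, context_window: int, reserve: int = 4096) -> bool:
    """True if `tokens` fit while leaving `reserve` tokens for the prompt and the reply."""
    return tokens + reserve <= context_window


if __name__ == "__main__":
    tokens = estimate_tokens(".")
    for window in (16384, 65536):
        print(f"~{tokens} tokens; fits in {window}: {fits(tokens, window)}")
```

In practice the agent reads files selectively rather than loading a whole repo at once, so treat the result as an upper bound on pressure, not a hard limit.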
Step 5: Increase Qwen’s Context Window with a Custom Model File
To give Claude Code more room to think about large projects, you can create a custom variant of the Qwen model with a bigger context window.
Create a Model File
- Create a new text file named something like ModelFile (no extension).
- Inside it, define the base model and parameters, for example:
    FROM qwen-3.5:9b
    PARAMETER num_ctx 65536
  This tells Ollama: “Create a new model based on Qwen 3.5 9B, but with a 65,536-token context window.”
- Use the Ollama model file docs as a reference if you want to tweak more parameters.
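If you expect to try several context sizes, the two-line model file can be generated rather than typed by hand. This is a small sketch: render_modelfile is a hypothetical helper, and the base name is the article's placeholder.

```python
def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Build model file text that derives a model from `base_model` with a custom context size."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive number of tokens")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    # Placeholder base name; use the name that `ollama list` shows.
    # 64 * 1024 = 65536 tokens, the "64K" window used in this walkthrough.
    print(render_modelfile("qwen-3.5:9b", 64 * 1024), end="")
```

Redirect the output into a file named ModelFile, or write it with pathlib, before running the create step.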
Create the New 64K Model
- In the same folder as your ModelFile, run:
    ollama create qwen-3.5-9b-64k -f ModelFile
  This creates a new model named qwen-3.5-9b-64k that builds on the base model but uses the larger context.
- Check it with ollama list. You should now see both the original Qwen 3.5 9B and the new 64K variant.
- If disk space is tight, you can remove the original model after confirming the new one works, but keeping both gives you flexibility.
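To confirm that the new variant really carries the larger context setting, one approach is to query Ollama's POST /api/show endpoint, which returns a parameters field as whitespace-separated text. The sketch below parses that text; the helper names are hypothetical, and parameters that can repeat (such as stop) keep only their last value in this simplified parser.

```python
import json
import urllib.request


def parse_parameters(parameters_text: str) -> dict[str, str]:
    """Turn the `parameters` text from /api/show into a name -> value dict."""
    parsed = {}
    for line in parameters_text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            parsed[parts[0]] = parts[1].strip()
    return parsed


def show_model(model: str, base_url: str = "http://localhost:11434") -> dict:
    """Fetch model details from the local Ollama server."""
    request = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        info = show_model("qwen-3.5-9b-64k")  # the variant name created above
    except OSError as exc:
        print("Could not query Ollama:", exc)
    else:
        print("num_ctx:", parse_parameters(info.get("parameters", "")).get("num_ctx"))
```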
Run Claude Code with the 64K Context Model
Now point Claude Code to the new model:
ollama launch claude --model qwen-3.5-9b-64k
Claude Code will now use the 64K context variant, giving it far more room to:
- Read large repos
- Track long conversations
- Handle multi-file refactors and deep analysis
Using the Local Agent on Real Projects
With everything wired up, you can treat this setup like a full AI coding agent.
For example, you can ask it to:
- Clone a GitHub repo into your project folder
- Summarize the architecture and key components
- List system requirements (RAM, VRAM, GPU type) to run the project
- Suggest whether your local machine is sufficient or if you should use a GPU VPS
In one test, the agent cloned a video object removal model repo, analyzed it, and correctly concluded that the local 8 GB GPU wasn’t enough and that something like a 40 GB VRAM GPU (e.g., A100) would be required for inference. It also outlined installation steps (pip install requirements, download weights, run notebooks, etc.).
This is exactly the kind of deep, repo-level assistance that benefits from a larger context window and a persistent local model.
Other Models and Cloud Options
Qwen 3.5 9B is just one good choice. Ollama’s docs also recommend other models that work well as coding agents, such as:
- Qwen 3.5 (Cloud) – a hosted version on Ollama Cloud (requires a paid subscription for sustained use)
- GLM 4.7 Flash – another capable open-source model you can run locally or on a GPU VPS
You can also deploy these models on GPU VPS providers like RunPod, run Ollama there, and then connect Claude Code to that remote Ollama instance instead of your local machine.
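For scripts that talk to a remote instance, the address usually comes from the OLLAMA_HOST environment variable, which the Ollama CLI reads. The sketch below turns such a value into a base URL for the HTTP calls; ollama_base_url is a hypothetical helper, it does not handle IPv6 addresses, and how ollama launch claude itself selects a remote server is not covered here.

```python
import os


def ollama_base_url(host_value: str = "") -> str:
    """Turn an OLLAMA_HOST-style value into a base URL, defaulting to the local server."""
    value = host_value.strip().rstrip("/")
    if not value:
        return "http://localhost:11434"
    if "://" in value:
        return value  # already a full URL; leave scheme and port untouched
    if ":" not in value:
        value += ":11434"  # bare hostname: assume Ollama's default port
    return "http://" + value


if __name__ == "__main__":
    print(ollama_base_url(os.environ.get("OLLAMA_HOST", "")))
```

When exposing Ollama on a VPS, it is worth putting it behind a tunnel or firewall rule, since the server itself does not ask for credentials by default.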
If you’re interested in more local coding setups, take a look at this guide to a local AI coding setup with VS Code and Gemma as another option.
Wrapping Up
By combining Claude Code with Ollama and Qwen 3.5, you can turn your machine into a powerful, mostly free AI coding environment:
- Claude Code provides the agent-like workflow (repo cloning, file edits, explanations)
- Ollama hosts the local LLM backend
- Qwen 3.5 (with a boosted context window) gives you enough capacity to handle real-world projects
You’ll still need at least a Claude Pro plan to use Claude Code itself, but once that’s in place, most of the heavy lifting can be done by your local model instead of expensive cloud tokens.