How to run Claude Code locally with Qwen 3.5 and Ollama

30 May 2026
Learn how to connect Claude Code to a fully local AI model using Qwen 3.5 and Ollama. This guide walks through installation, configuration, and boosting context length so you can use Claude-style coding agents without burning through cloud credits.

Want Claude-style coding assistance without paying for every token? You can run Claude Code on top of a fully local open-source model using Ollama and Qwen 3.5. That means you get an AI coding agent that can clone repos, read your project, and help you build features, all powered by a model running on your own machine.

Why Run Claude Code on a Local Model?

Claude Code is powerful, but its usage can get expensive fast, especially if you’re on a limited plan and working with large projects. Every long session, repo analysis, or refactor eats into your monthly quota.

By pairing Claude Code with a local model like Qwen 3.5 via Ollama, you can:

  • Offload many coding tasks to a free, local LLM
  • Keep your cloud credits for the hardest problems
  • Work offline or on machines where you don’t want to send code to the cloud
  • Experiment with different open-source models as your AI agent’s “brain”

If you like this kind of setup, you may also want to check out how to turn local LLMs into powerful AI agents with Ollama and MCP.

What You Need Before You Start

Here’s the basic setup used in this walkthrough:

  • Windows machine with around 16 GB RAM
  • VS Code installed
  • Claude Code CLI installed
  • Ollama installed
  • Qwen 3.5 model downloaded via Ollama

You can adapt the same approach to other platforms and models, but we’ll focus on Qwen 3.5 9B as a good balance between performance and resource usage for a 16 GB RAM system.

Step 1: Install and Configure Claude Code

Start by opening the folder of the project you want to work on in VS Code, then open a terminal (PowerShell works well on Windows).

From there:

  1. Install Claude Code using the command provided in the docs.ollama.com Claude integration page.
  2. After installation, note the path where claude.exe is placed, typically something like C:\Users\<username>\.local\bin.
  3. Add this path to your system’s Environment Variables (User PATH):
    • Open Environment Variables
    • Edit the PATH for your user
    • Add the ...\.local\bin path
    • Move it near the top so it’s picked up reliably

If this step is skipped, Claude Code may fail to start because the shell can’t find the executable. Open a new terminal after editing PATH so the change takes effect.
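
To confirm the PATH change worked, you can look the executable up the same way the shell would. This is a minimal sketch using only Python’s standard library; find_cli and report are hypothetical helper names, and it assumes the executable is called claude as described above.

```python
import shutil
from typing import Optional


def find_cli(name: str) -> Optional[str]:
    """Return the full path the OS resolves for `name`, or None if it is not on PATH."""
    return shutil.which(name)


def report(name: str) -> str:
    """Build a one-line status message for a command-line tool."""
    path = find_cli(name)
    if path is None:
        return f"{name}: not found on PATH (open a new terminal after editing PATH)"
    return f"{name}: {path}"
```

Calling print(report("claude")) should print the .local\bin location once the PATH entry is in place; the same check works for ollama after Step 2.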

Step 2: Install Ollama and Download Qwen 3.5

Next, you need Ollama to run local models.

  1. Install Ollama using the official command from docs.ollama.com (run it in PowerShell).
  2. Once installed, run ollama in the terminal to confirm it’s working.
  3. List available models with ollama list. At this point, you may see some cloud-only models if you’ve used Ollama Cloud before; those run on Ollama’s servers rather than your machine, and sustained use requires a paid plan.
  4. Pull the Qwen 3.5 9B model locally:
    ollama pull qwen-3.5:9b  (or the exact Qwen 3.5 9B name from the model page)
    
    This will download a ~6–7 GB model file.
  5. Confirm it’s available with ollama list. You should see something like qwen-3.5:9b (exact name depends on the model page).
  6. Test the model with:
    ollama run qwen-3.5:9b
    
    Ask a simple question (e.g., 1 + 1?) to make sure it responds.

If you have duplicate or older variants of the model, you can remove them with ollama rm <model-name> to save disk space.
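
If you’d rather check the installed models from a script than read the ollama list output by eye, the local Ollama server exposes the same information over HTTP. The sketch below assumes the server is running on its default address (http://localhost:11434) and that GET /api/tags returns a JSON object with a "models" list; model_names, has_model, and fetch_tags are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def model_names(tags_response: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def has_model(tags_response: dict, wanted: str) -> bool:
    """True if `wanted` matches an installed model name exactly."""
    return wanted in model_names(tags_response)


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

With the server running, has_model(fetch_tags(), "qwen-3.5:9b") tells you whether the pull succeeded; swap in the exact name your own ollama list shows.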

Step 3: Launch Claude Code with Your Local Qwen Model

Now you can wire Claude Code to use Qwen 3.5 as its backend model via Ollama.

  1. In your project folder terminal, run:
    ollama launch claude --model qwen-3.5:9b
    
    Replace qwen-3.5:9b with the exact name shown in ollama list.
  2. Claude Code will start and ask for a quick safety check: confirm that you trust the current project folder.
  3. Once running, Claude Code will show the available models. You’ll see entries like custom Sonnet, Opus, and Haiku models, all mapped to your Qwen 3.5 9B backend by default.
  4. Send a simple message like “Hi, how are you?” and wait for the response. On a 16 GB machine, the first reply might take tens of seconds, especially if screen recording or other heavy apps are running.

At this point, you effectively have a Claude-style coding agent powered entirely by a local Qwen model through Ollama.
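
To put a number on how slow that first reply is, you can send one prompt straight to Ollama and read the timing fields in the reply. This is a rough sketch, assuming the default local address and the eval_count and eval_duration fields of the /api/generate response (durations are in nanoseconds); ask and tokens_per_second are hypothetical helper names.

```python
import json
import urllib.request


def tokens_per_second(generate_response: dict) -> float:
    """Compute generation speed from the timing fields of an /api/generate reply."""
    count = generate_response.get("eval_count", 0)
    nanos = generate_response.get("eval_duration", 0)  # nanoseconds
    return count / (nanos / 1e9) if nanos else 0.0


def ask(model: str, prompt: str, base_url: str = "http://localhost:11434") -> dict:
    """Send one non-streaming prompt to a running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)
```

For example, tokens_per_second(ask("qwen-3.5:9b", "1 + 1?")) gives a quick speed figure. The same reply also carries load_duration, which shows how much of the initial delay was spent loading the model rather than generating text.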

Step 4: Check Resource Usage and Context Limits

Ollama lets you inspect what’s running under the hood.

  • Run ollama ps to see active models. You should see your Qwen 3.5 9B instance with details like:
    • Model size in memory (around 9–10 GB, more than the download because it includes the context cache)
    • CPU and GPU usage
    • Context length (e.g., 16384 tokens)
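
The same details are available as JSON from the local server, which is handy if you want to log them. The sketch below assumes you have already fetched the reply of GET /api/ps as a dict and that each entry has name, size, and size_vram fields (sizes in bytes); gpu_share and summarize are hypothetical helper names.

```python
def gpu_share(model_entry: dict) -> float:
    """Fraction of a loaded model's memory that sits in GPU memory (0.0 to 1.0)."""
    size = model_entry.get("size", 0)
    return model_entry.get("size_vram", 0) / size if size else 0.0


def summarize(ps_response: dict) -> list:
    """Turn an /api/ps reply into one readable line per loaded model."""
    lines = []
    for m in ps_response.get("models", []):
        gib = m.get("size", 0) / 2**30
        lines.append(f"{m['name']}: {gib:.1f} GiB, {gpu_share(m):.0%} on GPU")
    return lines
```

A low GPU share means most of the model is running on the CPU, which is one common reason for slow replies on a 16 GB machine.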

Even though Qwen 3.5 supports very large context windows (up to 256K tokens), Ollama’s default for the model on your machine may be much lower (like 16K). For serious repo work, that can be limiting.
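
One reason for a conservative default is memory: the cache that holds the conversation grows with the context length. The sketch below uses the generic estimate for a standard attention model (keys plus values, per layer, per KV head, per token). The architecture numbers are placeholders chosen for illustration, not Qwen 3.5’s real configuration, so treat the result as an order-of-magnitude guide only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Generic KV-cache estimate: keys and values (the factor 2) for every
    layer, KV head, and token, at `bytes_per_value` bytes each."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Placeholder architecture values, NOT taken from Qwen 3.5's real config.
EXAMPLE = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for ctx in (16_384, 65_536):
    gib = kv_cache_bytes(context_tokens=ctx, **EXAMPLE) / 2**30
    print(f"{ctx:>6} tokens -> {gib:.1f} GiB of KV cache")
```

With these placeholder values the cache comes to 2.0 GiB at 16K tokens and 8.0 GiB at 64K, on top of the model weights. That fourfold jump is worth keeping in mind before raising the context on a 16 GB machine.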

Step 5: Increase Qwen’s Context Window with a Custom Model File

To give Claude Code more room to think about large projects, you can create a custom variant of the Qwen model with a bigger context window.

Create a Model File

  1. Create a new text file named something like ModelFile (no extension).
  2. Inside it, define the base model and parameters, for example:
    FROM qwen-3.5:9b
    
    PARAMETER num_ctx 65536
    
    This tells Ollama: “Create a new model based on Qwen 3.5 9B, but with a 65,536-token context window.”
  3. Use the Ollama model file docs as a reference if you want to tweak more parameters.
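
If you plan to try several context sizes, generating the file from a script saves retyping. This is a small sketch; render_modelfile and write_modelfile are hypothetical helper names. It writes the instruction names in uppercase, as Ollama’s own docs do; they are not case-sensitive, so the result is equivalent to the file described above.

```python
from pathlib import Path


def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return model file text that derives from `base_model` with a custom context length."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive token count")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def write_modelfile(path, base_model: str, num_ctx: int) -> Path:
    """Write the rendered text to `path` and return the path."""
    target = Path(path)
    target.write_text(render_modelfile(base_model, num_ctx), encoding="utf-8")
    return target
```

For the setup in this guide you would call write_modelfile("ModelFile", "qwen-3.5:9b", 65536), using the exact base name from your ollama list.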

Create the New 64K Model

  1. In the same folder as your ModelFile, run:
    ollama create qwen-3.5-9b-64k -f ModelFile
    
    This creates a new model named qwen-3.5-9b-64k that reuses the base model’s weights and changes only the context length.
  2. Check it with ollama list. You should now see both the original Qwen 3.5 9B and the new 64K variant.
  3. The new variant shares its weight files with the original, so removing the original frees very little disk space. Keeping both gives you flexibility.
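
To double-check that the larger context was actually stored in the new model, you can inspect its parameters with ollama show qwen-3.5-9b-64k, or read them from the server. The sketch below assumes you have the reply of POST /api/show as a dict and that its "parameters" field is plain text with one "name value" pair per line; parse_parameters and context_length are hypothetical helper names.

```python
def parse_parameters(parameters_text: str) -> dict:
    """Parse 'name value' lines into a dict mapping each name to a list of values."""
    parsed = {}
    for line in parameters_text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            parsed.setdefault(parts[0], []).append(parts[1].strip())
    return parsed


def context_length(show_response: dict):
    """Return the configured num_ctx as an int, or None if the model does not set one."""
    values = parse_parameters(show_response.get("parameters", "")).get("num_ctx")
    return int(values[-1]) if values else None
```

A result of 65536 confirms the model file was picked up; None means the model is still using Ollama’s default context.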

Run Claude Code with the 64K Context Model

Now point Claude Code to the new model:

ollama launch claude --model qwen-3.5-9b-64k

Claude Code will now use the 64K context variant, giving it far more room to:

  • Read large repos
  • Track long conversations
  • Handle multi-file refactors and deep analysis

Using the Local Agent on Real Projects

With everything wired up, you can treat this setup like a full AI coding agent.

For example, you can ask it to:

  • Clone a GitHub repo into your project folder
  • Summarize the architecture and key components
  • List system requirements (RAM, VRAM, GPU type) to run the project
  • Suggest whether your local machine is sufficient or if you should use a GPU VPS

In one test, the agent cloned a video object removal model repo, analyzed it, and correctly concluded that the local 8 GB GPU wasn’t enough and that something like a 40 GB VRAM GPU (e.g., A100) would be required for inference. It also outlined installation steps (pip install requirements, download weights, run notebooks, etc.).

This is exactly the kind of deep, repo-level assistance that benefits from a larger context window and a persistent local model.
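
To get a feel for whether a repo is anywhere near your context limit, a rough character count is enough. The sketch below assumes about 4 characters per token, which is a common rule of thumb and not Qwen’s real tokenizer, and the file suffixes are arbitrary examples. All three function names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not Qwen's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character count."""
    return len(text) // CHARS_PER_TOKEN


def repo_token_estimate(root, suffixes=(".py", ".md", ".txt")) -> int:
    """Sum the estimated tokens of all matching text files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(token_count: int, num_ctx: int, reserve: float = 0.25) -> bool:
    """Leave `reserve` of the window free for instructions and the reply."""
    return token_count <= num_ctx * (1 - reserve)
```

This is a worst-case figure for feeding every file in at once. Under these assumptions, a repo estimated at 40,000 tokens fits a 65,536-token window but not a 16,384-token one.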

Other Models and Cloud Options

Qwen 3.5 9B is just one good choice. Ollama’s docs also recommend other models that work well as coding agents, such as:

  • Qwen 3.5 (Cloud) – a hosted version on Ollama Cloud (requires a paid subscription for sustained use)
  • GLM 4.7 Flash – another capable open-source model you can run locally or on a GPU VPS

You can also deploy these models on GPU VPS providers like RunPod, run Ollama there, and then connect Claude Code to that remote Ollama instance instead of your local machine.
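
For a remote setup, the Ollama client reads the OLLAMA_HOST environment variable to decide which server to talk to. The sketch below only shows how such a value can be normalized into a base URL for your own scripts; ollama_base_url is a hypothetical helper, the host name in the usage note is made up, and whether ollama launch honours the variable in your version is an assumption to verify.

```python
def ollama_base_url(env: dict) -> str:
    """Turn an OLLAMA_HOST-style value into a base URL, defaulting to the local server."""
    host = env.get("OLLAMA_HOST", "").strip() or "127.0.0.1:11434"
    if "://" not in host:
        host = "http://" + host  # bare host:port values get a scheme
    return host.rstrip("/")
```

You would typically call ollama_base_url(dict(os.environ)) after importing os; a value such as gpu-box.example:11434 becomes http://gpu-box.example:11434. If the server is exposed over the internet, put it behind authentication or a private network.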

If you’re interested in more local coding setups, take a look at this guide to a local AI coding setup with VS Code and Gemma as another option.

Wrapping Up

By combining Claude Code with Ollama and Qwen 3.5, you can turn your machine into a powerful, mostly free AI coding environment:

  • Claude Code provides the agent-like workflow (repo cloning, file edits, explanations)
  • Ollama hosts the local LLM backend
  • Qwen 3.5 (with a boosted context window) gives you enough capacity to handle real-world projects

You’ll still need at least a Claude Pro plan to use Claude Code itself, but once that’s in place, most of the heavy lifting can be done by your local model instead of expensive cloud tokens.


Comments

Jennifer Lee 16h ago
Same here. The agent's ability to use git commands is hit or miss. I think it's because the local model doesn't have the same tool-use training as Claude's cloud models. But for code analysis, it works great.
George Harris 2d ago
The article says you need a Claude Pro plan. Is that still true? I thought the CLI was free to use with any model. I'd rather not pay Anthropic just to use their CLI.
