How to Turn Local LLMs into Powerful AI Agents with Ollama and MCP
Running AI models locally is great for privacy and cost—but on their own, they’re just smart chatbots. The real magic happens when your local model can actually take action: read your Notion docs, update your calendar, send data to other apps, and more.
In this guide, you’ll learn how to run a capable local LLM with Ollama and then turn it into a fully featured AI agent using the Zapier MCP server. That gives your local model access to thousands of tools—while keeping the core intelligence running privately on your own machine.
LLM vs AI Agent: What’s the Difference?
Large language models (LLMs) are the “brains” that generate text, code, or other outputs. On their own, they mostly just take in text and return text.
An AI agent is an LLM plus tools. It can:
• Call external APIs
• Read and write data in apps like Notion or Google Calendar
• Trigger automations (for example, via Zapier)
• Take real actions based on your instructions
When you run a model locally with Ollama, you start with just the LLM. By connecting it to tools through MCP (Model Context Protocol), you turn that local model into an agent that can actually do things with your data.
Step 1: Install and Run Ollama
Ollama is a popular way to run LLMs locally on macOS, Windows, and Linux.
To get started:
1. Go to ollama.com and download the app for your operating system.
2. Install it, then open a terminal (Terminal on macOS, PowerShell/CMD/Terminal on Windows).
3. Type ollama to confirm it’s installed and running. If it doesn’t start, open the Ollama app manually so it runs in the background.
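For a quick sanity check, these standard Ollama commands should also work from your terminal:

ollama --version   # prints the installed Ollama version
ollama list        # lists models you've already downloaded (empty on a fresh install)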
Ollama also has a Models tab where you can browse and download different models directly from the UI.
Step 2: Choose the Right Local Model
Not every model is a good fit for every machine, and not every model can call tools. You need to balance performance, hardware limits, and tool-calling support.
Understand Your Hardware Limits
Two things matter most:
• RAM / Unified Memory (Mac): On Apple Silicon (M1/M2/M3), RAM is shared between CPU and GPU. If you have 32 GB unified memory, a model that needs ~24 GB can be workable, but huge models (80+ GB) will be painfully slow.
• VRAM (Windows/Linux with GPU): On systems with a dedicated GPU (e.g., RTX 4090 with 24 GB VRAM), the model usually runs in GPU memory. Larger models need more VRAM.
You can technically run models on CPU without a GPU, but they’ll be very slow for anything serious.
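As a very rough rule of thumb, a 4-bit quantized model needs a bit over half a gigabyte of memory per billion parameters, plus a few gigabytes of overhead for the runtime and context window. Here's a back-of-the-envelope estimate in Python (the constants are rough assumptions; actual usage depends on the quantization and context length you choose):

def rough_memory_gb(params_billions, bytes_per_param=0.6, overhead_gb=3):
    """Very rough memory estimate for a ~4-bit quantized model."""
    return params_billions * bytes_per_param + overhead_gb

print(round(rough_memory_gb(27)))   # ~19 GB: workable with 32 GB of unified memory
print(round(rough_memory_gb(122)))  # ~76 GB: out of reach for most consumer machines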
Pick a Model That Supports Tool Calling
For tool integrations, your model must support tool calling (sometimes called function calling). Many older or smaller models don’t.
In the Ollama model list, look for models that are explicitly tagged with tools. One example used here is Qwen 3 (qwen3 in the Ollama library), which comes in multiple parameter sizes and supports tool calling.
Parameter size trade-offs:
• Smaller models (e.g., 8–9B parameters): Faster, lighter, easier to run on modest hardware, but less capable.
• Medium models (e.g., 27B): Good balance of quality and speed on a strong laptop or desktop.
• Huge models (e.g., 122B): Better quality but often impractical on consumer hardware due to RAM/VRAM limits.
For most people, a model in roughly the 9B–30B range with tool support is a good starting point.
Download and Test the Model
Once you’ve picked a model in Ollama (for example, qwen3:30b):
1. Open your terminal.
2. Run ollama pull qwen3:30b (replace with your chosen model name). This downloads the model and its weights.
3. After it finishes, test it with: ollama run qwen3:30b
Ask a simple question like “What are you good at?” and see how fast it responds. If it’s extremely slow or never responds, try a smaller parameter version of the same model.
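Put together, a typical first session looks something like this (using qwen3:30b as a stand-in for whichever model you chose; the ollama show output varies between Ollama versions):

ollama pull qwen3:30b   # download the model and its weights
ollama show qwen3:30b   # inspect the model; recent versions list "tools" under Capabilities
ollama run qwen3:30b    # start an interactive chat; type /bye to exit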
Step 3: Connect Ollama to Tools with Zapier MCP
Now you have a local model running. Next, you’ll give it access to real-world tools using MCP and the Zapier MCP server.
What Is the Zapier MCP Server?
The Zapier MCP server exposes more than 8,000 app integrations (Notion, Google Calendar, Meta Ads, CRMs, and more) as tools that an AI agent can call.
Key points:
• It’s free to start; usage counts against your normal Zapier task limits.
• Each tool call is like running a Zapier task.
• You configure which apps and actions are available, so you stay in control.
Set Up Your Zapier MCP Server
1. Go to the Zapier MCP page and sign in or create an account.
2. Create a new MCP server and choose Other as the client type (since you’re connecting from Ollama).
3. Add tools by connecting apps. For example:
• Notion: Connect your Notion account via OAuth, then choose which pages or databases to expose (e.g., travel notes, project docs).
• Google Calendar: Connect your calendar account so the agent can read events and create or update them.
4. Once your tools are selected, go to the Connect tab.
5. Generate a new token and copy the full URL with token. This is your MCP server URL—store it somewhere safe and never share it publicly.
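One small convenience before moving on: rather than pasting that URL into every command, you can keep it in an environment variable (bash/zsh shown; the variable name is just a suggestion):

export ZAPIER_MCP_URL="YOUR_MCP_SERVER_URL"   # paste the full URL, including the token, from Zapier

You can then pass "$ZAPIER_MCP_URL" anywhere a command below expects the server URL.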
Step 4: Install the Ollama MCP Client (Bridge)
Ollama doesn’t natively speak MCP, so you need a small bridge: an MCP client for Ollama (the ollmcp package is one popular option). This tool connects Ollama to your MCP server and exposes the tools to your local model.
Install the MCP Client
First, make sure you have Python installed. Then, in your terminal, run something like:
pip install --upgrade ollmcp
Alternatively, if you use uv you can run:
uvx ollmcp
(Exact package name/command may vary slightly depending on the latest repository, but the idea is the same: install the Ollama MCP client.)
Run Ollama with Your MCP Server
Once installed, you can start an interactive session that connects your model to the Zapier MCP server. The command will look like:
ollmcp --mcp-server-url "YOUR_MCP_SERVER_URL" --model "qwen3:30b"
Where:
• --mcp-server-url is the full URL (including token) you copied from Zapier.
• --model (or -m in some versions) is the Ollama model name you want to use.
When it connects successfully, you’ll see that it has discovered your tools. You can usually type commands like tools to list them, or toggle things like thinking mode and metrics.
Step 5: Use Your Local Agent with Real Tools
With everything wired up, you can now talk to your local model and let it call tools via Zapier MCP.
Example: Querying Notion Data
Try a prompt like:
Can you tell me where I was traveling in the last year based on my Notion documents?
The agent will:
1. Decide it needs to use a Notion-related tool.
2. Propose a tool call (you may be asked to confirm with Y or N).
3. Fetch relevant pages or databases from Notion via the Zapier MCP server.
4. Read the content and summarize your travel history.
This might take a bit longer than a pure cloud model, but the upside is that the core model is running locally and privately.
Example: Creating Calendar Events
Another test prompt:
Create a new booking today from 4–5 p.m. called "Eat lunch" in my calendar.
The agent will call the Google Calendar tool exposed by Zapier MCP, create the event, and then confirm. Time zones can matter here—if your calendar or system time zones differ, you might see events appear at unexpected times (for example, 4–5 a.m. instead of p.m.), so be ready to adjust or clarify.
You can then follow up with something like:
The event is at 4–5 a.m. Move it to 4–5 p.m.
and the agent should update the event accordingly.
Step 6: Using Your Local Agent from Code
Once you’re comfortable in the terminal, you can integrate this setup into your own apps or automations.
Calling Ollama via Its REST API
Ollama exposes a REST API server, so any language that can make HTTP requests can talk to your local model. In Python, for example, you can use libraries like requests or higher-level wrappers to send prompts and receive responses.
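As a minimal sketch (assuming Ollama is running on its default port, 11434, and you’ve already pulled the model), a chat request with the requests library looks like this:

import requests

# Send a single chat message to the local Ollama server and print the reply.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:30b",  # use whichever model you pulled earlier
        "messages": [
            {"role": "user", "content": "Explain MCP in one sentence."},
        ],
        "stream": False,  # return one complete response instead of a token stream
    },
    timeout=300,
)
print(response.json()["message"]["content"])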
Combining Ollama, MCP, and LangChain
For more advanced orchestration, you can use frameworks like LangChain together with MCP adapters. A typical Python setup (sketched below) might:
• Define the Zapier MCP server URL in code.
• Create an MCP client and load all available tools.
• Wrap your Ollama model as a LangChain LLM.
• Build a ReAct-style agent that decides when to call which tool.
• Start a conversational loop where you send user messages and get tool-augmented responses back.
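Here’s a rough sketch of that flow, assuming recent versions of langchain-mcp-adapters, langchain-ollama, and langgraph (the client API and transport names have shifted between releases, so treat this as a starting point and check the adapters’ documentation for your version):

import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent

ZAPIER_MCP_URL = "YOUR_MCP_SERVER_URL"  # the full URL, including the token, from Zapier

async def main():
    # Connect to the Zapier MCP server and load its tools as LangChain tools.
    client = MultiServerMCPClient(
        {"zapier": {"url": ZAPIER_MCP_URL, "transport": "streamable_http"}}
    )
    tools = await client.get_tools()

    # Wrap the local Ollama model and build a ReAct-style agent around the tools.
    model = ChatOllama(model="qwen3:30b")
    agent = create_react_agent(model, tools)

    # Send one user message; the agent decides whether and how to call the tools.
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "What events are on my calendar today?"}]}
    )
    print(result["messages"][-1].content)

asyncio.run(main())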
This approach is ideal if you’re building a product that needs a private, local model but still wants deep integrations with SaaS tools. It’s similar in spirit to more advanced agent frameworks like those discussed in real-world AI agent updates focused on stability and memory.
Performance, Trade-Offs, and Next Steps
Running everything locally with tool integrations is powerful, but there are trade-offs:
• Speed: Local models—especially larger ones—will usually be slower than top-tier cloud models.
• Hardware: Better GPUs or more unified memory let you run larger, smarter models at usable speeds.
• Accuracy: Smaller models may be less accurate or need more guidance, but they’re much faster.
The good news is that once you’ve done the initial setup, it’s mostly plug-and-play: swap models, add or remove tools in Zapier, and iterate until you find the right balance of speed and capability for your machine.
If you enjoy running powerful AI locally, you might also like exploring other self-hosted workflows, such as running local text-to-video models on a cloud GPU for creative projects.
From here, you can experiment with different models, expand your tool set beyond Notion and Google Calendar, and even embed this local agent into your own apps or internal tools.