Step-by-step guide to setting up OpenClaw with Ollama for a fully local, private AI agent. Zero API costs, full data sovereignty, works on Mac, Windows, and...
Originally published on Remote OpenClaw.
Running an AI agent that can read your emails, manage your calendar, browse the web, and automate tasks across your apps — all without sending a single byte of data to a third-party server. That is what OpenClaw + Ollama gives you.
OpenClaw is the open-source AI agent framework that took GitHub by storm (321,000+ stars). Ollama is the tool that lets you run large language models locally on your own hardware. Put them together and you get a fully private, zero-cost AI assistant running entirely on your machine.
Marketplace
Free skills and AI personas for OpenClaw — browse the marketplace.
Join the Community
Join 1k+ OpenClaw operators sharing deployment guides, security configs, and workflow automations.
The trade-off: you need decent hardware, and local models are not as capable as cloud models like Claude or GPT-4. But for the majority of daily automation tasks, a good local model is more than enough.
| Tier | RAM | VRAM | Model Size |
|---|---|---|---|
| Minimum | 8 GB | 8 GB | 7-9B parameters |
| Recommended | 16-32 GB | 16-24 GB | 14-32B parameters |
| Best Experience | 64 GB+ | 48 GB+ | 70B+ parameters |
Mac users: Apple Silicon shares unified memory between CPU and GPU, so a Mac Mini M4 with 32 GB RAM can comfortably run 14B-32B models — this is why Mac Minis are one of the most popular setups for self-hosted OpenClaw.
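As a rough cross-check of the table above (the 0.6 GB-per-billion-parameters figure for Q4 quantization is a rule-of-thumb assumption, not an Ollama spec), you can estimate VRAM needs like this:

```shell
#!/bin/sh
# Rough VRAM estimate for a Q4-quantized model.
# Assumption: ~0.6 GB per billion parameters, plus ~2 GB for KV cache/overhead.
params_b=14                           # model size in billions of parameters
est_gb=$(( params_b * 6 / 10 + 2 ))   # integer math: 14 * 0.6 + 2
echo "A ${params_b}B model needs roughly ${est_gb} GB of VRAM"
```

That lands in the same ballpark as the ~12 GB listed for the ~14B glm-4.7-flash in the model table further down.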
```shell
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | bash

# Verify installation
ollama --version
```
You need Ollama 0.17 or later for OpenClaw integration.
A single command handles everything:
```shell
ollama launch openclaw
```
Ollama detects whether OpenClaw is installed, installs it if needed, prompts you to select a model, configures the provider, installs the gateway daemon, activates web search, and starts the agent.
```shell
# Specify a model directly
ollama launch openclaw --model glm-4.7-flash

# Headless mode for automation
ollama launch openclaw --model kimi-k2.5:cloud --yes
```
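For unattended runs (cron jobs, provisioning scripts), a thin wrapper keeps the model choice configurable. This is a sketch; the `OPENCLAW_MODEL` variable is my own convention, not an official OpenClaw setting:

```shell
#!/bin/sh
# Hypothetical headless-launch wrapper. Uses the `ollama launch openclaw`
# command shown above; OPENCLAW_MODEL is an invented convention.
MODEL="${OPENCLAW_MODEL:-glm-4.7-flash}"   # fall back to the guide's default
set -- ollama launch openclaw --model "$MODEL" --yes
echo "launching: $*"
# Uncomment to actually run it:
# exec "$@"
```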
| Model | Parameters | VRAM Needed | Best For |
|---|---|---|---|
| glm-4.7-flash | ~14B | ~12 GB | Default recommendation. Good balance of speed and capability |
| Qwen3.5 27B | 27B | ~20 GB | Best quality-to-size ratio. 72.4% on SWE-bench |
| Qwen3.5 35B-A3B | 35B (3B active) | ~16 GB | Fastest — 112 tokens/sec via sparse activation |
| Qwen3.5 9B | 9B | ~8 GB | Entry-level hardware |
Models to avoid: anything under 7B parameters (struggles with tool use), older Llama/Mistral models (unreliable tool calling), and any model with less than 64K context.
```shell
openclaw configure --section channels
```
Connect WhatsApp (via QR code), Telegram (via bot token), Discord (via bot token), Slack (via app integration), iMessage (Mac only), Signal (via Signal CLI), or the built-in web chat.
The easiest approach — set one environment variable and OpenClaw auto-discovers all Ollama models:
```shell
export OLLAMA_API_KEY="ollama-local"
```
Do NOT use the /v1 OpenAI-compatible URL. This is the single most common mistake:
```
# WRONG — breaks tool calling
"baseUrl": "http://localhost:11434/v1"

# CORRECT — use native Ollama API
"baseUrl": "http://localhost:11434"
```
Using /v1 breaks tool calling and causes models to output raw tool JSON as plain text instead of executing actions.
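In context, a minimal provider entry might look like the fragment below. The field names other than `baseUrl` are illustrative, so check your actual OpenClaw config file for the exact schema:

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "model": "glm-4.7-flash"
    }
  }
}
```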
If requests time out: OpenClaw disables streaming by default for Ollama, so a slow model generating a long reply can exceed the request timeout. Use a faster model, reduce the context size, or increase the timeout. The Qwen3.5 35B-A3B MoE model (112 t/s) is specifically good here.
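The throughput numbers make the timeout math concrete. A quick sketch (the 112 t/s figure comes from the model table; the 20 t/s figure for a struggling model is an assumption for illustration):

```shell
#!/bin/sh
# Will a long reply finish before the request times out?
tokens=2000                        # a long agent reply
fast_tps=112                       # Qwen3.5 35B-A3B, from the model table
slow_tps=20                        # assumed throughput for a slow model
fast_s=$(( tokens / fast_tps ))    # well inside most request timeouts
slow_s=$(( tokens / slow_tps ))    # 100s: likely past a default timeout
echo "fast: ~${fast_s}s, slow: ~${slow_s}s"
```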
Model names must match your provider config exactly. Run `ollama list` and copy-paste the exact name.
If the model prints raw tool JSON instead of executing actions: this is almost always caused by using the /v1 URL or an incompatible model. Use the native Ollama URL and switch to a Qwen3 or Qwen3.5 model.
The smartest setup combines local and cloud models: OpenClaw supports multiple providers, so you can route routine tasks to a local model and reserve a cloud model for the hard ones.
**How much does it cost?**
Nothing for the software. OpenClaw is free and open source. Ollama is free. The models are free. Your only costs are hardware and electricity.

**Can I run this on a Mac?**
Yes — and Macs are one of the best platforms for this. Apple Silicon shares unified memory between CPU and GPU, so a Mac Mini M4 with 32 GB RAM can comfortably run 14B-32B models.

**Which model should I choose?**
For most users: glm-4.7-flash as a balanced default, or Qwen3.5 27B if your hardware can handle it. The Qwen3.5 family has the best tool-calling reliability with OpenClaw as of March 2026.

**Does it work fully offline?**
Partially. LLM inference works offline once the model is downloaded, but tasks that need the internet (email, web search, calendar sync) still require connectivity.

**Can I mix local and cloud models?**
Yes. You can configure multiple providers in OpenClaw and route different tasks to different models.
*Last updated: March 2026. Published by the Remote OpenClaw team at remoteopenclaw.com.*