Running Claude Code Locally with Ollama: The $0/Month Setup
You're paying $200/month for Claude Code. Your colleague is paying nothing and running the same workflows locally. Here's how they did it—and why you might want to consider it too.
What Changed
Until recently, using Claude Code meant accepting API costs as a fact of life. Every file read, every response, every iteration hit Anthropic’s servers and your wallet. A productive coding session could cost $20-50. Heavy users paid $200/month for the Max plan.
Now there’s an alternative: run Claude Code entirely on your local machine using Ollama and open-source models. Zero API costs. Complete privacy. Offline capability.
The tradeoff? Performance. Open-source models aren’t as capable as Claude Opus 4.5. But for many tasks—prototyping, learning, working with proprietary code—they’re good enough.
Prerequisites
Before starting, you need:
Hardware: 16GB RAM minimum (32GB recommended for larger models)
Software: Node.js 18+ installed
Time: 10-30 minutes depending on your setup
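If you want to confirm the basics before installing anything, a quick check (the RAM commands below assume macOS and a standard Linux environment, respectively):

```bash
# Confirm Node.js is present and report its version (needed for the npm install later)
node --version

# Total RAM on macOS
sysctl -n hw.memsize | awk '{printf "%.0f GB RAM\n", $1/1024/1024/1024}'

# Total RAM on Linux
free -h | awk '/^Mem:/ {print $2 " total RAM"}'
```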
Step 1: Install Ollama
Ollama runs language models locally. It’s open-source and handles the complexity of model serving.
macOS:

```bash
brew install ollama
```

Linux:

```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

Windows: Download from ollama.ai/download
Start the Ollama service:
```bash
ollama serve
```

Leave this running in the background. It listens on localhost:11434.
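If you'd rather not keep a terminal open, Ollama can run as a background service. A sketch, assuming the Homebrew formula on macOS and the systemd unit that the Linux install script registers:

```bash
# macOS (Homebrew): start Ollama as a background service
brew services start ollama

# Linux: the install script sets up a systemd service; confirm it is running
systemctl status ollama --no-pager
```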
Step 2: Download a Coding Model
Ollama supports multiple models. For coding, Qwen 2.5 Coder is currently the strongest option.
```bash
ollama pull qwen2.5-coder
```

This downloads 4-7GB depending on model size. Available variants:
1.5B: Fast, lightweight, handles simple tasks
7B: Balanced performance, runs on most laptops (recommended)
14B: Better quality, needs 32GB+ RAM
32B: Best quality, requires significant resources
The 7B model offers the best speed-to-capability ratio for most users.
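Each variant is pulled by tag. A short example using the sizes above (tag names follow the model's listing in the Ollama library):

```bash
# Pull a specific size instead of the default tag
ollama pull qwen2.5-coder:7b    # recommended starting point
ollama pull qwen2.5-coder:14b   # only if you have 32GB+ RAM

# See which models are downloaded locally
ollama list
```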
Alternative models:
```bash
ollama pull codellama:7b
ollama pull deepseek-coder:6.7b
```

Step 3: Install Claude Code
Claude Code is distributed via npm:
```bash
npm install -g @anthropic-ai/claude-code
```

Verify installation:
```bash
claude --version
```

Step 4: Configure Local Endpoint
This step redirects Claude Code from Anthropic’s cloud API to your local Ollama server.
Set environment variables:
```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
```

The auth token can be any value—Ollama doesn't require authentication for local connections. The base URL points to your Ollama instance.
Make it permanent:
```bash
echo 'export ANTHROPIC_AUTH_TOKEN=ollama' >> ~/.zshrc
echo 'export ANTHROPIC_BASE_URL=http://localhost:11434' >> ~/.zshrc
source ~/.zshrc
```

(Replace .zshrc with .bashrc if using bash.)
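To confirm the redirect is active in your current shell, echo the variables back; this is a plain shell sanity check, nothing Ollama-specific:

```bash
# Both variables should print with the values you set above
env | grep ANTHROPIC
```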
Step 5: Start Coding
Launch Claude Code with your local model:
```bash
claude --model qwen2.5-coder
```

Example task:

> Create a REST API in Python using FastAPI with CRUD endpoints for a Todo model

Claude Code will read files, generate code, and request confirmation before making changes—just like the cloud version, but running entirely on your machine.
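Claude Code also has a non-interactive print mode that is convenient for scripting against the local model. A sketch, assuming the -p/--print flag behaves the same against a local endpoint as it does against the cloud API:

```bash
# One-shot prompt: print the response to stdout and exit
claude --model qwen2.5-coder -p "Summarize what this repository does in three sentences"
```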
What Works (and What Doesn’t)
Works locally:
File reading and editing
Code generation across multiple files
Terminal command execution
Iterative refinement
Custom instructions via CLAUDE.md (see the example after this list)
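A minimal example of those custom instructions: Claude Code reads a CLAUDE.md file from the project root, and the conventions below are purely illustrative, not a required format:

```bash
# Create a CLAUDE.md with project conventions for the model to follow
cat > CLAUDE.md << 'EOF'
# Project guidelines
- Use Python 3.11 and FastAPI for all API code.
- Write tests with pytest and keep them under tests/.
- Propose changes and wait for confirmation before editing files.
EOF
```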
Limitations compared to cloud Claude:
Quality: More bugs, occasional misunderstandings, outdated patterns
Speed: 10-30 seconds per response on CPU (faster with GPU)
Context: 32K tokens vs 200K for Claude Sonnet 4 (see the note after this list)
Features: No web search capability
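Part of the context limitation is Ollama's own default, which serves models with a smaller window than they support. A sketch of raising it by building a local variant with a Modelfile (the name qwen2.5-coder-32k is just an illustrative label, and a larger window costs more RAM):

```bash
# Build a variant of the model configured for a 32K context window
cat > Modelfile << 'EOF'
FROM qwen2.5-coder
PARAMETER num_ctx 32768
EOF
ollama create qwen2.5-coder-32k -f Modelfile

# Point Claude Code at the new variant
claude --model qwen2.5-coder-32k
```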
When This Makes Sense
Good fit:
Proprietary code that can’t leave your network
Prototyping and learning projects
Budget-constrained development
Offline work requirements
Experimenting with different models
Poor fit:
Production-critical code requiring maximum quality
Time-sensitive deliverables
Systems with limited RAM (<16GB)
Projects requiring web search or external tool integration
Performance Optimization
GPU acceleration: If you have an NVIDIA GPU, Ollama uses it automatically. Check:
```bash
ollama ps
```

Look for VRAM usage to confirm GPU acceleration.
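If no VRAM usage shows up and you expected it, nvidia-smi is a quick cross-check that the driver actually sees the card (NVIDIA-only; Apple Silicon GPUs are used automatically via Metal):

```bash
# Should list the GPU and its current VRAM usage; an error means no usable NVIDIA driver
nvidia-smi
```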
Memory management: Unload models when not in use:
```bash
ollama stop qwen2.5-coder
```

Model selection: Start with 7B. Too slow? Try 1.5B. Have 32GB+ RAM? Try 14B for better results.
Avoid concurrent sessions: Each Claude Code instance loads the model separately, consuming significant RAM.
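Ollama also unloads idle models on its own after a few minutes; the OLLAMA_KEEP_ALIVE setting controls that window. A sketch of tuning it (it must be set in the environment of the ollama serve process, and the values below are illustrative):

```bash
# Keep the model loaded for an hour of idle time before unloading
export OLLAMA_KEEP_ALIVE=1h
ollama serve

# Or unload immediately after each response to minimize idle RAM usage
# export OLLAMA_KEEP_ALIVE=0
```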
Alternative Models
CodeLlama 13B:
```bash
ollama pull codellama:13b
claude --model codellama:13b
```

Better at complex instructions, slower inference.
DeepSeek Coder V2:
```bash
ollama pull deepseek-coder-v2:16b
claude --model deepseek-coder-v2:16b
```

Excellent for understanding and refactoring existing codebases.
Mistral 7B:
```bash
ollama pull mistral:7b
claude --model mistral:7b
```

General-purpose model, surprisingly capable for non-specialized coding.
Troubleshooting
Connection issues:
```bash
curl http://localhost:11434/api/tags
```

Should return JSON listing available models.
Slow responses: Check GPU status:
```bash
ollama ps
```

Zero VRAM usage means CPU-only inference (expected without NVIDIA GPU).
Out of memory: Switch to smaller model:
```bash
ollama pull qwen2.5-coder:1.5b
claude --model qwen2.5-coder:1.5b
```

Poor-quality outputs: Open-source models exhibit higher hallucination rates. If results are consistently inadequate, consider cloud API access for that specific project.
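When the cause isn't obvious, the Ollama server log usually explains why a model failed to load or respond. The paths below assume the standard install locations:

```bash
# Linux: the install script runs Ollama under systemd
journalctl -u ollama --no-pager | tail -n 50

# macOS: the server writes to a log file under ~/.ollama
tail -n 50 ~/.ollama/logs/server.log
```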
Why This Matters
This setup removes the financial barrier to AI-assisted development. Previously, you needed a credit card and a willingness to pay per token. Now you need the hardware most developers already own.
That changes who can build with AI agents: students, hobbyists, developers in regions with limited payment infrastructure, and anyone on a tight budget.
The models will improve. Qwen 2.5 Coder is already capable. In six months, there will be better options. The gap between open-source and proprietary models continues narrowing.
This is actual democratization: not cheaper cloud APIs, but local infrastructure anyone can run.

