Running Claude Code Locally with Ollama: The $0/Month Setup
You're paying $200/month for Claude Code. Your colleague is paying nothing and running the same workflows locally. Here's how they did it—and why you might want to consider it too.
What Changed
Until recently, using Claude Code meant accepting API costs as a fact of life. Every file read, every response, every iteration hit Anthropic’s servers and your wallet. A productive coding session could cost $20-50. Heavy users paid $200/month for the Max plan.
Now there’s an alternative: run Claude Code entirely on your local machine using Ollama and open-source models. Zero API costs. Complete privacy. Offline capability.
The tradeoff? Performance. Open-source models aren’t as capable as Claude Opus 4.5. But for many tasks—prototyping, learning, working with proprietary code—they’re good enough.
Prerequisites
Before starting, you need:
Hardware: 16GB RAM minimum (32GB recommended for larger models)
Software: Node.js 18+ installed
Time: 10-30 minutes depending on your setup
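If you want to confirm the basics before installing anything, a quick check (the RAM commands below assume macOS and a standard Linux environment, respectively):

```bash
# Confirm Node.js is present and report its version (needed for the npm install later)
node --version

# Total RAM on macOS
sysctl -n hw.memsize | awk '{printf "%.0f GB RAM\n", $1/1024/1024/1024}'

# Total RAM on Linux
free -h | awk '/^Mem:/ {print $2 " total RAM"}'
```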
Step 1: Install Ollama
Ollama runs language models locally. It’s open-source and handles the complexity of model serving.
macOS:

```bash
brew install ollama
```

Linux:

```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

Windows: Download from ollama.ai/download
Start the Ollama service:
```bash
ollama serve
```

Leave this running in the background. It listens on localhost:11434.
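If you'd rather not keep a terminal open, Ollama can run as a background service. A sketch, assuming the Homebrew formula on macOS and the systemd unit that the Linux install script registers:

```bash
# macOS (Homebrew): start Ollama as a background service
brew services start ollama

# Linux: the install script sets up a systemd service; confirm it is running
systemctl status ollama --no-pager
```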
Step 2: Download a Coding Model
Ollama supports multiple models. For coding, Qwen 2.5 Coder is currently the strongest option.
```bash
ollama pull qwen2.5-coder
```

This downloads 4-7GB depending on model size. Available variants:
1.5B: Fast, lightweight, handles simple tasks
7B: Balanced performance, runs on most laptops (recommended)
14B: Better quality, needs 32GB+ RAM
32B: Best quality, requires significant resources
The 7B model offers the best speed-to-capability ratio for most users.
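Each variant is pulled by tag. A short example using the sizes above (tag names follow the model's listing in the Ollama library):

```bash
# Pull a specific size instead of the default tag
ollama pull qwen2.5-coder:7b    # recommended starting point
ollama pull qwen2.5-coder:14b   # only if you have 32GB+ RAM

# See which models are downloaded locally
ollama list
```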
Alternative models:
```bash
ollama pull codellama:7b
ollama pull deepseek-coder:6.7b
```

Step 3: Install Claude Code
Claude Code is distributed via npm:
```bash
npm install -g @anthropic-ai/claude-code
```

Verify installation:
```bash
claude --version
```

Step 4: Configure Local Endpoint
This step redirects Claude Code from Anthropic’s cloud API to your local Ollama server.
Set environment variables:
```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
```

The auth token can be any value—Ollama doesn't require authentication for local connections. The base URL points to your Ollama instance.
Make it permanent:
```bash
echo 'export ANTHROPIC_AUTH_TOKEN=ollama' >> ~/.zshrc
echo 'export ANTHROPIC_BASE_URL=http://localhost:11434' >> ~/.zshrc
source ~/.zshrc
```

(Replace .zshrc with .bashrc if using bash.)
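To confirm the redirect is active in your current shell, echo the variables back; this is a plain shell sanity check, nothing Ollama-specific:

```bash
# Both variables should print with the values you set above
env | grep ANTHROPIC
```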
Step 5: Start Coding
Launch Claude Code with your local model:
```bash
claude --model qwen2.5-coder
```

Example task:

> Create a REST API in Python using FastAPI with CRUD endpoints for a Todo model

Claude Code will read files, generate code, and request confirmation before making changes—just like the cloud version, but running entirely on your machine.
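Claude Code also has a non-interactive print mode that is convenient for scripting against the local model. A sketch, assuming the -p/--print flag behaves the same against a local endpoint as it does against the cloud API:

```bash
# One-shot prompt: print the response to stdout and exit
claude --model qwen2.5-coder -p "Summarize what this repository does in three sentences"
```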
What Works (and What Doesn’t)
Works locally:
File reading and editing
Code generation across multiple files
Terminal command execution
Iterative refinement
Custom instructions via CLAUDE.md (see the example after this list)
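A minimal example of those custom instructions: Claude Code reads a CLAUDE.md file from the project root, and the conventions below are purely illustrative, not a required format:

```bash
# Create a CLAUDE.md with project conventions for the model to follow
cat > CLAUDE.md << 'EOF'
# Project guidelines
- Use Python 3.11 and FastAPI for all API code.
- Write tests with pytest and keep them under tests/.
- Propose changes and wait for confirmation before editing files.
EOF
```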
Limitations compared to cloud Claude:
Quality: More bugs, occasional misunderstandings, outdated patterns
Speed: 10-30 seconds per response on CPU (faster with GPU)
Context: 32K tokens vs 200K for Claude Sonnet 4 (see the note after this list)
Features: No web search capability
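Part of the context limitation is Ollama's own default, which serves models with a smaller window than they support. A sketch of raising it by building a local variant with a Modelfile (the name qwen2.5-coder-32k is just an illustrative label, and a larger window costs more RAM):

```bash
# Build a variant of the model configured for a 32K context window
cat > Modelfile << 'EOF'
FROM qwen2.5-coder
PARAMETER num_ctx 32768
EOF
ollama create qwen2.5-coder-32k -f Modelfile

# Point Claude Code at the new variant
claude --model qwen2.5-coder-32k
```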
When This Makes Sense
Good fit:
Proprietary code that can’t leave your network
Prototyping and learning projects
Budget-constrained development
Offline work requirements
Experimenting with different models
Poor fit:
Production-critical code requiring maximum quality
Time-sensitive deliverables
Systems with limited RAM (<16GB)
Projects requiring web search or external tool integration
Performance Optimization
GPU acceleration: If you have an NVIDIA GPU, Ollama uses it automatically. Check:
```bash
ollama ps
```

Look for VRAM usage to confirm GPU acceleration.
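If no VRAM usage shows up and you expected it, nvidia-smi is a quick cross-check that the driver actually sees the card (NVIDIA-only; Apple Silicon GPUs are used automatically via Metal):

```bash
# Should list the GPU and its current VRAM usage; an error means no usable NVIDIA driver
nvidia-smi
```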
Memory management: Unload models when not in use:
```bash
ollama stop qwen2.5-coder
```

Model selection: Start with 7B. Too slow? Try 1.5B. Have 32GB+ RAM? Try 14B for better results.
Avoid concurrent sessions: Each Claude Code instance loads the model separately, consuming significant RAM.
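Ollama also unloads idle models on its own after a few minutes; the OLLAMA_KEEP_ALIVE setting controls that window. A sketch of tuning it (it must be set in the environment of the ollama serve process, and the values below are illustrative):

```bash
# Keep the model loaded for an hour of idle time before unloading
export OLLAMA_KEEP_ALIVE=1h
ollama serve

# Or unload immediately after each response to minimize idle RAM usage
# export OLLAMA_KEEP_ALIVE=0
```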
Alternative Models
CodeLlama 13B:
```bash
ollama pull codellama:13b
claude --model codellama:13b
```

Better at complex instructions, slower inference.
DeepSeek Coder V2:
```bash
ollama pull deepseek-coder-v2:16b
claude --model deepseek-coder-v2:16b
```

Excellent for understanding and refactoring existing codebases.
Mistral 7B:
```bash
ollama pull mistral:7b
claude --model mistral:7b
```

General-purpose model, surprisingly capable for non-specialized coding.
Troubleshooting
Connection issues:
```bash
curl http://localhost:11434/api/tags
```

Should return JSON listing available models.
Slow responses: Check GPU status:
```bash
ollama ps
```

Zero VRAM usage means CPU-only inference (expected without NVIDIA GPU).
Out of memory: Switch to smaller model:
```bash
ollama pull qwen2.5-coder:1.5b
claude --model qwen2.5-coder:1.5b
```

Poor-quality outputs: Open-source models exhibit higher hallucination rates. If results are consistently inadequate, consider cloud API access for that specific project.
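When the cause isn't obvious, the Ollama server log usually explains why a model failed to load or respond. The paths below assume the standard install locations:

```bash
# Linux: the install script runs Ollama under systemd
journalctl -u ollama --no-pager | tail -n 50

# macOS: the server writes to a log file under ~/.ollama
tail -n 50 ~/.ollama/logs/server.log
```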
Why This Matters
This setup removes the financial barrier to AI-assisted development. Previously, you needed a credit card and a willingness to pay per token. Now you need the hardware most developers already own.
That changes who can build with AI agents: students, hobbyists, developers in regions with limited payment infrastructure, and anyone on a tight budget.
The models will improve. Qwen 2.5 Coder is already capable. In six months, there will be better options. The gap between open-source and proprietary models continues narrowing.
This is actual democratization: not cheaper cloud APIs, but local infrastructure anyone can run.

