Summary: Nate Herk — Ollama + Claude Code = 99% Cheaper

Nate Herk demonstrates two methods for running Claude Code with non-Anthropic models to eliminate or drastically reduce API costs. The framing: Claude Code is the car (the harness) and the AI model is the engine, so you can swap the engine.

Core Concept

Claude Code is a harness that wraps around a model. By default it uses Anthropic’s Opus/Sonnet/Haiku. By redirecting to a local model or OpenRouter, you substitute a different engine while keeping the Claude Code tooling, file system access, and workflow intact.

Why open-source models aren’t perfect substitutes:

May not have been trained on Claude Code’s tool-calling conventions
May have context windows too small for Claude Code’s system prompt
May not follow the exact JSON protocol Claude Code expects
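Here is what the "exact JSON protocol" point means in practice. Claude Code talks to its model through the Anthropic Messages API. In that API, a tool call comes back as a `tool_use` content block with an `id`, a `name`, and an `input` object. The sketch below is a hypothetical shape check, not part of Claude Code. The helper name `is_well_formed_tool_use` is made up, and the sample tool `Read` with argument `file_path` is only illustrative. It shows the structure a substitute model has to reproduce on every call, and a typical way weaker models miss it.

```python
import json
from typing import Any


def is_well_formed_tool_use(block: Any) -> bool:
    """Return True if `block` has the basic shape of a Messages API tool_use block.

    Hypothetical helper: it checks only type/id/name/input, not whether
    the named tool or its arguments actually exist.
    """
    if not isinstance(block, dict):
        return False
    return (
        block.get("type") == "tool_use"
        and isinstance(block.get("id"), str)
        and isinstance(block.get("name"), str)
        and isinstance(block.get("input"), dict)
    )


# What a compliant model emits (tool name and path are illustrative).
good = {"type": "tool_use", "id": "toolu_01", "name": "Read",
        "input": {"file_path": "src/main.py"}}

# A common failure: the model writes the call out as plain text instead.
bad = {"type": "text",
       "text": json.dumps({"name": "Read", "input": {"file_path": "src/main.py"}})}

if __name__ == "__main__":
    print(is_well_formed_tool_use(good))  # True
    print(is_well_formed_tool_use(bad))   # False
```

The harness can only act on the first form. A model that "knows" which tool to call but emits the second form looks like it did nothing.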

Despite this, modern open-source models (Qwen, Gemma 4) are rapidly closing the gap with closed-source models. Qwen 3.5 outperforms Claude 3.7 Sonnet on SWE-bench Verified, and Claude 3.7 Sonnet was considered state-of-the-art when it was released.

Using Claude Code this way is not against Anthropic’s ToS: you’re still using Anthropic’s agent harness, just with a different model.

Method 1: Local Models via Ollama

Download Ollama from ollama.com
Pull a model: ollama pull qwen3.5 (or another model of your choice)
Launch Claude Code with Ollama: ollama launch claude → select the model at startup
Authentication: Claude Code requires a one-time $5 Anthropic API credit deposit (never actually consumed when using local models)

Context window issue: Ollama may default to a small context window even when the model supports a larger one. Fix this by creating a custom model whose config (an Ollama Modelfile) sets num_ctx explicitly. You can ask Claude Code: “I’m pulling a model from Ollama and want to increase the context window. What command do I run?”
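Here is a minimal sketch of that fix. It generates an Ollama Modelfile that derives a new model from the pulled one with a larger `num_ctx`, then prints the `ollama create` command that registers it. A few things are assumptions, not values from the video: the derived name `qwen3.5-32k`, the 32768-token window, and the helper names `build_modelfile` and `create_command`. Pick a window your model and RAM/VRAM can actually support.

```python
import shlex


def build_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text deriving a model from `base_model` with a fixed context window."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be positive")
    # FROM names the already-pulled model; PARAMETER num_ctx sets the context length in tokens.
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def create_command(new_name: str, modelfile_path: str = "Modelfile") -> str:
    """Return the shell command that registers the derived model with Ollama."""
    return f"ollama create {shlex.quote(new_name)} -f {shlex.quote(modelfile_path)}"


if __name__ == "__main__":
    # Assumed values: base model from the pull step, 32K tokens of context.
    print(build_modelfile("qwen3.5", 32768), end="")
    print(create_command("qwen3.5-32k"))
```

Save the first two printed lines as `Modelfile` and run the printed command. Then choose `qwen3.5-32k` instead of the base model when Claude Code asks for a model at startup.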

Trade-offs of local models:

Slower than cloud inference (your local hardware vs. Anthropic’s servers)
Fully private — nothing leaves your machine
Less visibility into tool calls (doesn’t stream tool steps the same way)
Quality capped by your hardware (RAM, GPU)

Method 2: OpenRouter

OpenRouter hosts hundreds of models (including many free ones) behind a single API. Configure Claude Code to point to OpenRouter instead of Anthropic.
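To choose a free model that also has room for Claude Code's system prompt, you can filter OpenRouter's model catalogue. This sketch assumes the public listing at `https://openrouter.ai/api/v1/models` returns a `data` array whose entries carry `id`, `context_length`, and a `pricing` object with string `prompt`/`completion` prices. That matches OpenRouter's documented response as we understand it, so verify it against the current docs. The 32000-token threshold and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any

MODELS_URL = "https://openrouter.ai/api/v1/models"  # assumed public catalogue endpoint


def fetch_models(url: str = MODELS_URL) -> dict[str, Any]:
    """Download the model catalogue (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def free_models(catalogue: dict[str, Any], min_context: int = 0) -> list[str]:
    """Return sorted ids of zero-priced models with at least `min_context` tokens of context."""
    ids = []
    for model in catalogue.get("data", []):
        pricing = model.get("pricing", {})
        # Missing prices default to "1" so unknown models are never treated as free.
        is_free = (float(pricing.get("prompt", "1")) == 0
                   and float(pricing.get("completion", "1")) == 0)
        if is_free and (model.get("context_length") or 0) >= min_context:
            ids.append(model["id"])
    return sorted(ids)


if __name__ == "__main__":
    # Offline demo with made-up entries; use free_models(fetch_models(), 32000) for live data.
    sample = {"data": [
        {"id": "example/big-model:free", "context_length": 131072,
         "pricing": {"prompt": "0", "completion": "0"}},
        {"id": "example/small-model:free", "context_length": 8192,
         "pricing": {"prompt": "0", "completion": "0"}},
        {"id": "example/paid-model", "context_length": 200000,
         "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
    ]}
    print(free_models(sample, min_context=32000))  # ['example/big-model:free']
```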

Setup (in .claude/settings.local.json):

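{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
    "ANTHROPIC_AUTH_TOKEN": "<your-openrouter-api-key>",
    "ANTHROPIC_API_KEY": "",
    "ANTHROPIC_MODEL": "qwen/qwen3-6b:free",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "qwen/qwen3-6b:free",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "qwen/qwen3-6b:free",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "qwen/qwen3-6b:free",
    "ANTHROPIC_SMALL_FAST_MODEL": "qwen/qwen3-6b:free"
  }
}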

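Critical: set every model variable, not just the primary one. That means ANTHROPIC_MODEL, the ANTHROPIC_DEFAULT_OPUS/SONNET/HAIKU_MODEL variables, and ANTHROPIC_SMALL_FAST_MODEL. If you only set ANTHROPIC_MODEL, Claude Code still uses Anthropic’s Haiku/Sonnet for small background calls and charges you without warning.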
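A quick way to catch the "only ANTHROPIC_MODEL is set" mistake is to check the settings file before launching. This is a minimal sketch. The variable list reflects Claude Code's model-configuration variables as we understand them (ANTHROPIC_SMALL_FAST_MODEL being the older name for the Haiku-class slot), so confirm it against the current Claude Code docs. The helpers `missing_model_vars` and `check_file` are hypothetical.

```python
import json
from pathlib import Path

# Model-selection variables assumed to be read by Claude Code; check the current docs.
MODEL_VARS = (
    "ANTHROPIC_MODEL",
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
    "ANTHROPIC_SMALL_FAST_MODEL",
)


def missing_model_vars(settings: dict) -> list[str]:
    """Return model variables that are absent or empty in settings['env']."""
    env = settings.get("env", {})
    return [name for name in MODEL_VARS if not env.get(name)]


def check_file(path: str = ".claude/settings.local.json") -> list[str]:
    """Load a Claude Code settings file and report any unset model variables."""
    return missing_model_vars(json.loads(Path(path).read_text(encoding="utf-8")))


if __name__ == "__main__":
    partial = {"env": {"ANTHROPIC_MODEL": "qwen/qwen3-6b:free"}}
    # Every variable except ANTHROPIC_MODEL is reported as missing.
    print(missing_model_vars(partial))
```

An empty result means every model slot points somewhere you chose, so no request silently falls back to a paid Anthropic model name.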

Rate limits on free models:

Without deposit: 50 requests/day
With a $5-10 deposit: 1,000 requests/day (free models still cost $0)

When to Use Open-Source Models

Good fits:

Low-stakes, high-volume: summarizing files, grepping codebases, generating code scaffolding
Research and information gathering: web searches, summarizing emails, pulling docs
Organization tasks: categorizing, triaging, filing
Simple tests and bug reviews
Fallback when Anthropic is down (check status.claude.com) or when you hit your session limit

Not recommended for:

Heavy, high-stakes coding work (use Opus 4.6 or equivalent)
Tasks where missing a step is costly
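The two lists above amount to a simple routing rule. This sketch encodes that rule under several assumptions: the task labels, the `Engine` enum, and `pick_engine` are all hypothetical, and unrecognized tasks default to Anthropic as a conservative choice. The rule sends low-stakes, high-volume work to an open-source model and high-stakes work to Opus. When Anthropic is unavailable, it falls back to the open-source engine.

```python
from enum import Enum


class Engine(Enum):
    OPEN_SOURCE = "open-source (Ollama / OpenRouter)"
    ANTHROPIC = "Anthropic (e.g. Opus)"


# Hypothetical labels mirroring the "good fits" list.
LOW_STAKES = {"summarize", "grep", "scaffold", "research", "triage", "simple-test", "bug-review"}


def pick_engine(task: str, high_stakes: bool = False, anthropic_available: bool = True) -> Engine:
    """Route a task: cheap models for low-stakes work, Anthropic when mistakes are costly."""
    if not anthropic_available:
        # Fallback case: Anthropic is down or the session limit has been hit.
        return Engine.OPEN_SOURCE
    if high_stakes or task not in LOW_STAKES:
        # Unknown or high-stakes work goes to the stronger (paid) model.
        return Engine.ANTHROPIC
    return Engine.OPEN_SOURCE


if __name__ == "__main__":
    for task in ("summarize", "refactor-auth"):
        print(task, "->", pick_engine(task).value)
```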

The Real Cost Trade-Off

“There’s no such thing as free.” The actual costs shift to:

Local: hardware (RAM, GPU) to run larger models
Ollama cloud: subscription for higher usage
VPS: hosting cost for self-hosted inference

The realistic benefit: a 50-100x cost reduction from cheap-but-not-free OpenRouter models, rather than eliminating cost entirely.
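The headline's "99% cheaper" and the 50-100x figure are the same claim in different units: cutting cost by a factor k saves 1 - 1/k of it. The small worked example below uses illustrative numbers only. The $200 and $2 monthly bills are hypothetical, not figures from the video.

```python
def reduction_factor(baseline_cost: float, new_cost: float) -> float:
    """How many times cheaper the new setup is (baseline / new)."""
    if baseline_cost <= 0 or new_cost <= 0:
        raise ValueError("costs must be positive")
    return baseline_cost / new_cost


def savings_percent(factor: float) -> float:
    """Percentage saved when cost drops by `factor` (1 - 1/factor, as a percent)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1 - 1 / factor) * 100


if __name__ == "__main__":
    for k in (50, 100):
        print(f"{k}x cheaper = {savings_percent(k):.0f}% saved")  # 98%, then 99%
    # Hypothetical monthly bills: $200 on Anthropic vs $2 on a cheap OpenRouter model.
    k = reduction_factor(200.0, 2.0)
    print(f"{k:.0f}x cheaper = {savings_percent(k):.0f}% saved")  # 100x, 99%
```

So "99% cheaper" corresponds to the top of the 50-100x range. The bottom of the range is still 98% cheaper.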
