Brian Gershon

Building products, sharing best practices.

How to Avoid Runaway API Costs in OpenClaw

Routing every OpenClaw request, from background tasks to complex reasoning, through a premium model adds up fast. Here is how to right-size it.

OpenClaw had been running for one day on default settings. When I opened my OpenRouter billing dashboard, it showed $25 in charges - and I had not yet done anything useful with it. That sent me digging.

OpenClaw recommends starting with a capable frontier model, so I picked Anthropic's Sonnet. On top of that, heartbeat - OpenClaw's built-in keep-alive and background task loop - was running every 30 minutes without isolation, meaning each ping carried my full conversation history rather than starting fresh. The result: background heartbeat calls hitting an expensive model with large context payloads, racking up charges before my agent handled a single real task.

The good news is that two config changes fixed almost all of it. This post walks through what is actually happening under the hood, what to change, and what it should cost when everything is dialed in.

Where the Background Costs Come From

Two settings are responsible for most of the surprise.

Heartbeat: OpenClaw's built-in keep-alive loop - an API call sent on a schedule (default: every 30 minutes) to check agent status and run any pending background tasks. It fires whether you are using the agent or not.

Heartbeat is designed to keep your agent responsive and run background tasks on a schedule - useful when you want proactive monitoring. Without tuning, though, a 30-minute interval means roughly 48 calls a day. Each call carries your full conversation history, tool state, and system prompt - anywhere from a few thousand tokens on a fresh session to 100,000 or more once history accumulates - even when the result is just HEARTBEAT_OK. At that token volume on a capable frontier model like Sonnet, my rough estimate puts each heartbeat call at around 50 cents. That is my own back-of-envelope figure based on running OpenClaw with Anthropic Sonnet, not a third-party benchmark. But even if you cut it in half, 48 calls a day adds up to real money before your agent handles a single task.
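The arithmetic behind that claim, using the post's own rough figures (the per-call cost is my estimate, not published pricing):

```python
# Back-of-envelope heartbeat spend on default settings.
# The 50-cent per-call figure is a rough estimate, not measured API pricing.

MINUTES_PER_DAY = 24 * 60
heartbeat_interval_min = 30                                # OpenClaw default
calls_per_day = MINUTES_PER_DAY // heartbeat_interval_min  # 48

est_cost_per_call = 0.50  # rough estimate for a large-context Sonnet call

daily_cost = calls_per_day * est_cost_per_call
print(f"{calls_per_day} calls/day -> ~${daily_cost:.2f}/day")  # ~$24/day
```

That lines up almost exactly with the $25 day-one bill that started this investigation.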

Model routing: The process of choosing which LLM handles each incoming request. By default, OpenClaw routes most requests - including simple housekeeping calls - through the most capable (and most expensive) model in your configuration.

Sonnet is a capable frontier model - powerful enough for complex reasoning and less prone to the mistakes that matter in an autonomous agent. But routing every request through it, including simple keep-alive calls and routing decisions, is wasteful. Running heartbeat checks through a model like Sonnet is like hiring a senior consultant to answer "are you still there?" every 30 minutes.

The two fixes address each root cause directly.

Fix 1 - Tame the Heartbeat

You have three options, depending on how much you rely on background monitoring.

Option A: Reduce heartbeat frequency

The default is every 30 minutes - roughly 48 calls a day. You can dial that back without turning it off entirely:

openclaw config set agents.defaults.heartbeat.every "2h"

That cuts the call count from ~48 to around 12 per day - a good middle ground if you want near-real-time monitoring without paying for a check every half hour.

Option B: Run heartbeat in an isolated session

If you want to keep the current frequency but reduce what each call costs, isolate it. By default, each heartbeat carries your full conversation context - roughly 100,000 tokens. With isolatedSession, it starts fresh each time, dropping that to 2,000-5,000 tokens:

openclaw config set agents.defaults.heartbeat.isolatedSession true

You can combine this with Option A for an even bigger reduction.

Option C: Disable heartbeat and replace it with cron (what I did)

This is the path I took. I turned off heartbeat entirely:

openclaw config set agents.defaults.heartbeat.every "0m"

Then I replaced it with cron jobs - each scoped to a specific task, running in isolation with no context carryover. I also set isolation for cron so each scheduled run stays cheap:

openclaw config set agents.defaults.cron.isolatedSession true

Cron vs. heartbeat tradeoff: Heartbeat batches multiple checks into one turn; cron runs each task in isolation. One heartbeat can be cheaper than five isolated cron jobs for routine monitoring - but cron is cleaner for discrete scheduled tasks with no shared state. Worth reading the official comparison before deciding.
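A quick sketch of that tradeoff, again using this post's rough per-turn token estimate (assumed, not measured): one isolated heartbeat turn that batches five checks versus five separate isolated cron runs.

```python
# One batched heartbeat turn vs. five isolated cron runs per monitoring cycle.
# The per-turn token figure is this post's rough estimate, not a measurement.

iso_turn_tokens = 3_500  # one isolated-session turn (midpoint of 2,000-5,000)
checks_per_cycle = 5

heartbeat_cycle = iso_turn_tokens                # 5 checks batched into 1 turn
cron_cycle = checks_per_cycle * iso_turn_tokens  # each check is its own turn

print(heartbeat_cycle, cron_cycle)  # 3500 17500
```

Roughly a 5x difference per cycle in heartbeat's favor for batched routine checks - which is why cron only wins when your scheduled tasks are genuinely independent.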

Fix 2 - Right-Size Your Models

The second lever is which model handles which work. OpenClaw runs as a single agent by default, which means every request - heartbeat checks, routing decisions, complex reasoning - hits the same model. There is no way to send cheap work to a cheap model. Three steps change that.

Model tiering: Configuring different agents or task types to use different models based on cost and capability requirements. The goal is to match model capability to what the task actually needs - not always the cheapest, not always the most capable.

Step 1: Switch to multiple agents

With a single agent, every request shares one model configuration. OpenClaw supports multi-agent routing once you enable it - just ask OpenClaw to convert and it will update its own configuration. After that you can create focused agents scoped to specific task types, each independently configured.

Step 2: Switch to OpenRouter

Before assigning models, route everything through OpenRouter rather than connecting OpenClaw directly to each provider.

Model gateway: A unified API layer that routes requests to multiple LLM providers through a single endpoint. Instead of managing separate API keys for Anthropic, Google, Mistral, and others, you configure one endpoint and swap models with a string change.

The practical workflow: start with a mid-tier model, run your actual tasks, check quality, step down to something cheaper if quality holds, step up only when you need it. That iteration is painful when you are locked into one provider and easy when you are not.
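The "swap models with a string change" part looks like this in practice. The sketch below builds a request payload for OpenRouter's OpenAI-compatible chat completions endpoint; the model IDs are illustrative examples, and nothing here actually sends a request.

```python
import json

# OpenRouter exposes an OpenAI-compatible endpoint; stepping between tiers
# changes only the model string. Model IDs below are illustrative examples.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    return {
        "model": model,  # the only field that changes between tiers
        "messages": [{"role": "user", "content": prompt}],
    }

cheap = build_request("anthropic/claude-3.5-haiku", "Summarize today's logs.")
premium = build_request("anthropic/claude-sonnet-4", "Plan a refactor of the billing module.")

# Same endpoint, same payload shape - a tier change is a one-string diff.
print(json.dumps(cheap, indent=2))
```

That one-string diff is what makes the step-down-and-check-quality loop cheap to run.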

OpenRouter also puts per-token costs for all supported models in one place, so you can compare side by side without juggling provider dashboards. It adds a small markup over direct API pricing, but in my experience that is negligible relative to the savings from picking the right model.

Step 3: Assign the right model to each agent

Now that you have separate agents and a single gateway, map capability to cost:

  • A lightweight model for heartbeat, monitoring, and simple routing
  • A premium model reserved for complex reasoning, long-context analysis, or high-stakes output

A few models I have used for lower-cost slots:

  • StepFun - a pleasant surprise I found while reviewing popular tool-capable models on OpenRouter; free to start, fast, and a solid Haiku-level replacement in my experience so far (confirm before you commit - free tiers can change)
  • Gemini Flash-Lite, GPT-mini, Gemini Flash - solid smaller frontier options
  • Haiku (Anthropic) - good balance of cost and capability
  • DeepSeek V3 - cheap, but I found response latency noticeably high in practice; your mileage may vary

OpenRouter maintains a curated list of models vetted for tool-calling use cases - a good starting point for picking a model for your orchestration layer.

One caveat before you commit: some cheaper models lack image and media support. Confirm your requirements against current OpenClaw docs before locking in a model.
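Pulling the tiering idea together, a minimal sketch of a role-to-model map - the dictionary structure and model IDs are illustrative, not OpenClaw's actual configuration schema:

```python
# Hypothetical tier map: agent role -> OpenRouter model ID.
# Illustrative only; not OpenClaw's actual configuration schema.
MODEL_TIERS = {
    "heartbeat": "google/gemini-2.0-flash-lite-001",  # cheap keep-alive checks
    "router":    "anthropic/claude-3.5-haiku",        # lightweight routing decisions
    "reasoning": "anthropic/claude-sonnet-4",         # complex, high-stakes work
}

def model_for(task_type: str) -> str:
    # Fall back to the cheapest tier for unknown task types, so an
    # unconfigured task never lands on the premium model by accident.
    return MODEL_TIERS.get(task_type, MODEL_TIERS["heartbeat"])

print(model_for("heartbeat"))
```

Note the fallback direction: defaulting unknown work to the cheap tier inverts OpenClaw's out-of-the-box behavior, which is exactly the point - premium becomes opt-in.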

What It Should Cost

There are two separate cost buckets worth keeping distinct.

LLM API costs - what this post covers. The exact savings depend on which model you choose and your usage pattern, but the combination of heartbeat isolation and model tiering consistently brings costs to a fraction of the default. A reduction on the order of 90% is the ceiling if you are coming from an Opus-default setup with an unconfigured heartbeat; your actual number depends on your workload.

Hosting costs - a separate line item. I run OpenClaw on Hetzner for about $12/month. If you have a dedicated local machine that can run always-on, that unlocks another option: open-weight models via Ollama or similar, which can drive LLM API spend to near zero for many workloads. The tradeoff is hardware requirements and setup complexity.


Start with the heartbeat - either disable it and replace with cron, or reduce the frequency. Then adjust your models: convert to multi-agent routing, point everything through OpenRouter, and assign each agent a model that fits what it actually does. Matching each type of work to the right model and keeping background calls lean will bring your bill down considerably.