Brian Gershon

Building products, sharing best practices.

How to Avoid Runaway API Costs in OpenClaw

Routing every OpenClaw request, from background tasks to complex reasoning, through a premium model adds up fast. Here is how to right-size it.

OpenClaw had been running for one day on default settings. When I opened my OpenRouter billing dashboard, it showed $25 in charges - and I had not yet done anything useful with it. That sent me digging.

OpenClaw recommends starting with a capable frontier model, so I picked Anthropic's Sonnet. On top of that, heartbeat - OpenClaw's built-in keep-alive and background task loop - was running every 30 minutes without isolation, meaning each ping carried my full conversation history rather than starting fresh. The result: background heartbeat calls hitting an expensive model with large context payloads, racking up charges before my agent handled a single real task.

The good news is that two config changes fixed almost all of it. This post walks through what is actually happening under the hood, what to change, and what it should cost when everything is dialed in.

Where the Background Costs Come From

Two settings are responsible for most of the surprise.

Heartbeat: OpenClaw's built-in keep-alive loop - an API call sent on a schedule (default: every 30 minutes) to check agent status and run any pending background tasks. It fires whether you are using the agent or not.

Heartbeat is designed to keep your agent responsive and run background tasks on a schedule - useful when you want proactive monitoring. Without tuning, though, a 30-minute interval means roughly 48 calls a day. Each call carries your full conversation history, tool state, and system prompt - anywhere from a few thousand tokens on a fresh session to 100,000 or more once history accumulates - even when the result is just HEARTBEAT_OK. At that token volume on a capable frontier model like Sonnet, my rough estimate puts each heartbeat call at around 50 cents. That is my own back-of-envelope figure based on running OpenClaw with Anthropic Sonnet, not a third-party benchmark. But even if you cut it in half, 48 calls a day adds up to real money before your agent handles a single task.
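The arithmetic behind that claim, using the post's own rough figures (the per-call cost is my estimate, not published pricing):

```python
# Back-of-envelope heartbeat spend on default settings.
# The 50-cent per-call figure is a rough estimate, not measured API pricing.

MINUTES_PER_DAY = 24 * 60
heartbeat_interval_min = 30                                # OpenClaw default
calls_per_day = MINUTES_PER_DAY // heartbeat_interval_min  # 48

est_cost_per_call = 0.50  # rough estimate for a large-context Sonnet call

daily_cost = calls_per_day * est_cost_per_call
print(f"{calls_per_day} calls/day -> ~${daily_cost:.2f}/day")  # ~$24/day
```

That lines up almost exactly with the $25 day-one bill that started this investigation.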

Model routing: The process of choosing which LLM handles each incoming request. By default, OpenClaw routes most requests - including simple housekeeping calls - through the most capable (and most expensive) model in your configuration.

Sonnet is a capable frontier model - powerful enough for complex reasoning and less prone to the mistakes that matter in an autonomous agent. But routing every request through it, including simple keep-alive calls and routing decisions, is wasteful. Running heartbeat checks through a model like Sonnet is like hiring a senior consultant to answer "are you still there?" every 30 minutes.

The two fixes address each root cause directly.

Fix 1 - Tame the Heartbeat

You have three options, depending on how much you rely on background monitoring.

Option A: Reduce heartbeat frequency

The default is every 30 minutes - roughly 48 calls a day. You can dial that back without turning it off entirely:

openclaw config set agents.defaults.heartbeat.every "2h"

That cuts the call count from ~48 to around 12 per day - a good middle ground if you want near-real-time monitoring without paying for a check every half hour.

Option B: Run heartbeat in an isolated session

If you want to keep the current frequency but reduce what each call costs, isolate it. By default, each heartbeat carries your full conversation context - roughly 100,000 tokens. With isolatedSession, it starts fresh each time, dropping that to 2,000-5,000 tokens:

openclaw config set agents.defaults.heartbeat.isolatedSession true

You can combine this with Option A for an even bigger reduction.

Option C: Disable heartbeat and replace it with cron (what I did)

This is the path I took. I turned off heartbeat entirely:

openclaw config set agents.defaults.heartbeat.every "0m"

Then I replaced it with cron jobs - each scoped to a specific task, running in isolation with no context carryover. I also set isolation for cron so each scheduled run stays cheap:

openclaw config set agents.defaults.cron.isolatedSession true

Cron vs. heartbeat tradeoff: Heartbeat batches multiple checks into one turn; cron runs each task in isolation. One heartbeat can be cheaper than five isolated cron jobs for routine monitoring - but cron is cleaner for discrete scheduled tasks with no shared state. Worth reading the official comparison before deciding.
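A quick sketch of that tradeoff, again using this post's rough per-turn token estimate (assumed, not measured): one isolated heartbeat turn that batches five checks versus five separate isolated cron runs.

```python
# One batched heartbeat turn vs. five isolated cron runs per monitoring cycle.
# The per-turn token figure is this post's rough estimate, not a measurement.

iso_turn_tokens = 3_500  # one isolated-session turn (midpoint of 2,000-5,000)
checks_per_cycle = 5

heartbeat_cycle = iso_turn_tokens                # 5 checks batched into 1 turn
cron_cycle = checks_per_cycle * iso_turn_tokens  # each check is its own turn

print(heartbeat_cycle, cron_cycle)  # 3500 17500
```

Roughly a 5x difference per cycle in heartbeat's favor for batched routine checks - which is why cron only wins when your scheduled tasks are genuinely independent.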

Fix 2 - Right-Size Your Models

The second lever is which model handles which work. OpenClaw runs as a single agent by default, which means every request - heartbeat checks, routing decisions, complex reasoning - hits the same model. There is no way to send cheap work to a cheap model. Three steps change that.

Model tiering: Configuring different agents or task types to use different models based on cost and capability requirements. The goal is to match model capability to what the task actually needs - not always the cheapest, not always the most capable.

Step 1: Switch to multiple agents

With a single agent, every request shares one model configuration. OpenClaw supports multi-agent routing once you enable it - just ask OpenClaw to convert and it will update its own configuration. After that you can create focused agents scoped to specific task types, each independently configured.

Step 2: Switch to OpenRouter

Before assigning models, route everything through OpenRouter rather than connecting OpenClaw directly to each provider.

Model gateway: A unified API layer that routes requests to multiple LLM providers through a single endpoint. Instead of managing separate API keys for Anthropic, Google, Mistral, and others, you configure one endpoint and swap models with a string change.

The practical workflow: start with a mid-tier model, run your actual tasks, check quality, step down to something cheaper if quality holds, step up only when you need it. That iteration is painful when you are locked into one provider and easy when you are not.
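The "swap models with a string change" part looks like this in practice. The sketch below builds a request payload for OpenRouter's OpenAI-compatible chat completions endpoint; the model IDs are illustrative examples, and nothing here actually sends a request.

```python
import json

# OpenRouter exposes an OpenAI-compatible endpoint; stepping between tiers
# changes only the model string. Model IDs below are illustrative examples.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    return {
        "model": model,  # the only field that changes between tiers
        "messages": [{"role": "user", "content": prompt}],
    }

cheap = build_request("anthropic/claude-3.5-haiku", "Summarize today's logs.")
premium = build_request("anthropic/claude-sonnet-4", "Plan a refactor of the billing module.")

# Same endpoint, same payload shape - a tier change is a one-string diff.
print(json.dumps(cheap, indent=2))
```

That one-string diff is what makes the step-down-and-check-quality loop cheap to run.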

OpenRouter also puts per-token costs for all supported models in one place, so you can compare side by side without juggling provider dashboards. It adds a small markup over direct API pricing, but in my experience that is negligible relative to the savings from picking the right model.

Step 3: Assign the right model to each agent

Now that you have separate agents and a single gateway, map capability to cost:

  • A lightweight model for heartbeat, monitoring, and simple routing
  • A premium model reserved for complex reasoning, long-context analysis, or high-stakes output

A few models I have used for lower-cost slots:

  • StepFun - a pleasant surprise I found while reviewing popular tool-capable models on OpenRouter; free to start, fast, and a solid Haiku-level replacement in my experience so far (confirm before you commit - free tiers can change)
  • Gemini Flash-Lite, GPT-mini, Gemini Flash - solid smaller frontier options
  • Haiku (Anthropic) - good balance of cost and capability
  • DeepSeek V3 - cheap, but I found response latency noticeably high in practice; your mileage may vary

OpenRouter maintains a curated list of models vetted for tool-calling use cases - a good starting point for picking a model for your orchestration layer.

One caveat before you commit: some cheaper models lack image and media support. Confirm your requirements against current OpenClaw docs before locking in a model.
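Pulling the tiering idea together, a minimal sketch of a role-to-model map - the dictionary structure and model IDs are illustrative, not OpenClaw's actual configuration schema:

```python
# Hypothetical tier map: agent role -> OpenRouter model ID.
# Illustrative only; not OpenClaw's actual configuration schema.
MODEL_TIERS = {
    "heartbeat": "google/gemini-2.0-flash-lite-001",  # cheap keep-alive checks
    "router":    "anthropic/claude-3.5-haiku",        # lightweight routing decisions
    "reasoning": "anthropic/claude-sonnet-4",         # complex, high-stakes work
}

def model_for(task_type: str) -> str:
    # Fall back to the cheapest tier for unknown task types, so an
    # unconfigured task never lands on the premium model by accident.
    return MODEL_TIERS.get(task_type, MODEL_TIERS["heartbeat"])

print(model_for("heartbeat"))
```

Note the fallback direction: defaulting unknown work to the cheap tier inverts OpenClaw's out-of-the-box behavior, which is exactly the point - premium becomes opt-in.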

What It Should Cost

There are two separate cost buckets worth keeping distinct.

LLM API costs - what this post covers. The exact savings depend on which model you choose and your usage pattern, but the combination of heartbeat isolation and model tiering consistently brings costs to a fraction of the default. A reduction on the order of 90% is the ceiling if you are coming from an Opus-default setup with an unconfigured heartbeat; your actual number depends on your workload.

Hosting costs - a separate line item. I run OpenClaw on Hetzner for about $12/month. If you have a dedicated local machine that can run always-on, that unlocks another option: open-weight models via Ollama or similar, which can drive LLM API spend to near zero for many workloads. The tradeoff is hardware requirements and setup complexity.


Start with the heartbeat - either disable it and replace with cron, or reduce the frequency. Then adjust your models: convert to multi-agent routing, point everything through OpenRouter, and assign each agent a model that fits what it actually does. Matching each type of work to the right model and keeping background calls lean will bring your bill down considerably.