Aerostack

Why Your OpenClaw Bill Is $270/Month (And How to Cut It to $18)

The 6 cost drivers behind OpenClaw's token burn, the community's fixes, and the structural change that cut my bill from $270 to $18/month.

Navin Sharma

@navinsharmacse

May 16, 2026 · 6 min read

My first month running OpenClaw with Claude Sonnet cost me $340. I checked the dashboard three times because I thought there was a billing error.

There wasn't. That's just what happens when you run a single conversation thread for 30 days without understanding how context accumulation works.

I spent two weeks tracing every token. Rate limits were fine. API quotas were fine. Nobody had hijacked my account. The math was simple and brutal: most of that $340 went to context I'd already paid for, being resent with every single message. The same tool outputs, the same conversation history, the same system prompt — over and over and over.

If you're running agents on OpenClaw and haven't looked at your token bill closely, stop reading and go check. I'll wait.

The Six Cost Drivers

I didn't discover this alone. Apiyi wrote a solid analysis of why OpenClaw is token-intensive, and Tom Smykowski documented how he traced every token and cut his bill by 90%. Their findings aligned with what I was seeing:

Context Accumulation — Your session history grows indefinitely. Every message you send, every tool output, every reasoning step gets stored. After a week of steady conversation, you've accumulated 50,000–100,000 tokens of context that OpenClaw sends with every new message.

Large Tool Outputs Saved and Resent — When your agent calls a tool that returns 5,000 tokens of output, OpenClaw doesn't discard it after responding. It saves it to the session and includes it in every future message to provide "continuity." Smart design. Expensive behavior.

Complex System Prompts Resent Each Time — Your agent's system prompt might be 2,000–3,000 tokens. OpenClaw resends it with every message. For a well-defined agent with detailed instructions, that adds up fast.

Multi-Turn Reasoning Requiring Multiple API Calls — Complex tasks often need the agent to think through multiple steps, call multiple tools, and refine its approach. Each turn burns tokens.

Using Expensive Models for Simple Tasks — Running Claude Sonnet for a quick lookup is overkill, both in cost and latency. But it's the default.

Background Heartbeats Consuming Tokens — OpenClaw maintains agent sessions with periodic heartbeat requests to keep them warm. These aren't free; they consume tokens even when nobody's actively talking.

These aren't bugs. They're design tradeoffs. But they compound quickly.
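
To see how these drivers compound, here's a back-of-envelope simulation. The per-turn growth rate, bootstrap size, and per-message token counts are illustrative assumptions, not measurements from OpenClaw internals:

```python
# Rough simulation of context accumulation. Per-turn growth and the
# bootstrap size are illustrative assumptions, not OpenClaw internals.
PRICE = 3 / 1_000_000   # Claude Sonnet input, ~$3 per million tokens

def message_cost(context_tokens: int, new_tokens: int = 200) -> float:
    """One message costs its new tokens plus the entire resent context."""
    return (context_tokens + new_tokens) * PRICE

context = 3_000   # system prompt + bootstrap files
total = 0.0
for turn in range(50):
    total += message_cost(context)
    context += 1_200   # message + tool output + reasoning trace saved

print(f"context after 50 turns: {context:,} tokens")
print(f"spend across 50 turns:  ${total:.2f}")
```

Fifty turns at these assumed rates leaves you with a 63,000-token context and roughly $4.89 spent, most of it on resending history you already paid for.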

The Token Math That Shocked Me

Let me break down what I was actually paying for.

Claude Sonnet costs about $3 per million input tokens. That's cheap by LLM standards, but tokens add up.

A typical OpenClaw conversation after one day of use accumulates roughly 60,000 tokens of context. That includes system prompts, previous messages, tool outputs, reasoning traces, and session metadata.

When you send a message to your agent, OpenClaw includes all that context. So the cost per message isn't just the new tokens you're asking for — it's:

  • New message: ~200 tokens

  • Accumulated context: ~60,000 tokens

  • Total: ~60,200 tokens per message

  • Cost: ~$0.18 per message

If your agent handles 50 messages per day:

  • Daily cost: 50 × $0.18 = $9/day

  • Monthly cost: 30 × $9 = $270/month
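
Rounding aside, the arithmetic above checks out in a few lines:

```python
# Check the article's per-message and monthly figures.
PRICE = 3 / 1_000_000   # Claude Sonnet input, ~$3 per million tokens
context = 60_000        # accumulated session context
new_msg = 200           # fresh tokens in the message itself

per_message = (context + new_msg) * PRICE   # 0.1806, rounds to ~$0.18
daily = 50 * per_message                    # 50 messages/day, ~$9
monthly = 30 * daily                        # ~$271 before rounding

print(f"per message: ${per_message:.2f}")
print(f"monthly:     ${monthly:.0f}")
```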

The crazy part? Most of that 60,000-token context blob is irrelevant to what the agent is actually trying to do right now.

The Workarounds That Help

I found a bunch of recommendations in the community. Apiyi's analysis mentioned a feature request (GitHub issue #5431) for a "Context Optimizer" that could reduce token usage by 30–70%. LaoZhang AI published a 50–80% reduction guide. centminmod's OpenClaw Token Optimization Guide covered tuning and pruning strategies.

These all work, to a degree:

Reset Sessions Regularly — OpenClaw docs recommend this. If you kill your session after each major task, you lose all that accumulated context. New session = fresh start. You lose continuity, but you save money. Trade-off worth understanding.

Switch to Cheaper Models for Simple Tasks — Use Claude Haiku instead of Sonnet for quick lookups, fact checks, or routine tasks. Haiku runs at a fraction of Sonnet's cost and handles simple work just fine. OpenClaw lets you specify model per agent, so this is easy to implement.
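
Since OpenClaw supports a model setting per agent, the routing logic is easy to sketch. The task categories and model names below are placeholder assumptions, not real OpenClaw identifiers:

```python
# Hypothetical per-task model router. The task categories and model IDs
# are placeholder assumptions, not OpenClaw or Anthropic identifiers.
CHEAP_TASKS = {"lookup", "fact_check", "summarize", "classify"}

def pick_model(task_kind: str) -> str:
    """Route routine work to Haiku; reserve Sonnet for complex tasks."""
    return "claude-haiku" if task_kind in CHEAP_TASKS else "claude-sonnet"

print(pick_model("lookup"))     # claude-haiku
print(pick_model("refactor"))   # claude-sonnet
```

Even a crude allow-list like this captures most of the savings, because the cheap tasks are usually the frequent ones.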

Limit Bootstrap Files — Your agent config has bootstrapMaxChars (default 20,000). This controls how much context gets loaded at startup. Reduce it if you don't need full history.

Heartbeat Tuning — Set your heartbeat interval just under the cache TTL: long enough that you send as few heartbeats as possible, short enough that the cached prompt stays warm. Shorter intervals mean more heartbeats and more token cost, so there's a sweet spot.

Cache-TTL Pruning for Idle Sessions — If a session hasn't been active in hours, let it expire. New session when conversation resumes. Loses continuity but saves background token burn.
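
A rough way to see the heartbeat trade-off: assume each heartbeat resends a fixed chunk of tokens. The 2,500-token-per-heartbeat figure below is an assumption for illustration, not a measured OpenClaw value:

```python
# Back-of-envelope heartbeat cost at different intervals. The
# tokens-per-heartbeat figure is an assumption for illustration.
PRICE = 3 / 1_000_000      # ~$3 per million input tokens
HEARTBEAT_TOKENS = 2_500   # system prompt + pruned context per ping

def daily_heartbeat_cost(interval_minutes: int) -> float:
    beats = (24 * 60) // interval_minutes
    return beats * HEARTBEAT_TOKENS * PRICE

for interval in (5, 15, 60):
    print(f"every {interval:>2} min: ${daily_heartbeat_cost(interval):.2f}/day")
```

Under these assumptions a 5-minute heartbeat burns about $2.16/day against $0.18/day at hourly, which is real money over a month for sessions nobody is talking to.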

These all help. But they're workarounds for the real problem: monolithic conversation threads.

What Actually Made the Difference

Token cost comparison — single thread with 60K tokens per message vs. topic-separated threads with ~4K each

I got real savings when I stopped trying to keep everything in one conversation.

Instead of running one long OpenClaw thread for everything, I split into separate conversations by topic: DevOps, Marketing, Development. Each conversation handles its own domain.

The math changed completely:

  • One long conversation: ~60,000 tokens of accumulated context

  • Three separate conversations: ~4,000 tokens each

  • Cost per message in separated model: ~$0.012 (vs. $0.18)

  • Monthly estimate: ~$18 (vs. $270)
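
A quick sketch of the comparison, using the same 50-messages-a-day, 30-day assumptions as before (the split figure lands near the article's $18 once you round the per-message cost first):

```python
# Monthly cost: one monolithic thread vs. small topic-scoped threads.
PRICE = 3 / 1_000_000
MSGS_PER_DAY, DAYS, NEW_TOKENS = 50, 30, 200

def monthly_cost(context_tokens: int) -> float:
    return (context_tokens + NEW_TOKENS) * PRICE * MSGS_PER_DAY * DAYS

single = monthly_cost(60_000)  # every message drags the full history
split = monthly_cost(4_000)    # each message lands in a focused thread

print(f"single thread: ${single:.0f}/month")
print(f"topic-split:   ${split:.0f}/month")
```

The spend scales almost linearly with accumulated context, so shrinking the context per message is worth more than any per-model or per-heartbeat tweak.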

That's where the big win came from.

Now, I'll be honest: on paper that's a 93% reduction. In practice, I see 40–60% savings because I run multiple active conversations simultaneously and some of them do grow over time. Still huge. Still worth doing.

And this isn't a hack. OpenClaw's own documentation recommends "splitting tasks into separate sessions for better performance and cost efficiency." I just made it structural and intentional instead of accidental.

Why This Matters Now

Most people don't check their OpenClaw token usage until they get the bill. By then you're already committed. But if you're in that phase right now, the first thing to do is look at your API usage dashboard and see what's actually consuming tokens.

The second thing is to audit your agent design. Are you running one massive conversation that handles everything? That's your $270 right there. Could you split it into 3–5 focused conversations? That's where the real cost savings come from.

The community has documented a lot of tactical tweaks that help. Use cheaper models when you can. Reset sessions between major tasks. Tune your heartbeat. These all move the needle.

But the structural shift — separating concerns into different conversations — that's what actually made my bill manageable.

Your number might be different from $270. It depends on how active your agents are, which models you're using, and how much context accumulates before you reset. But if you're running agents in production, the principle is the same: check the math, understand where your tokens are going, and design for efficiency instead of convenience.

The $270 number was my wake-up call. If you haven't checked your token usage recently, go look at your API dashboard. You might be paying for a lot of context you don't need.


Part of the Agent Operations series. Start with the full guide: "I Run 5 MCP Servers on OpenClaw"

See how workspace-level context isolation works: aerostack.dev