Why Your AI Coding Agent Gets Exponentially More Expensive (and What to Do About It)

#ai #programming #productivity #llm
Anton Abyzov

If you're using Claude Code, Cursor, or any LLM-based coding agent, there's a cost pattern you should know about: your sessions get quadratically more expensive as they grow.

A detailed analysis from exe.dev breaks it down.

The problem

Every time the agent makes an API call, it reads the entire conversation history from the cache. The cost of those cache reads grows with both the context length AND the number of calls. That's not linear growth. It's quadratic.

The numbers

  • At 27,500 tokens: cache reads = 50% of total cost
  • At 100,000 tokens: cache reads = 87% of total cost
  • A single "ho-hum" feature implementation cost $12.93

The formula is roughly: total_cost ≈ output_price * output_tokens * num_calls + cache_read_price * context_length * num_calls. The second term grows quadratically in the number of calls, because context_length itself grows with every call.
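
To make the shape of that growth concrete, here's a minimal sketch in Python. The per-token prices and per-call token counts are illustrative assumptions, not figures from the exe.dev analysis.

```python
# A minimal sketch of the formula above. Prices and token counts are
# illustrative assumptions, not figures from the exe.dev analysis.
OUTPUT_PRICE = 15 / 1_000_000        # $ per output token (assumed)
CACHE_READ_PRICE = 0.30 / 1_000_000  # $ per cached input token (assumed)

def session_cost(num_calls, tokens_per_call=2_000, output_per_call=500):
    """Total cost of a session where every call re-reads the whole history."""
    context = 0
    total = 0.0
    for _ in range(num_calls):
        total += CACHE_READ_PRICE * context      # re-read everything so far
        total += OUTPUT_PRICE * output_per_call  # new output this call
        context += tokens_per_call               # history keeps growing
    return total

for calls in (10, 25, 50, 100):
    print(f"{calls:>3} calls -> ${session_cost(calls):.2f}")
# Doubling the number of calls roughly quadruples the cache-read term.
```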

What actually helps

1. Start fresh conversations more often

It feels wasteful to lose context, but re-establishing context is almost always cheaper than the growing cache read tax. A fresh session with a clear prompt costs a fraction of continuing a bloated conversation.
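
A rough back-of-the-envelope comparison, using the same assumed cache-read price as above and hypothetical context sizes:

```python
# Continuing a bloated session vs. restarting with a short summary.
# All numbers are illustrative assumptions.
CACHE_READ_PRICE = 0.30 / 1_000_000  # $ per cached input token (assumed)

def cache_read_cost(context_tokens, num_calls):
    # Each call re-reads the current context (held flat here for simplicity).
    return CACHE_READ_PRICE * context_tokens * num_calls

continuing = cache_read_cost(100_000, 20)  # 20 more calls on a 100k-token history
fresh = cache_read_cost(5_000, 20)         # 20 calls after restarting with a 5k summary

print(f"continuing: ${continuing:.2f}  fresh: ${fresh:.2f}")
# continuing: $0.60  fresh: $0.03
```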

2. Use spec-bounded sessions

This is exactly why I built SpecWeave: each task has a clear spec with acceptance criteria, and the AI works within that boundary. Short, focused sessions instead of open-ended marathons.

When each task has a defined scope, you naturally keep conversations short. The AI knows when it's done because the spec tells it.
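
For illustration only (this isn't SpecWeave's actual format), a bounded task spec can be as small as a goal plus explicit acceptance criteria:

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    """Hypothetical bounded task: the agent stops when every criterion is met."""
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

spec = TaskSpec(
    goal="Add pagination to the /orders endpoint",
    acceptance_criteria=[
        "GET /orders accepts page and page_size query params",
        "Response includes total_count",
        "Existing tests still pass",
    ],
    out_of_scope=["Changing the orders data model"],
)
print(spec.goal, f"({len(spec.acceptance_criteria)} acceptance criteria)")
```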

3. Delegate to sub-agents

Work done in a separate context window doesn't add to your main conversation's cache. If your agent framework supports sub-agents (Claude Code does), use them. The overhead of spawning a new context is almost always less than the cost of an ever-growing main context.
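
Here's a rough sketch of why delegation pays off, again with an assumed cache-read price and hypothetical context sizes:

```python
# Doing a 10-call subtask inside the main conversation vs. delegating it to a
# sub-agent with its own small context. All numbers are illustrative assumptions.
CACHE_READ_PRICE = 0.30 / 1_000_000  # $ per cached input token (assumed)

MAIN_CONTEXT = 80_000    # tokens already sitting in the main conversation
SUB_CONTEXT = 8_000      # what the sub-agent actually needs to see
SUBTASK_CALLS = 10

in_main = CACHE_READ_PRICE * MAIN_CONTEXT * SUBTASK_CALLS
delegated = (CACHE_READ_PRICE * SUB_CONTEXT * SUBTASK_CALLS
             + CACHE_READ_PRICE * MAIN_CONTEXT)  # one call to hand the result back

print(f"in main context: ${in_main:.2f}  delegated: ${delegated:.2f}")
# in main context: $0.24  delegated: $0.05
```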

4. Let tools return large outputs in one call

Splitting a file read into five smaller reads is actually MORE expensive because each one adds another cache read of the full history. Batch your tool calls when possible.
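
Under the same assumed pricing, the difference looks like this:

```python
# One tool call returning the whole file vs. five smaller reads of the same file.
# Each extra call re-reads the full conversation history. Numbers are assumptions.
CACHE_READ_PRICE = 0.30 / 1_000_000  # $ per cached input token (assumed)

CONTEXT = 60_000  # tokens of history at this point in the session

one_read = CACHE_READ_PRICE * CONTEXT * 1
five_reads = CACHE_READ_PRICE * CONTEXT * 5  # same content, four extra history re-reads

print(f"one read: ${one_read:.3f}  five reads: ${five_reads:.3f}")
# one read: $0.018  five reads: $0.090
```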

The meta-lesson

Context management, cost management, and agent orchestration are all the same problem. The developers building workflows that respect these constraints will ship faster and cheaper than those who let agents run unbounded.

The teams that figure this out early have a real advantage. Not because they're smarter, but because they're spending 3x less per feature while shipping at the same velocity.


Full analysis: Why AI Agents Are Expensively Quadratic

What cost patterns have you noticed in your AI coding workflows?