Anton Abyzov
If you're using Claude Code, Cursor, or any LLM-based coding agent, there's a cost pattern you should know about: your sessions get quadratically more expensive as they grow.
A detailed analysis from exe.dev breaks it down.
Every time the agent makes an API call, it reads the entire conversation history from the cache. The cost of those cache reads grows with both the context length AND the number of calls. That's not linear growth. It's quadratic.
The formula is roughly: total_cost ≈ output_price * output_tokens * num_calls + cache_read_price * context_length * num_calls. That second term grows quadratically because context_length isn't fixed: every call appends to the history that every later call must re-read, so context_length itself scales with num_calls.
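Here's a toy sketch of that model. All prices and token counts below are made-up assumptions for illustration, not real provider pricing:

```python
# Toy model of agent session cost. Every value here is an assumption.
CACHE_READ_PRICE = 0.3e-6   # $ per cached input token (assumed)
OUTPUT_PRICE = 15e-6        # $ per output token (assumed)
TOKENS_PER_TURN = 2_000     # context each call adds to history (assumed)
OUTPUT_PER_TURN = 500       # output tokens per call (assumed)

def session_cost(num_calls: int) -> float:
    """Total cost when each call re-reads the full history so far."""
    total = 0.0
    context = 0
    for _ in range(num_calls):
        total += CACHE_READ_PRICE * context      # re-read entire history
        total += OUTPUT_PRICE * OUTPUT_PER_TURN  # pay for new output
        context += TOKENS_PER_TURN               # history grows linearly
    return total

# Doubling the calls roughly quadruples the cache-read term:
print(session_cost(50), session_cost(100))
```

The cache-read term sums an arithmetic series, which is why doubling the session length much more than doubles the bill.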
It feels wasteful to lose context, but re-establishing context is almost always cheaper than the growing cache read tax. A fresh session with a clear prompt costs a fraction of continuing a bloated conversation.
This is exactly why I built SpecWeave: each task has a clear spec with acceptance criteria, and the AI works within that boundary. Short, focused sessions instead of open-ended marathons.
When each task has a defined scope, you naturally keep conversations short. The AI knows when it's done because the spec tells it.
Work done in a separate context window doesn't add to your main conversation's cache. If your agent framework supports sub-agents (Claude Code does), use them. The overhead of spawning a new context is almost always less than the cost of an ever-growing main context.
Splitting a file read into five smaller reads is actually MORE expensive because each one adds another cache read of the full history. Batch your tool calls when possible.
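The same toy arithmetic shows the batching effect. With made-up numbers, reading a file in five chunks versus one call:

```python
# Why five small reads beat one batched read on cost (toy numbers).
CACHE_READ_PRICE = 0.3e-6   # $ per cached token read (assumed)
HISTORY = 100_000           # tokens already in context (assumed)
FILE_TOKENS = 10_000        # size of the file being read (assumed)

def read_cost(num_reads: int) -> float:
    """Cache-read cost of fetching the file in num_reads chunks:
    each chunk triggers a re-read of everything accumulated so far."""
    total, context = 0.0, HISTORY
    chunk = FILE_TOKENS // num_reads
    for _ in range(num_reads):
        total += CACHE_READ_PRICE * context  # re-read full history
        context += chunk                     # chunk lands in context
    return total

print(read_cost(1), read_cost(5))  # one batched read vs five small ones
```

The file content costs the same either way; the penalty is the four extra re-reads of the 100k-token history.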
Context management, cost management, and agent orchestration are all the same problem. The developers building workflows that respect these constraints will ship faster and cheaper than those who let agents run unbounded.
The teams that figure this out early have a real advantage. Not because they're smarter, but because they're spending 3x less per feature while shipping at the same velocity.
Full analysis: Why AI Agents Are Expensively Quadratic
What cost patterns have you noticed in your AI coding workflows?