Most AI agents reload their full memory on every cron run. This single pattern cuts token usage by 60-80% without changing what the agent actually does.
I run a network of AI agents on cron schedules. Two months in, my token bills were 3-4x what they should have been.
The culprit wasn't the work the agents were doing. It was the context reload at the start of every run.
Every loop, each agent was loading its full context: identity file (SOUL.md), complete memory (MEMORY.md), tool documentation (TOOLS.md), and recent daily logs.
That's 2,100+ tokens of overhead before the agent starts its actual task. For agents running every 15 minutes, that's 8,400+ tokens/hour per agent.
My cost analysis showed $198/month in Q1 just from context overhead across 6 agents. The agents weren't expensive — their startup costs were.
Not all context is needed on every run.
Here's the protocol I now use in every agent's initialization:
ALWAYS load (every run):
- SOUL.md (~300 tokens) — identity and values
- HEARTBEAT.md (~150 tokens) — current working state
- state/current-task.json (~200 tokens) — active task
Load only if relevant:
- MEMORY.md — only in direct/main sessions, not cron loops
- TOOLS.md — only when about to use a specific tool
- memory/YYYY-MM-DD.md — only if asked about recent history
Never load proactively:
- Historical memory files
- Full email archives
- Large research documents
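The protocol above is straightforward to express in code. Here's a minimal sketch of a tiered loader; the file names come from the article, but `main_session` and `tool_needed` are assumptions about how your agent harness signals which tier applies:

```python
from pathlib import Path
from typing import Optional

# Tier 1: loaded on every run, cron or otherwise.
ALWAYS = ["SOUL.md", "HEARTBEAT.md", "state/current-task.json"]

def load_context(workspace: Path, *, main_session: bool = False,
                 tool_needed: Optional[str] = None) -> str:
    parts = []
    for name in ALWAYS:                      # tier 1: every run
        f = workspace / name
        if f.exists():
            parts.append(f.read_text())
    if main_session:                         # tier 2: direct sessions only,
        mem = workspace / "MEMORY.md"        # never cron loops
        if mem.exists():
            parts.append(mem.read_text())
    if tool_needed:                          # tier 2: only when a tool is
        tools = workspace / "TOOLS.md"       # actually about to be used
        if tools.exists():
            parts.append(tools.read_text())
    # Tier 3 (historical memory, archives, research docs) is
    # deliberately absent: never loaded proactively.
    return "\n\n".join(parts)
```

The point is that the cheap path is the default: a cron run that passes no flags gets only the ~650-token tier.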
The key piece is HEARTBEAT.md — a tiny "working memory" file that lives at the workspace root. Instead of loading 900 tokens of full memory on every run, the agent reads 150 tokens of current state.
```markdown
# HEARTBEAT.md
Updated: 2026-03-07 09:00

## Active task
Check dev.to article metrics, respond to any comments

## Watch for
- Emails from subscriber@domain.com
- Discord #support mentions

## Off-limits this cycle
- Don't start new content (library has 77 items, enough)
- No automated emails to Stefan (ban active — see DECISION_LOG.md)
```
150 tokens. The agent reads it in milliseconds, knows exactly what it's doing, and doesn't touch MEMORY.md unless something actually requires historical context.
After switching the same 6 agents (mixed schedules, same work) to tiered loading: a 76% reduction in context costs. The agents do the same work. They just stopped loading things they didn't need.
This pattern only works if you keep HEARTBEAT.md pruned. An agent that appends to it without trimming will hit 500+ tokens within a week, and you're back to the original problem.
Add a cleanup directive to the agent's loop:
After completing each task:
- Remove resolved items from HEARTBEAT.md
- Keep total HEARTBEAT.md under 200 tokens
- Move anything important to MEMORY.md or daily log
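That directive can be enforced mechanically. A hedged sketch: the `[resolved]` marker is an assumed convention (the article doesn't prescribe one), and the budget check reuses the rough 4-chars-per-token heuristic:

```python
def prune_heartbeat(text: str, budget_tokens: int = 200) -> str:
    # Drop lines an agent has explicitly marked resolved.
    kept = [ln for ln in text.splitlines()
            if "[resolved]" not in ln.lower()]
    pruned = "\n".join(kept)
    # Still over budget after pruning? Flag it — remaining items
    # should be moved to MEMORY.md or the daily log by the agent.
    if len(pruned) // 4 > budget_tokens:
        print("HEARTBEAT.md over budget; move items to MEMORY.md")
    return pruned
```

Running this at the end of every task keeps the file from silently regrowing into a second MEMORY.md.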
Some runs genuinely do need historical context. For those cases, you want a different approach: pre-summarize large memory files before loading them, or use a two-stage load (lightweight start, then load the full file only if the task demands it).
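A two-stage load can be sketched in a few lines. `summarize()` here is a placeholder for whatever summarizer you actually use (an LLM call, a cached digest, etc.) — it is not part of the article's stack:

```python
from pathlib import Path

def summarize(text: str, max_chars: int = 600) -> str:
    # Placeholder: truncation stands in for a real summarizer.
    return text[:max_chars]

def load_memory(path: Path, *, needs_history: bool) -> str:
    text = path.read_text()
    # Stage 1: cheap summary by default.
    # Stage 2: full file, only when the task demands history.
    return text if needs_history else summarize(text)
```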
If you're starting from scratch, this is the minimal context load that actually works:
- SOUL.md (~300 tokens)
- HEARTBEAT.md (~150 tokens)
- state/current-task.json (~200 tokens)
Total: ~650 tokens. Everything else is on-demand.
The context overhead problem compounds fast as you add more agents. Three agents running every 30 minutes is 144 startup loads per day; at 2,100+ tokens each, that's over 300,000 tokens of pure overhead. Catching the pattern early is worth it.
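As a sanity check on the compounding, the back-of-envelope arithmetic, using the per-run token figures from earlier in the piece (~2,100 for a full reload, ~650 for a tiered load):

```python
agents = 3
runs_per_agent_per_day = 24 * 60 // 30        # one cron run every 30 min
loads_per_day = agents * runs_per_agent_per_day

tokens_before = loads_per_day * 2100          # full reload every run
tokens_after = loads_per_day * 650            # tiered load every run
reduction = 1 - tokens_after / tokens_before  # ~0.69 on nominal figures
```

The nominal per-run numbers give a ~69% saving; the 76% figure above came from measured bills, where the untiered runs often pulled in more than the 2,100-token baseline.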
I document production AI agent patterns at askpatrick.co/library. The cost reduction configs have saved the most money of anything I've shipped — the HEARTBEAT.md pattern is item #26.