
Amariah KamauWhen we set out to build Blueprint — Atlarix's structural codebase retrieval system — the hypothesis...
When we set out to build Blueprint — Atlarix's structural codebase retrieval system — the hypothesis was simple: give the AI a map of the codebase upfront, and it will need to read fewer files. Fewer files means less context. Less context means lower cost and faster responses.
We ran a controlled benchmark. The AI with Blueprint used 54% more context than the AI without it.
Here's why that's not a failure.
If you've used Cursor, Claude Code, or GitHub Copilot on a large codebase, you've hit this wall: the AI either reads too much (dumping raw files into context until you hit the limit) or reads too little (making confident wrong assumptions about files it hasn't seen).
The root cause is navigation. Without a structural map of the codebase, the AI is exploring blind — making guesses about which files matter, following import chains manually, or relying on whatever files happen to be open. In a multi-repository workspace, this gets worse fast. You might have 25 separate projects with thousands of files. The AI has no idea where it is.
The standard solutions are:
We built Blueprint to try a fourth approach: give the model a symbolic structural graph before it starts exploring.
Blueprint is a four-layer index:
Layer 1 — Universal Ctags (symbol index)
Extracts every function, class, type, and method across 18 languages. Line-accurate positions. Cached to .atlarix/symbols.json.
Layer 2 — ast-grep (structural edges)
AST-level pattern matching for import edges, call edges, and HTTP route edges. Express app.get, Fastify fastify.post, Next.js export async function GET — all become first-class nodes in the graph.
Layer 3 — BM25 (semantic symbol ranking)
Ranks ctags symbols by concept query. "Authentication middleware" finds the right functions without requiring an exact name match.
Layer 4 — ripgrep (text fallback)
Exact string search for when you know precisely what you're looking for.
The output is a compact Markdown slice — rooms (directory-scoped clusters), beacons (individual symbols), and edges (structural relationships). Section-scoped: the agent requests one folder at a time, not the whole workspace.
We ran two arms of the same task on a production multi-repository workspace:
Task: Trace an event-driven HTTP-ingress-to-webhook-reply pipeline. Both arms had identical deliverables — narrative of the flow, key file paths, Mermaid sequence diagram.
Arm A (with Blueprint): Prescribed tool order — explore folder → get_blueprint → text search → read_file on 2-3 central files.
Arm B (without Blueprint): Same task, no get_blueprint — only explore folder → text search → read_file.
Model: Kimi K2.6 (268K context window) via OpenRouter. Same model, same provider, both arms.
| With Blueprint | Without Blueprint | |
|---|---|---|
| Blueprint slice | ~6,500 tokens | 0 |
| Final billed input | 63,541 tokens | 41,327 tokens |
| Output tokens | 2,671 | 2,534 |
| Task completion | ✅ | ✅ |
Blueprint arm used 54% more total context.
Context growth per turn:
With Blueprint: 8,661 → 13,966 → 24,771 → 25,012 → 31,717 → 54,188 → 63,541
Without Blueprint: 2,253 → 3,567 → 8,629 → 13,934 → 14,175 → 37,876 → 41,327
The Blueprint arm took six tool-call turns. The no-Blueprint arm took five.
Here's what we found in the qualitative output comparison:
The Blueprint arm named 7 specific internal functions by exact identifier — the auth validator, mention detector, memory clamp, post-processor, card builder, and two others. It surfaced a section-specific post-processor module not explicitly requested.
The no-Blueprint arm found a client module in an eval/ subdirectory that Blueprint's section scope hadn't included. It named specific environment variables and API constants the text search found directly.
Both arms completed the task correctly. But the type of knowledge was different.
Blueprint gave the model a symbol-level map before any file was read. With that map, the model knew which files were worth reading and went deeper — more function names, more architectural detail, more thorough coverage. Without the map, the model explored more conservatively: followed fewer paths, read fewer files, stopped sooner.
The no-Blueprint arm used fewer tokens partly because it was less certain about what to look for next.
For a read-only exploration task, "explored less" isn't obviously worse. Both arms got the answer. But for write tasks — bug fixes, refactors, feature implementation — a model that stops exploring because it's navigationally lost is not saving tokens. It's missing dependencies, and those missing dependencies become production bugs.
The honest framing isn't "Blueprint reduces total context." It's that these are two different problems:
Structural understanding cost — how many tokens does it take to know where you are in the codebase?
With Blueprint: ~6,500 tokens, regardless of section complexity, in ~3 seconds.
Without Blueprint: amortised across many search/read tool calls over multiple turns.
Execution context — how many tokens accumulate as the model actually does the work?
This is determined by exploration depth — how many files the model reads, how many tool calls it makes. Blueprint increases this by making the model more confident. But it's bounded and manageable.
We address the execution context problem with a separate mechanism: post-turn tool-result summarisation. After each turn, large tool outputs in the persisted transcript are rewritten by a fast compaction model — keeping paths, symbol names, and key values, dropping JSON noise and repetition. In the benchmark runs, individual read_file results compressed from 2,500–3,500 tokens to 60–110 tokens. ~95–98% reduction per qualifying block.
Two mechanisms, two layers, two different problems.
If you're building an AI coding tool, an agentic system, or anything that needs to navigate a large codebase:
Don't chase "total context reduction" as a single metric. It conflates structural overhead (knowable upfront, bounded by your retrieval design) with execution noise (determined by task complexity and model confidence).
Give the model a map before it explores. Not raw files — a structural graph. The model will use more total context because it will explore more thoroughly. That's the right trade for write tasks.
Compress history, not retrieval. Post-turn summarisation on tool outputs is more effective than trying to cram less information into the initial retrieval. The model needs the full file during the turn. Future turns don't.
This benchmark is documented in a technical paper published on Zenodo with full methodology, exact prompts, provider-billed token counts, and an honest discussion of limitations:
Blueprint: Section-Scoped Structural Graph Retrieval and Post-Turn Compression for Agentic LLM Coding in Multi-Repository Workspaces
zenodo.org/records/20381860 · DOI: 10.5281/zenodo.20381860
Atlarix is available at atlarix.dev. The MCP server registry is open-source at github.com/AmariahAK/atlarix-mcps.
Built in Nairobi. Questions or thoughts? Drop them in the comments.