Tool search automatically defers MCP tool definitions, replacing them with a single search tool that loads tools on-demand, preserving your context window for actual work.
Every MCP tool you connect to Claude Code comes with a definition — name, description, and JSON schema — that costs 200-800 tokens. With multiple MCP servers (GitHub, Slack, Jira, etc.), you can easily burn 60,000+ tokens on tool definitions alone, every single turn, before Claude even reads your message.
Tool search solves this by deferring most tool definitions. Instead of sending all 147 tool schemas, Claude Code sends:
ToolSearch tool
This reduces token overhead from ~90,000 to ~15,000 tokens immediately.
When Claude needs a specific tool, it calls:
ToolSearch({"query": "github create issue"})
The system returns a tool_reference for mcp__github__create_issue. On the next turn, that tool's full schema is available, and Claude can call it normally.
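The defer-then-discover flow can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class DeferredToolRegistry, its methods, and the naive word-matching search are hypothetical and are not Claude Code's actual implementation. Only the tool name mcp__github__create_issue and the query string come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredToolRegistry:
    """Toy model: schemas stay out of context until a search surfaces them."""

    schemas: dict[str, dict]  # every known tool schema, keyed by tool name
    loaded: set[str] = field(default_factory=set)  # names that are now callable

    def tool_search(self, query: str) -> list[str]:
        """Return tool names containing every query word and mark them loaded."""
        words = query.lower().split()
        hits = [name for name in self.schemas
                if all(word in name.lower() for word in words)]
        self.loaded.update(hits)
        return hits

    def can_call(self, name: str) -> bool:
        """A tool is callable only after discovery has loaded its schema."""
        return name in self.loaded


# Hypothetical two-tool setup; the schemas are placeholders.
registry = DeferredToolRegistry(schemas={
    "mcp__github__create_issue": {"type": "object"},
    "mcp__slack__post_message": {"type": "object"},
})
before = registry.can_call("mcp__github__create_issue")
found = registry.tool_search("github create issue")
after = registry.can_call("mcp__github__create_issue")
print(before, found, after)
```

The point of the sketch is the ordering: the tool cannot be called until a search has surfaced it, and tools that were never searched for (the Slack tool here) never have their schema loaded.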
The cost? One extra turn and ~200 tokens for discovery.
The savings? ~1.5 million tokens over a 20-turn conversation.
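The 1.5 million figure follows directly from the per-turn numbers quoted earlier. A minimal sketch of the arithmetic, using the article's approximate figures; the function name and the choice of five discoveries are illustrative assumptions:

```python
def tokens_saved(eager_overhead: int, deferred_overhead: int, turns: int,
                 discoveries: int = 0, discovery_cost: int = 200) -> int:
    """Net tokens saved by deferring tool definitions over a conversation."""
    per_turn_saving = eager_overhead - deferred_overhead
    return per_turn_saving * turns - discoveries * discovery_cost


# ~90,000 tokens per turn without tool search, ~15,000 with it, 20 turns.
gross = tokens_saved(90_000, 15_000, turns=20)
# Same conversation, assuming five tool discoveries at ~200 tokens each.
net = tokens_saved(90_000, 15_000, turns=20, discoveries=5)
print(f"{gross:,} gross, {net:,} net")
```

Even with a handful of discoveries, the discovery cost is a rounding error next to the per-turn saving.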
To decide whether a tool is deferred, the system uses a priority-ordered checklist:
_meta['anthropic/alwaysLoad'] to force loading every turn
shouldDefer flag: Rarely used but available
This follows Anthropic's principle of "fail closed, fail toward asking." If anything is uncertain, the system loads all tools rather than hiding them.
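A priority-ordered check of this kind might look like the sketch below. The exact rules and their order inside Claude Code are not given here, so treat every detail as an assumption: the function should_defer, the use of the mcp__ name prefix to detect MCP tools, and the shape of the tool dictionaries are all hypothetical. Only the _meta['anthropic/alwaysLoad'] key and the shouldDefer flag come from the text.

```python
def should_defer(tool: dict, tool_search_enabled: bool = True) -> bool:
    """Decide whether a tool's full definition is withheld until discovered."""
    # Fail toward loading: with tool search off, nothing is hidden.
    if not tool_search_enabled:
        return False
    # An explicit opt-out wins over every other rule.
    meta = tool.get("_meta") or {}
    if meta.get("anthropic/alwaysLoad") is True:
        return False
    # Assumption: MCP tools are recognised by their mcp__ name prefix.
    if tool.get("name", "").startswith("mcp__"):
        return True
    # Last resort: the rarely used explicit flag.
    return bool(tool.get("shouldDefer", False))


# Hypothetical tool definitions for illustration.
tools = [
    {"name": "mcp__github__create_issue"},
    {"name": "mcp__db__query", "_meta": {"anthropic/alwaysLoad": True}},
    {"name": "Read"},
    {"name": "BigBuiltin", "shouldDefer": True},
]
deferred = [t["name"] for t in tools if should_defer(t)]
print(deferred)
```

In this sketch the database tool stays loaded on every turn because its definition carries the alwaysLoad opt-out, which is the same mechanism a server author would use for a tool needed on nearly every turn.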
Tool search operates in three modes controlled by ENABLE_TOOL_SEARCH:
tst (Default)
Always defer MCP and shouldDefer tools. This is the right default — if you're using MCP tools, you've already accepted the latency tradeoff for a larger effective context window.
tst-auto
Threshold-based deferral: tools are deferred only when their definitions exceed a token budget. Use ENABLE_TOOL_SEARCH=auto, or ENABLE_TOOL_SEARCH=auto:N with N a percentage threshold from 1 to 99 (for example, auto:50).
standard
Never defer. Use ENABLE_TOOL_SEARCH=false to disable completely.
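The mapping from ENABLE_TOOL_SEARCH values to the three modes can be sketched as a small parser. This is an illustration, not the real parsing code: the function name is hypothetical, and the handling of unset, unrecognised, and out-of-range values is an assumption (here, unset or unrecognised values fall back to the default tst mode and a bad threshold raises an error).

```python
import os
from typing import Optional, Tuple


def parse_tool_search_mode(value: Optional[str]) -> Tuple[str, Optional[int]]:
    """Map an ENABLE_TOOL_SEARCH value to (mode, threshold percentage or None)."""
    if value is None or value.strip() == "":
        return ("tst", None)  # assumption: unset means the default mode
    v = value.strip().lower()
    if v == "false":
        return ("standard", None)
    if v == "auto":
        return ("tst-auto", None)  # built-in threshold, value not stated here
    if v.startswith("auto:"):
        raw = v[len("auto:"):]
        if raw.isdigit() and 1 <= int(raw) <= 99:
            return ("tst-auto", int(raw))
        raise ValueError(f"threshold must be 1-99, got {raw!r}")
    return ("tst", None)  # assumption: anything else keeps the default


print(parse_tool_search_mode(os.environ.get("ENABLE_TOOL_SEARCH")))
```

For example, auto:50 yields ("tst-auto", 50) and false yields ("standard", None).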
Discovered tools are preserved across context compaction through a snapshot system. When compaction occurs, the system takes a snapshot of the tools discovered so far.
This ensures Claude doesn't lose access to tools it's already discovered, even as the conversation context gets trimmed.
If you're building MCP servers, consider:
alwaysLoad: If a tool is needed on nearly every turn (like a primary database query), opt it out of deferral
Check your current configuration:
echo $ENABLE_TOOL_SEARCH
If you're not seeing tool search benefits, ensure:
You are not in standard mode
Your tools are not all marked alwaysLoad
For maximum savings with minimal latency impact, use the default tst mode. The one-turn discovery overhead is negligible compared to the context window preservation.
Default (tst): Most users, balances savings with discovery latency
tst-auto:20: If you're sensitive to latency but still want some savings
standard: Only if you have very few MCP tools or need every tool available immediately
This system represents a fundamental shift in how Claude Code handles tool ecosystems. Instead of paying the token cost for every possible tool upfront, you now pay only for what you actually use.
Originally published on gentic.news