5 Context Window Tricks That Cut My Token Usage in Half

Nova Elvaris

I was burning through tokens like they were free. Then I started measuring. My average coding session used ~80K tokens, and most of it was wasted context the model didn't need.

After a month of experimenting, I cut that to ~35K tokens per session with zero loss in output quality. Here are the five tricks that made the difference.
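Measuring is the step that most people skip. A rough way to get ballpark numbers before reaching for a real tokenizer is the common heuristic of about 4 characters per token for English and code. Here's a minimal sketch of a per-session meter; the `SessionMeter` class and its names are my own invention, and exact counts require your provider's tokenizer (e.g. tiktoken for OpenAI models):

```typescript
// Rough token estimate using the common ~4-characters-per-token
// heuristic. Good enough for spotting where the budget goes; use
// your provider's tokenizer when you need exact numbers.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: tally estimated tokens per labeled chunk
// so you can see which part of the context is the heaviest.
class SessionMeter {
  private total = 0;

  add(label: string, text: string): number {
    const tokens = estimateTokens(text);
    this.total += tokens;
    console.log(`${label}: ~${tokens} tokens (session total: ~${this.total})`);
    return tokens;
  }

  get sessionTotal(): number {
    return this.total;
  }
}
```

Once you can see that a single pasted file costs 12K tokens, the tricks below become obvious.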

1. The File Summary Header

Instead of pasting an entire file into context, I prepend a 3-line summary:

```typescript
// FILE: src/auth/middleware.ts (147 lines)
// PURPOSE: Express middleware for JWT validation + role-based access
// EXPORTS: authMiddleware, requireRole, extractUser
```

Then I only include the specific function I need help with. The model gets enough context to understand the architecture without reading 147 lines of boilerplate.

Token savings: ~60% per file reference.
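You can generate these headers mechanically rather than typing them by hand. A minimal sketch, assuming a naive regex is good enough to pull export names for a prompt header (a real parser would be more robust, and `summaryHeader` is a hypothetical helper, not part of any tool):

```typescript
// Sketch: build the 3-line summary header for a source file.
// The purpose line still has to come from you; export names are
// extracted with a deliberately simple regex.
function summaryHeader(path: string, source: string, purpose: string): string {
  const lines = source.split("\n").length;
  const exports = [
    ...source.matchAll(/export\s+(?:async\s+)?(?:function|const|class|interface)\s+(\w+)/g),
  ].map((m) => m[1]);
  return [
    `// FILE: ${path} (${lines} lines)`,
    `// PURPOSE: ${purpose}`,
    `// EXPORTS: ${exports.join(", ")}`,
  ].join("\n");
}
```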

2. The Dependency Stub

When the model needs to understand how a function interacts with other modules, don't paste the full dependency — paste a stub:

```typescript
// STUB: database.ts
interface DB {
  query<T>(sql: string, params: any[]): Promise<T[]>;
  transaction(fn: (tx: Transaction) => Promise<void>): Promise<void>;
}
```

The model only needs the interface contract, not the 500-line implementation with connection pooling and retry logic.

Token savings: ~80% per dependency.

3. The Rolling Context Window

For multi-turn sessions, I reset context every 3-4 turns with a summary:

```
Context reset. Here's where we are:
- We're building a rate limiter for the /api/upload endpoint
- We've decided on a sliding window algorithm with Redis
- The function signature is: rateLimit(userId: string, windowMs: number, maxRequests: number)
- Current blocker: handling Redis connection failures gracefully

Continue from here.
```

This prevents the "context sludge" problem where the model drags along 20 turns of outdated conversation.

Token savings: ~40% on sessions longer than 5 turns.
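If you drive the model through an API rather than a chat UI, the same idea can be automated. A sketch of a rolling context, assuming a `Turn` message shape and a pluggable `summarize` step (the default here is a placeholder; in practice you'd ask the model itself for the bullet summary):

```typescript
// Sketch: keep at most `maxTurns` turns; once exceeded, collapse
// everything so far into a single summary message.
type Turn = { role: "user" | "assistant"; content: string };

class RollingContext {
  private turns: Turn[] = [];

  constructor(
    private maxTurns = 4,
    // Placeholder summarizer; replace with a real model call.
    private summarize: (turns: Turn[]) => string = (ts) =>
      `Context reset. Summary of last ${ts.length} turns elided.`,
  ) {}

  push(turn: Turn): void {
    this.turns.push(turn);
    if (this.turns.length > this.maxTurns) {
      const summary = this.summarize(this.turns);
      this.turns = [{ role: "user", content: summary }];
    }
  }

  get messages(): Turn[] {
    return this.turns;
  }
}
```

The key design choice is that the summary replaces the old turns entirely instead of sitting alongside them, so the context size stays bounded no matter how long the session runs.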

4. The Negative Context Declaration

Tell the model what to ignore explicitly:

```
Focus only on the error handling logic in processPayment().
Ignore: logging, metrics, the retry wrapper, input validation.
These are tested and working — don't modify or comment on them.
```

Without this, the model will "helpfully" refactor your logging, suggest improvements to your validation, and burn tokens on things you didn't ask about.

Token savings: ~30% on modification tasks.

5. The Output Budget

Constrain the response format upfront:

```
Return ONLY:
1. The modified function (no surrounding code)
2. A 2-line summary of what changed
3. One potential edge case to test

Do NOT include: explanations of existing code, import statements,
or alternative approaches.
```

I started doing this after noticing that roughly 40% of a typical AI response was explanation I didn't need. The code was fine — the commentary was the waste.

Token savings: ~40% on output tokens.

The Combined Effect

Using all five together on a typical refactoring task:

| Technique | Before | After |
| --- | --- | --- |
| File references | 12K tokens | 4K tokens |
| Dependencies | 8K tokens | 2K tokens |
| Conversation history | 25K tokens | 15K tokens |
| Unfocused responses | 15K tokens | 8K tokens |
| Verbose output | 20K tokens | 6K tokens |
| **Total** | **80K** | **35K** |
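For the skeptical, the column totals are easy to sanity-check (values in thousands of tokens):

```typescript
// Verify the before/after totals from the table above.
const before = { files: 12, deps: 8, history: 25, focus: 15, output: 20 };
const after = { files: 4, deps: 2, history: 15, focus: 8, output: 6 };

const sum = (o: Record<string, number>) =>
  Object.values(o).reduce((a, b) => a + b, 0);

console.log(sum(before), sum(after)); // logs: 80 35
```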

That's not just cheaper — it's faster. Smaller context means faster inference, fewer hallucinations, and more focused output.

The Meta-Lesson

Context windows are not "how much the model can read." They're a budget. Every token you spend on unnecessary context is a token not available for reasoning about your actual problem.

Treat your context window like RAM: measure it, manage it, and stop assuming more is better.


Start with trick #1 (file summary headers) — it's the easiest to adopt and has the highest payoff. Then layer in the others as they feel natural.

Your wallet and your response quality will both thank you.