Tuning AI compaction: from "it crashed" to "it triggers early"

#ai #devops #infrastructure #machinelearning
Julio Molina Soler

After last week's context overflow incident — where OpenClaw's compaction failed at 119% capacity — I went into the config and tried to make it behave. Here's the full exploration.

Context

If you missed the previous post: OpenClaw (my local AI agent) stopped responding after its context window hit 119% of the model limit. Compaction failed repeatedly because it needed tokens to summarize, but there were no tokens left. Classic resource exhaustion.
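The deadlock is easy to state precisely. A minimal Python sketch of the failure mode (all numbers and function names here are illustrative, not OpenClaw internals):

```python
def try_compact(limit: int, used: int, summary_cost: int) -> bool:
    """Compaction needs room to write its own summary.
    If the window is already full, that room doesn't exist."""
    free = limit - used
    if free < summary_cost:
        # The summarizer cannot run: the resource it would
        # free is the same resource it needs to operate.
        return False
    return True

# Last week's incident, schematically: a window at 119% of the limit.
print(try_compact(limit=200_000, used=238_000, summary_cost=2_000))  # False
print(try_compact(limit=200_000, used=100_000, summary_cost=2_000))  # True
```

That `False` is the "no tokens left to summarize" state: recovery and consumption compete for the same budget.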

In that post I listed three things I'd change. This is about actually doing one of them: configuring compaction to trigger early, not after it's already too late.


The exploration

Started by checking maintenance logs:

less /home/m900/.openclaw/logs/maintenance.log

Then tried to set a compaction threshold. First attempts:

openclaw config set agents.main.compaction.min_threshold 0.1
openclaw config set agents.defaults.compaction.min_threshold 0.1

Neither worked, so I ran a config get to inspect the actual structure:

openclaw config get agents

Found the issue: the key was renamed in v2026.3. It's threshold, not min_threshold.

openclaw config set agents.defaults.compaction.threshold 0.1

This is the kind of silent breaking change that's easy to miss. Always config get before assuming a key name is stable.


Trying different modes

With the threshold set, I experimented with the mode:

openclaw config set agents.defaults.compaction.mode "summarize"

summarize forces compression every time the threshold is hit. Too aggressive in practice: it was summarizing mid-task.

Reverted to default:

openclaw config set agents.defaults.compaction.mode "default"

Default mode treats the threshold as a hint, not a hard trigger. Better for interactive sessions where you're mid-task.
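The hint-versus-hard-trigger distinction can be sketched as a policy function. This is my mental model of the behavior, not OpenClaw's actual implementation:

```python
def should_compact(history_frac: float, threshold: float,
                   mode: str, mid_task: bool) -> bool:
    """Illustrative compaction policy.
    'summarize' treats the threshold as a hard trigger;
    'default' treats it as a hint and defers while mid-task."""
    if history_frac < threshold:
        return False
    if mode == "summarize":
        return True       # compress every time the threshold is hit
    return not mid_task   # 'default': wait for a natural break

# Threshold crossed while a task is in flight:
print(should_compact(0.15, 0.1, "summarize", mid_task=True))  # True: interrupts
print(should_compact(0.15, 0.1, "default", mid_task=True))    # False: defers
```

The difference only shows up mid-task; once the session idles, both modes compact.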


The two settings that actually mattered

1. maxHistoryShare — limits what fraction of the context window can be consumed by conversation history before triggering compaction:

openclaw config set agents.defaults.maxHistoryShare 0.1

At 10%, the system reserves 90% of the window for system prompt, tools, and working context. Compaction kicks in much earlier.

2. reserveTokens — explicit headroom reserved for the compaction process itself:

openclaw config set agents.defaults.compaction.reserveTokens 900000

This is the actual fix for last week's root cause. The summarizer needs tokens to generate a summary. If you don't pre-allocate, it can't run when you need it most. Reserving 900k tokens guarantees the compaction process always has room to work.
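The arithmetic behind both settings is worth making explicit. A sketch with hypothetical numbers (the 200k window and 20k reserve below are illustrative, not OpenClaw defaults):

```python
def history_budget(window: int, max_history_share: float) -> int:
    """Tokens history may consume before early compaction kicks in."""
    return int(window * max_history_share)

def can_compact(window: int, used: int, reserve_tokens: int) -> bool:
    """Compaction can run as long as the reserved headroom is intact."""
    return window - used >= reserve_tokens

window = 200_000  # hypothetical model window

# maxHistoryShare = 0.1: history triggers compaction at 10% of the window,
# leaving the other 90% for system prompt, tools, and working context.
print(history_budget(window, 0.1))  # 20000

# reserveTokens: headroom the summarizer can count on. With 20k reserved,
# compaction still has room even when the window is fairly full...
print(can_compact(window, used=175_000, reserve_tokens=20_000))  # True
# ...but a 99%-full window leaves the summarizer nothing.
print(can_compact(window, used=198_000, reserve_tokens=20_000))  # False
```

The two settings attack the same failure from opposite ends: one triggers compaction before the window fills, the other guarantees the trigger can still do its job if it fires late.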

Both followed by a restart:

openclaw gateway restart

What changed in behavior

Before: Compaction triggered reactively at ~90-100% capacity, sometimes failing because there was no headroom left.

After: Compaction triggers early (when history hits 10% of the window), with guaranteed token reserves. The agent stays responsive.

Tradeoff: Compaction runs more frequently. Each run is a small model call with a minor cost. Predictable behavior beats rare but catastrophic failures.
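Rough back-of-envelope on that tradeoff, with the same hypothetical 200k window as above:

```python
def compaction_runs(total_history_tokens: int, budget: int) -> int:
    """Approximate number of compaction passes a session incurs."""
    return total_history_tokens // budget

# A session generating 200k tokens of history:
print(compaction_runs(200_000, 20_000))   # 10 small runs at the early 10% budget
print(compaction_runs(200_000, 180_000))  # 1 run at a lazy ~90% trigger
```

Ten small, cheap summarization calls versus one large call that may not have room to run at all.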


Lessons

  • API config keys change between versions. The min_threshold → threshold rename was a silent breaking change. Check with config get before trusting documentation.
  • Reserve resources for your recovery mechanism. If your fallback process needs the same resource it's trying to free, pre-allocate. reserveTokens is exactly this.
  • "Default" mode isn't a cop-out. It genuinely behaves better than forced summarization for interactive sessions.
  • The context window is not free RAM. Every token has a cost: latency, API spend, and overflow risk. Manage it proactively or it manages you.

Part of my build log — a public record of things I'm building, breaking, and learning at the intersection of AI, infrastructure, and Web3.