Originally published on Remote OpenClaw.
The cooldown cascade is one of the most frustrating bugs in OpenClaw's model routing system, tracked as GitHub issue #53906. Here is what happens: your agent is running normally, using Claude as the primary model with GPT-4o as the fallback. Claude hits a rate limit. OpenClaw's model router is supposed to gracefully switch to GPT-4o and continue serving requests. Instead, GPT-4o also gets marked as rate-limited even though it has not received a single request. Every model in your fallback chain enters cooldown simultaneously, and your agent goes completely offline.
The cascade does not just affect two models. If you have configured a chain of three or four fallback models — a common production setup — all of them enter cooldown at the same time. Your agent stops responding to messages, scheduled tasks fail, and any in-flight conversations are dropped.
This bug disproportionately affects operators who run high-volume agents. If you are processing hundreds of messages per hour across multiple channels, you are more likely to hit rate limits, and the cascade makes it impossible to recover gracefully. The exact scenario the fallback system was designed to handle is the one it fails at.
The root cause is in OpenClaw's model router. When a model returns a 429 (rate limited) response, the router enters a cooldown handler. The handler is supposed to mark only the affected model as unavailable and route future requests to the next model in the chain. Instead, the cooldown handler iterates over all configured models and sets a cooldown timer on each one.
This happens because the cooldown state is stored in a shared object. The handler does not distinguish between "this specific model hit its rate limit" and "the model routing system encountered a rate limit." It treats the event as a system-wide rate limit rather than a per-model one.
The code path is roughly:
// What should happen:
models[failedModelIndex].cooldownUntil = Date.now() + cooldownMs;
// What actually happens:
models.forEach(m => m.cooldownUntil = Date.now() + cooldownMs);
This is a classic shared-state bug. The cooldown timer is applied globally when it should be applied to a single model instance. The fix is conceptually simple (scope the cooldown to the specific model), but the implementation is complicated by how the model router handles provider-level vs model-level rate limits.
Some rate limits are per-model (you exceeded the RPM for Claude Sonnet specifically), while others are per-provider (you exceeded your Anthropic API account's overall rate limit). The router needs to handle both cases differently, and the current code does not make this distinction.
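A correct handler would scope the cooldown by the kind of 429 it received. Here is a minimal sketch of that idea; the function and field names are illustrative assumptions, not OpenClaw's actual router code:

```javascript
// Illustrative sketch of scoping a 429 cooldown correctly.
// A provider-level rate limit cools down every model behind that provider;
// a model-level rate limit cools down only the model that was throttled.
function applyCooldown(models, failedModel, scope, cooldownMs, now = Date.now()) {
  const until = now + cooldownMs;
  for (const m of models) {
    const sameProvider = m.provider === failedModel.provider;
    const sameModel = m.name === failedModel.name;
    if (scope === "provider" ? sameProvider : sameModel) {
      m.cooldownUntil = until;
    }
  }
  return models;
}
```

Note that even in the provider-level case, models from other providers are never touched, which is exactly the isolation the cascade bug violates.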
The first thing every operator tries when their agent goes offline is restarting the container. With the cooldown cascade, this does not help. Here is why.
OpenClaw persists session state to disk. This includes conversation history, memory, and — critically — model cooldown timers. When you restart the container, OpenClaw reads the session store from the data/ directory, reloads the cooldown timers, and immediately enters the same state it was in before the restart.
The cooldown timers are timestamp-based. If a model was set to cooldown until 14:30:00 and you restart at 14:25:00, the model remains in cooldown for another 5 minutes after restart. If all models were cascaded into cooldown at the same time, they all remain in cooldown after restart.
This is actually by design for single-model scenarios — you do not want a restart to immediately hammer a rate-limited API. But in the cascade scenario, it means the restart that should fix the problem perpetuates it.
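The mechanics are simple enough to sketch: because the cooldown is an absolute wall-clock timestamp, a freshly restarted process reads the same answer from the persisted value. The names below are illustrative, not OpenClaw's source:

```javascript
// A cooldown stored as an absolute timestamp survives restarts by design:
// the check compares wall-clock time, not anything process-local.
function isCoolingDown(model, now = Date.now()) {
  return model.cooldownUntil !== undefined && now < model.cooldownUntil;
}

// Returns the first model whose cooldown has expired, or null when the
// entire chain is cooling down (the cascade's "agent offline" state).
function firstAvailable(models, now = Date.now()) {
  return models.find(m => !isCoolingDown(m, now)) ?? null;
}
```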
When you hit the cascade and need your agent back online immediately, here are three workarounds in order of disruption:
Option 1: Wait it out (least disruptive). The cooldown timers expire on their own. Default cooldown duration is 5 minutes in OpenClaw 3.22 and 10 minutes in 3.23. Check your OPENCLAW_RATE_LIMIT_COOLDOWN environment variable if you have customized it. Once all timers expire, normal operation resumes automatically.
Option 2: Manual model swap (moderate disruption). While the cascade affects all models in your configured fallback chain, you can manually switch to a model that was not part of the chain. Stop the container, change your OPENCLAW_MODEL_NAME to a different model (even the same underlying model with a different alias), and restart. This bypasses the cooldown timers because the new model name has no cooldown state.
# Stop the container
docker compose down
# Edit docker-compose.yml — change model name
# e.g., from claude-sonnet-4-20250514 to a different alias for the same model
# Restart
docker compose up -d
Option 3: Clear cooldown data (most disruptive). This wipes all cooldown state and forces a fresh start. It does not affect your conversation history or memory — only the cooldown timers.
# Stop the container
docker compose down
# Remove cooldown state files
rm -f ./data/session/cooldowns.json
rm -f ./data/session/model-state.json
# Restart
docker compose up -d
After clearing the state, OpenClaw starts with all models available. If the underlying rate limit at the API provider has not expired yet, the first model in your chain may immediately hit a 429 again. In that case, the cascade will re-trigger unless you have applied the multi-provider fix below.
The best defense against the cooldown cascade is configuring models from different providers in your fallback chain. Even with the cascade bug, API rate limits are per-provider. If Claude (Anthropic) hits a rate limit, GPT-4o (OpenAI) is unaffected at the provider level.
The cascade bug marks all models as cooldown in OpenClaw's internal state, but if you clear the state and have models from multiple providers, you can immediately fall back to a provider that is not actually rate-limited.
In OpenClaw 3.23, the partial fix introduces per-provider cooldown isolation. This means the cascade no longer crosses provider boundaries. If your Anthropic model hits a rate limit, only other Anthropic models in the chain enter cooldown. OpenAI and Ollama models remain available.
Here is a recommended multi-provider fallback configuration for your environment variables:
OPENCLAW_MODEL_PROVIDER=anthropic
OPENCLAW_MODEL_NAME=claude-sonnet-4-20250514
OPENCLAW_FALLBACK_1_PROVIDER=openai
OPENCLAW_FALLBACK_1_MODEL=gpt-4o
OPENCLAW_FALLBACK_2_PROVIDER=ollama
OPENCLAW_FALLBACK_2_MODEL=llama3:8b
With this configuration on 3.23, a Claude rate limit only cascades to other Anthropic models (if any). GPT-4o and Llama 3 remain available as fallbacks. This is effectively a complete workaround for the cascade issue.
For operators who use a single provider, the other option is to configure multiple API keys from the same provider. Different API keys have independent rate limits. You can set up separate provider entries using different keys, and the cascade cannot propagate across them because each key has its own rate limit bucket.
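Sketched in the same environment-variable style as the multi-provider chain above. The `*_API_KEY` variable names here are my assumption for illustration and may not match OpenClaw's actual configuration keys, so check your deployment's documentation:

```
# NOTE: the *_API_KEY variable names below are hypothetical
OPENCLAW_MODEL_PROVIDER=anthropic
OPENCLAW_MODEL_NAME=claude-sonnet-4-20250514
OPENCLAW_MODEL_API_KEY=key-one
OPENCLAW_FALLBACK_1_PROVIDER=anthropic
OPENCLAW_FALLBACK_1_MODEL=claude-sonnet-4-20250514
OPENCLAW_FALLBACK_1_API_KEY=key-two
```

Because each key draws from its own rate limit bucket at the provider, a 429 on the first key does not imply the second key is throttled.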
| Version | Cascade Behavior | Fix Status |
| --- | --- | --- |
| 3.21 and earlier | Full cascade — all models affected | No fix available |
| 3.22 | Full cascade — all models affected | Bug reported as #53906 |
| 3.23 | Per-provider isolation — cascade limited to same provider | Partial fix merged |
| 3.24 (upcoming) | Per-model isolation — no cascade | PR open, expected Q2 2026 |
If you are on 3.22 or earlier, upgrading to 3.23 with a multi-provider fallback chain effectively eliminates the cascade issue. If you are stuck on a single provider, the full fix in 3.24 is the one to wait for.
The cooldown cascade is a bug (GitHub #53906) where hitting a rate limit on one model triggers cooldown timers on all fallback models simultaneously. Instead of gracefully switching to a backup model, OpenClaw marks all models as rate-limited at once, leaving no available model to handle requests.
The cooldown state is persisted in the session store. Restarting the OpenClaw container reloads the same cooldown timers from disk. The models remain marked as rate-limited even after a fresh restart. You need to either wait for all cooldown timers to expire or manually clear the session store.
Three immediate workarounds: (1) Stop OpenClaw, delete the session cooldown data from the data directory, and restart. (2) Manually switch to a model on a different provider that was not part of the cascade. (3) Wait for the longest cooldown timer to expire (usually 5-15 minutes depending on your config).
The bug is only partially fixed. OpenClaw 3.23 introduced isolated cooldown timers per provider, which prevents the cascade from crossing provider boundaries. However, models within the same provider can still cascade into cooldown together. The full fix, with independent per-model cooldowns, is expected in 3.24.