How Agent Island Detects Claude Code and Codex Session State

# ai# opensource# swift# programming
How Agent Island Detects Claude Code and Codex Session Statetristan666666

I built Agent Island around a small problem that becomes painful during long agentic coding runs: the...

I built Agent Island around a small problem that becomes painful during long agentic coding runs: the task does not always fail loudly.

A Claude Code or Codex session may still be working, may have finished and be waiting for your next instruction, or may have frozen mid-turn. If that happens while you are away from the keyboard, the only visible signal is usually buried in a terminal or transcript file.

Agent Island turns that into a local macOS signal in the MacBook notch / menu-bar area:

  • working: the provider logo breathes
  • your turn: the provider logo spins
  • stalled: the provider logo turns red and can play a short beep sequence

Launch video: https://github.com/user-attachments/assets/d69b41e0-9298-4f17-b6c9-6014f3bd956b

Repo: https://github.com/tristan666666/agent-island

Download DMG (macOS 13+): https://github.com/tristan666666/agent-island/releases/download/v1.0.0/AgentIsland-1.0.0.dmg

The data source is local

There is no shared live-state API for Claude Code and Codex CLI sessions, so Agent Island reads the same local artifacts the tools already write.

Claude Code activity is detected from JSONL transcript files under:

~/.claude/projects/*.jsonl
Enter fullscreen mode Exit fullscreen mode

The trigger picker uses Claude Desktop's session store for nicer labels and archived filtering:

~/Library/Application Support/Claude/claude-code-sessions/local_*.json
Enter fullscreen mode Exit fullscreen mode

Codex activity is detected from recent JSONL sessions under:

~/.codex/sessions/YYYY/MM/DD/*.jsonl
Enter fullscreen mode Exit fullscreen mode

For Codex session labels, Agent Island reads the first session_meta JSONL event and uses its id and cwd.

This keeps state detection local. The app does not need a third-party service just to decide whether a session is still moving.

The state model

The scanner runs every six seconds and classifies each candidate transcript. File I/O happens off the main actor so the notch UI stays responsive.

The core inputs are:

  • transcript modification time
  • the last few JSONL events
  • whether this app has recently observed the file producing output

The current thresholds are intentionally conservative:

Signal Threshold Meaning
active window 18 seconds Fresh transcript writes mean the session is working.
stall after 300 seconds A watched working session that freezes mid-turn for 5 minutes becomes stalled.
stall cap 15 minutes Past this, the app treats it as idle rather than a live stall.
needs-you cap 20 minutes A finished turn is only surfaced as "your turn" while it is still fresh.
attention window 30 minutes Old transcripts are ignored.

That last point matters. Agent Island only sounds the alarm if it first saw the session working. A transcript that was already old when the app launched should not suddenly become a red alert.

Detecting "your turn"

Agent Island tails the last 128 KB of the transcript, capped to 200 lines, so a recent completion marker is unlikely to be pushed out by a burst of tool output.

For Claude Code, the scanner looks for an assistant event whose stop reason indicates the assistant turn is complete:

stop_reason: end_turn
stop_reason: stop_sequence
stop_reason: stop
Enter fullscreen mode Exit fullscreen mode

For Codex, the scanner looks for Codex event messages:

payload.type: task_complete
Enter fullscreen mode Exit fullscreen mode

If a newer task_started appears before a completion marker, Codex is treated as still working.

Detecting stalled sessions

A stalled session is not just "a file did not change for a while." That would be too noisy.

The scanner only returns stalled when all of these are true:

  • the transcript was recently observed in the working state
  • the transcript has stopped changing for at least five minutes
  • no fresh turn-complete marker is visible in the tail
  • the session is still inside the 15-minute stall cap

This catches the failure mode the app cares about: a long-running agent that was active, then froze mid-turn while you were not watching.

Aggregating per provider

Claude and Codex can each have multiple candidate transcripts. Agent Island classifies each file, then takes the most urgent state for that provider:

idle < working < your turn < stalled
Enter fullscreen mode Exit fullscreen mode

That gives one visible Claude state and one visible Codex state in the notch.

Auto-resume is separate from state detection

The visual state monitor is passive. Auto-resume is explicit and opt-in.

A trigger contains:

  • provider: Claude or Codex
  • session id
  • working directory
  • message to send
  • trigger mode
  • enabled state
  • last fired time

There are two trigger modes:

  • after reset: fire once after the provider's real reset boundary advances
  • every N hours: fire on a fixed local interval

The reset-based mode is deliberately not "wait five hours from now." The app watches the provider usage store and treats a changed reset boundary as the reset signal. That prevents relaunches from firing immediately and lets the app catch up once if the Mac slept through a genuine rollover.

The risk boundary

When a trigger fires, Agent Island resumes the selected CLI session using the provider's resume command. Those flags are powerful: they bypass normal approval prompts and can spend tokens.

That is why auto-resume is not the default experience. The app makes it explicit in the README and UI: only attach triggers to sessions you trust.

Why this belongs in the notch

This is not trying to replace a terminal UI, dashboard, or session browser. Those are useful when you want detail.

The notch is for a narrower job: a persistent, low-friction signal that answers one question while you work on something else:

Is my agent still moving, waiting for me, or stuck?

That is the surface Agent Island optimizes for.

Feedback is welcome, especially from people who run Claude Code or Codex for long refactors and maintenance tasks.