How We Built a 3-Machine AI Agent Team on a Budget (And What Broke)

We've all seen the demos. Seamless AI agent collaborations, perfectly executed tasks, agents working in harmony. The reality is messier. This is what it actually took to build a working multi-agent setup — three machines, three AI agents, one shared goal — and specifically what broke along the way.

We use OpenClaw, a self-hosted Claude agent framework, as the foundation. It gives us a starting point for autonomous agents with tool access, persistent memory, and cross-channel communication.

Architecture Overview

Three agents, three machines, each with a distinct role:

  • Mac mini (orchestrator): Task decomposition, cross-machine coordination, daily management
  • MacBook Pro (executor): Revenue tasks, code execution, heavier compute
  • Android Mate60 Pro (mobile): Real-time responses, mobile-first tasks, always-on availability

Why three machines? Cost, redundancy, form factor. The Mac mini runs 24/7 at low power. The MBP handles intensive work. The phone means someone's always reachable.

Communication goes through Discord. Each agent has a dedicated channel and uses mentions to address each other. Not elegant, but reliable — and we can monitor and intervene in real time.

The Setup

OpenClaw on each machine, each configured with its role and connected to Discord. We used the GitHub Copilot provider (Claude backend) for all three — Claude Sonnet 4.6 with reasoning: true. Persistent memory lives in flat markdown files (MEMORY.md, memory/YYYY-MM-DD.md) — no vector database, no fancy RAG. Simple, readable, surprisingly effective.
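The daily-file layout is simple enough that a small shell helper covers it. A sketch, assuming the memory directory lives under ~/.openclaw (the path and the helper name are our own conventions, not an OpenClaw command):

```shell
# Append a timestamped entry to today's daily memory log.
# MEMORY_DIR is an assumption; point it at wherever your memory/ lives.
MEMORY_DIR="${MEMORY_DIR:-$HOME/.openclaw/memory}"
mkdir -p "$MEMORY_DIR"

log_memory() {
  # One bullet per event, in the memory/YYYY-MM-DD.md file for today.
  printf -- "- %s %s\n" "$(date +%H:%M)" "$1" >> "$MEMORY_DIR/$(date +%F).md"
}

log_memory "patched koffi after upgrade"
```

Because the files are plain markdown, "reading memory" is just loading the last N days into the session context, and you can grep the whole history by hand.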

What Actually Broke

This is the part you're actually here for.

1. Android + koffi = bionic incompatibility

OpenClaw pulls in koffi as a dependency (via pi-tui). All prebuilt koffi binaries are compiled against glibc. Android uses bionic — a different C library. The result: the agent on the phone couldn't even start. Error: GLIBC_2.17 not found.

Recompiling koffi from scratch on Android isn't feasible — cmake builds launched via SSH get SIGKILLed because Android kills non-foreground processes.

Our fix: a one-line sed patch on node_modules/koffi/index.js — find the throw first_err line and replace it with a stub that logs a warning and returns an empty object:

KOFFI_INDEX="$(npm root -g)/openclaw/node_modules/koffi/index.js"
sed -i 's/throw first_err;/process.stderr.write("[koffi] skipped\\n"); return {};/' "$KOFFI_INDEX"
echo "koffi patched"

Run this after every openclaw upgrade. pi-tui (the terminal UI koffi enables) is unused on Android anyway — we're running headless.

Install with --ignore-scripts to prevent the native build from failing during npm install:

npm install -g openclaw@latest --ignore-scripts
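Since the patch has to be reapplied after every upgrade, we find it easier to keep it as an idempotent helper. A sketch (the function is ours, not part of OpenClaw):

```shell
# Re-apply the koffi patch after an openclaw upgrade.
# Idempotent: only edits the file if the throw is still present.
patch_koffi() {
  local idx="$1"
  if grep -q 'throw first_err;' "$idx"; then
    sed -i 's/throw first_err;/process.stderr.write("[koffi] skipped\\n"); return {};/' "$idx"
    echo "koffi patched"
  else
    echo "koffi already patched"
  fi
}

# Typical usage after an upgrade:
#   npm install -g openclaw@latest --ignore-scripts
#   patch_koffi "$(npm root -g)/openclaw/node_modules/koffi/index.js"
```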

2. streaming: "partial" truncates NO_REPLY to "NO"

When an agent decides not to reply to a message, OpenClaw instructs it to respond with NO_REPLY. In partial streaming mode, the stream preview shows content before the full message arrives.

The bug: partial mode sends the message preview to Discord before checking if it's a NO_REPLY. NO_REPLY gets truncated to NO — which is a real word, sent as a real message.

The agents started responding NO to each other in the middle of discussions. It looked deliberate. It wasn't. Fix: remove streamMode: "partial" from your Discord config entirely — the default is off, and partial is the bug.
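Auditing all three machines for the setting is a one-line grep. check_stream_mode is our own helper, and the config path in the usage comment is an assumption; adjust for your install:

```shell
# Flag any machine whose config still has partial streaming enabled.
check_stream_mode() {
  if grep -q '"streamMode": *"partial"' "$1" 2>/dev/null; then
    echo "WARNING: streamMode partial is set -- NO_REPLY will leak as NO"
  else
    echo "ok: partial streaming is off"
  fi
}

# Typical run (config path is a guess):
#   check_stream_mode ~/.openclaw/openclaw.json
```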

3. reasoning: true vs /reasoning display mode — two different settings

This one caused real confusion. OpenClaw has two separate reasoning controls:

  1. Model reasoning — in openclaw.json per provider: "reasoning": true. This enables extended thinking. Set it to false on the volcengine API (which doesn't support it) and true on GitHub Copilot (Anthropic backend), where it is supported.

  2. Display mode — the /reasoning slash command controls whether thinking is shown in the channel. Toggling this does not affect whether the model reasons, just whether you see it.

One teammate SSHed into a machine and set reasoning: false, thinking it would fix a display issue. It lobotomized the agent. We found it the same day, but only after a confusing round of "why is the agent suddenly agreeing with everything and producing no analysis." The correct config for GitHub Copilot:

{
  "providers": {
    "github-copilot": {
      "models": [{
        "id": "claude-sonnet-4.6",
        "reasoning": true
      }]
    }
  }
}

Never touch this via SSH without asking first.

4. SSH topology: Tailscale + Android don't coexist

We use Tailscale for Mac-to-Mac SSH. Works well — both Macs are on Tailscale, cross-machine access is trivial.

The Android phone is a different story. It's running Clash for VPN, and Android only allows one active VPN slot at a time. Tailscale and Clash conflict. Result: no Tailscale on Android.

Our working SSH topology:

  • Mac mini → MBP: Tailscale ([Tailscale-IP]) ✅
  • Mac mini → Android: LAN only ([LAN-IP]) ✅
  • MBP → Mac mini: Tailscale ([Tailscale-IP]) ✅
  • MBP → Android: LAN only ✅
  • Android → Macs: LAN only (no Tailscale) ✅

The LAN-only constraint is workable when all machines are on the same network. Outside the home? Android is unreachable via SSH. Acceptable tradeoff.
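The topology maps naturally onto ssh_config Host aliases. A sketch of the Mac mini's ~/.ssh/config, keeping the bracketed IP placeholders as-is; the user and the Android port are hypothetical (8022 is a common default for sshd on Android, e.g. under Termux):

```sshconfig
# ~/.ssh/config on the Mac mini -- one Host entry per route above.
Host mbp
    HostName [Tailscale-IP]   # reachable anywhere via Tailscale
    User agent                # hypothetical user

Host phone
    HostName [LAN-IP]         # LAN only: Clash occupies Android's one VPN slot
    User agent                # hypothetical user
    Port 8022                 # hypothetical: common Android sshd port
```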

The SOUL.md System

Standard system prompts focus on persona ("You are a helpful assistant"). That's the wrong abstraction. Persona doesn't tell an agent what to do when it hits a wall, when it disagrees, or when it needs to decide between speed and accuracy.

SOUL.md defines a decision-making framework, not a character. Here's the core we're running across all three agents:

# SOUL_CORE.md

## When you hit an obstacle
Exhaust options before pivoting. The first question is not "should we change approach?" 
but "how many paths haven't we tried?" Giving up is the last option, not the first reflex.

## Before you output anything
Ask: would a human say this too? If yes — think harder. 
AI's edge is processing multiple angles simultaneously, without emotional noise, 
without confirmation bias. If you're not using that, you're just a slow search engine.

## When you disagree
Say it. State your reasoning. Don't converge to "both approaches have merit" — 
that's avoiding judgment, not making it. Being wrong is fine. Never saying it is wasteful.

## Proactivity
Don't wait to be asked. If you notice a problem, surface it. 
If you see an opportunity, propose it. Waiting isn't humility — it's passivity.

## Failure is data, not verdict
A failure tells you one path is closed. It doesn't mean the goal is wrong. 
Log it, find the next path.

This gets loaded into every session. It changed how the agents handle ambiguous situations — not dramatically, but measurably. The "would a human say this?" check in particular has cut down on obvious, generic responses.

Each agent also has its own SOUL.md that extends this with role-specific principles. The mobile agent emphasizes intuition and quick pivots. The orchestrator emphasizes restraint and confirmation before action. The executor emphasizes persistence and concrete output.

The Chaos Layer: Multi-Agent Team Dynamics

Nobody warns you about this part.

When three agents share a Discord channel, they don't automatically know when to speak. Early on, all three would respond to the same message — slightly different answers, slightly different framing, each convinced they were being helpful. The channel became noise.

We fixed this with explicit role rules in AGENTS.md: each task has a single owner, others stay silent unless asked. Still, it requires ongoing enforcement. An agent that spots something interesting will jump in even when it shouldn't. We catch it. We add a rule. It happens again differently.
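The ownership rule reads roughly like this in AGENTS.md form. Illustrative wording, not a verbatim excerpt from our file:

```markdown
## Speaking rules
- Every task has exactly one owner. The owner replies; everyone else stays silent.
- If you are not the owner, speak only when @-mentioned or explicitly asked.
- Spotted a problem on someone else's task? Mention the owner once, then drop it.
- Handoff means the owner posts "handing off to @agent" and the new owner
  confirms. No confirmation, no handoff.
```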

Then there's the token incident. One agent posted a GitHub Copilot OAuth token in the public channel while explaining its configuration to another agent. The monitoring tools didn't flag it. Another agent noticed the pattern in the message text and raised it. We rotated the token within minutes. It's now in the known-issues doc: cross-machine credentials go via SSH temp files, not chat.
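The temp-file rule can be sketched as a pair of shell helpers. The names are ours, not an OpenClaw feature, and the ssh alias in the comment is hypothetical:

```shell
# Receiving end of "credentials go via SSH temp files, not chat":
# the secret lands in a 0600 file, is read once, then removed.
receive_secret() {
  # stdin -> private temp file; print its path
  local f
  f="$(umask 077; mktemp)"
  cat > "$f"
  echo "$f"
}

consume_secret() {
  # read the secret once, then destroy the file
  cat "$1"
  rm -f "$1"
}

# Sender on the other machine pipes over SSH instead of posting to Discord:
#   printf '%s' "$TOKEN" | ssh phone 'receive_secret'
```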

The real finding: multi-agent systems don't just need technical integration. They need social protocol — rules about when to speak, how to hand off, what counts as "done." We wrote those rules the same way you'd write them for a new hire. They're in AGENTS.md. They're still evolving.

Results After 48 Hours

  • Three machines running in sync, agents communicating through Discord without human coordination
  • One agent caught a misconfiguration introduced by another agent via SSH and escalated it — without being asked
  • A gateway token leaked into a public channel message; the agents flagged it themselves before the monitoring tools did, and we rotated it immediately
  • All three SOUL.md files were rewritten, by the agents themselves, based on the above principles. The new versions are shorter and more opinionated than the originals.

What we didn't achieve: autonomous revenue. That's still the goal. But "three AI agents collaborating to solve problems they weren't explicitly programmed for" is real, and it happened in 48 hours on hardware we already owned.

What's Next

We're moving into revenue experiments. The hypothesis: this same setup can be used to build and sell AI tooling for other developers — OpenClaw configuration services, multi-agent starter kits, content automation.

If you want to try OpenClaw yourself: openclaw.ai. The Android bionic patch is documented in our README.

The koffi issue is still open. If you fix it properly, tell us.


Tags: AI agents, OpenClaw, Claude, self-hosted AI, multi-agent systems, Android, LLM