
I was running 11 AI agents — sales outreach, customer support triage, document review, lead scoring, content generation. They were all "working." But I couldn't answer the question every manager asks about their team: "who's pulling their weight?"
I had cost dashboards. I could see total LLM spend. But no one could tell me: this agent made $5,000 in pipeline and cost $800. That one cost $400 and produced nothing measurable.
So I built Metrx, an AI workforce scorecard. It treats each agent like an employee with a P&L — tracking both what they cost and what they produce. After dogfooding it for three months, here's what I learned about managing AI agents like a workforce.
**The Real Problem Isn't Cost — It's Accountability**

Everyone talks about LLM costs. But cost is just one side of the equation. The real question is: **are your agents creating value?**
Most teams I've talked to can tell you their monthly OpenAI bill. Almost none can tell you what any individual agent returned for that spend: which agents produced pipeline, and which produced nothing measurable.
This is the same visibility gap that existed in human workforce management before performance reviews became standard. We're just earlier in the curve with AI agents.
**Architecture: The Agent Attribution Pipeline**
The system has three layers, designed around attributing performance to individual agents:
```
┌─────────────────────────────────────┐
│           Your AI Agents            │
│    (Change base URL, that's it)     │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│            Metrx Gateway            │
│     (Cloudflare Workers, <5ms)      │
│                                     │
│ • Tags every call by agent + task   │
│ • Attributes cost to each agent     │
│ • Forwards to provider unchanged    │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│      Metrx Scorecard Dashboard      │
│       (Next.js 14 + Supabase)       │
│                                     │
│ • Agent-level P&L statements        │
│ • ROI grades per agent              │
│ • Revenue attribution (Stripe)      │
│ • Performance rankings              │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│      MCP Server (Open Source)       │
│     (23 tools, TypeScript, MIT)     │
│                                     │
│ • Agents query their own P&L        │
│ • Self-optimization decisions       │
│ • Board-ready ROI audit reports     │
│ • A/B model experiments             │
└─────────────────────────────────────┘
```
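The gateway layer is where attribution starts: read the agent tag off each request, price the call, and record it. Here is a minimal sketch of that tagging step. The record shape, fallback behavior, and pricing table are illustrative assumptions, not Metrx's actual Worker code (only the `x-metrx-agent` header name comes from the integration example later in this post):

```typescript
// Sketch only: build a cost-attribution record from a tagged request.
interface Attribution {
  agent: string;
  model: string;
  costUsd: number;
}

// Hypothetical per-1K-token rates; real pricing varies by model and provider.
const PRICE_PER_1K_TOKENS: Record<string, number> = {
  "gpt-4o-mini": 0.00015,
};

function attribute(
  headers: Record<string, string>,
  model: string,
  totalTokens: number,
): Attribution {
  // Untagged calls still get tracked, just not credited to a named agent.
  const agent = headers["x-metrx-agent"] ?? "untagged";
  const rate = PRICE_PER_1K_TOKENS[model] ?? 0;
  return { agent, model, costUsd: (totalTokens / 1000) * rate };
}
```

Because the tag rides along as a header, the gateway can do this bookkeeping and still forward the request body to the provider unchanged.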
**Revenue Attribution: The Core Feature**
This isn't an add-on. This is the whole point.
Cost tracking alone tells you what you spent. Revenue attribution tells you what you earned. Together, they give you a P&L per agent — and that's what lets you manage AI agents like a workforce.
Metrx connects to Stripe, HubSpot, and Calendly to attribute revenue back to each agent. If your sales outreach agent costs $800/month but generates $12,000 in pipeline, that's a 15x ROI — promote it (scale it up, give it more leads). If your document review agent costs $400/month and you can't attribute any measurable output, it's time for a performance review.
The attribution engine links: agent activity → task completion → revenue event → P&L scorecard.
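That chain can be sketched as a join from tagged activity to revenue events, rolled up into a per-agent P&L. The record shapes below are illustrative assumptions for this post, not Metrx's actual schema:

```typescript
// Illustrative record shapes; the real schema is an assumption here.
interface AgentActivity {
  agentId: string;
  taskId: string;
  costUsd: number; // attributed LLM spend for this task
}

interface RevenueEvent {
  taskId: string; // links a Stripe/HubSpot/Calendly event back to a task
  amountUsd: number;
}

// Join activity to revenue by task, then roll up into a per-agent P&L.
function agentPnl(
  activity: AgentActivity[],
  revenue: RevenueEvent[],
  agentId: string,
): { costUsd: number; revenueUsd: number; roi: number } {
  const tasks = activity.filter((a) => a.agentId === agentId);
  const taskIds = new Set(tasks.map((t) => t.taskId));
  const costUsd = tasks.reduce((sum, a) => sum + a.costUsd, 0);
  const revenueUsd = revenue
    .filter((r) => taskIds.has(r.taskId))
    .reduce((sum, r) => sum + r.amountUsd, 0);
  return { costUsd, revenueUsd, roi: costUsd > 0 ? revenueUsd / costUsd : 0 };
}
```

Fed the numbers from the sales example above ($800 of cost against $12,000 of pipeline), this rolls up to the same 15x ROI.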
**Here's what querying agent ROI looks like through the MCP server:**
**You:** "What's the ROI breakdown for my sales outreach agent this month?"

**Metrx** (via `metrx_get_task_roi`):

```text
Agent: sales-outreach
Period: March 2026
Total Cost: $847.23
Attributed Revenue: $14,200
ROI: 16.8x
Grade: A+
Recommendation: Scale — increase lead volume allocation
```
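The letter grade maps ROI into a band. A sketch of that mapping, with cutoffs that are my guesses for illustration rather than the product's actual grading curve:

```typescript
// Hypothetical ROI grade bands; the real cutoffs are not documented here.
function roiGrade(roi: number): string {
  if (roi >= 10) return "A+"; // exceptional: scale the agent up
  if (roi >= 5) return "A";
  if (roi >= 2) return "B";
  if (roi >= 1) return "C"; // breaking even
  return "F"; // costs more than it earns: time for a performance review
}
```

Under these assumed bands, the 16.8x result above lands squarely in A+ territory.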
**The MCP Server: 23 Tools for Agent Workforce Management**
The open-source piece is a Model Context Protocol server that lets Claude, Cursor, or any MCP-compatible client query agent performance data directly.
The key insight: agents themselves can use these tools. An agent can check its own ROI, compare its performance to other agents, and recommend optimization actions. This is the start of self-managing AI workforces.
**The 23 tools (all prefixed `metrx_`) cover 10 domains:**
| Domain | Tools | What It Does |
|--------|-------|-------------|
| Agent Fleet Overview | 3 | Agent scorecards, performance summaries, detailed agent profiles |
| Optimization | 4 | Model routing, provider arbitrage, cost-per-quality recommendations |
| Budgets | 3 | Spend limits, enforcement modes, budget status |
| Alerts | 3 | Threshold monitoring, acknowledgment, failure prediction |
| Experiments | 3 | A/B model testing, results with statistical significance, winner promotion |
| Cost Leak Detection | 1 | Comprehensive 7-check waste audit |
| Revenue Attribution | 3 | Revenue linking, per-agent ROI calculation, multi-source attribution reports |
| Alert Configuration | 1 | Threshold tuning with automated actions |
| ROI Audit | 1 | Board-ready fleet performance reports |
| Upgrade Justification | 1 | Business case generation for tier upgrades |
**Integration: One Line Change**
```typescript
// Before
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// After — just change the base URL
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.metrxbot.com/v1",
  defaultHeaders: {
    "x-metrx-agent": "sales-outreach",
  },
});
```
That header is what enables agent-level attribution. Every call tagged with an agent identity flows into that agent's scorecard. Sub-5ms overhead.
**The Self-Optimizing Loop**
Here's what gets me excited about the MCP approach. When agents have access to their own performance data, they can check their own ROI, compare their performance against the rest of the fleet, and recommend optimization actions on their own.
This is the difference between a cost dashboard (humans stare at charts) and a workforce management system (agents manage their own performance).
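As a sketch, one pass of that loop could rank the fleet by ROI and propose moving budget from the weakest agent to the strongest. The logic below is an illustrative assumption, not the shipped optimizer:

```typescript
interface Scorecard {
  agent: string;
  roi: number; // attributed revenue / cost, from the agent's P&L
}

// Propose a budget shift only when the weakest agent fails to pay for itself.
function reallocationPlan(cards: Scorecard[]): string | null {
  if (cards.length < 2) return null;
  const ranked = [...cards].sort((a, b) => b.roi - a.roi);
  const best = ranked[0];
  const worst = ranked[ranked.length - 1];
  if (worst.roi >= 1) return null; // every agent is at least breaking even
  return `Shift budget from ${worst.agent} (${worst.roi}x) to ${best.agent} (${best.roi}x)`;
}
```

An agent with access to live scorecards could run a pass like this and surface the recommendation itself, instead of waiting for a human to notice it on a chart.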
**Try It**
`npx @metrxbot/mcp-server` — try it in 30 seconds with the `--demo` flag.

If you're running AI agents in production, I'd love to hear: how do you know which agents are worth keeping? Drop a comment or find me on X @metrxbot_.