# API or browser agent? We picked yes.

Ava Bagherzadeh

A few weeks ago I ran the math on our submission pipeline and got a number I didn't believe.

The browser path costs 20 to 40 times more per submission than the direct API path. That's not 2x. That's not "we should optimize." That's the kind of gap that makes you stop, pull out the spend reports, and check whether something is on fire.

I rechecked. The number held. And then I sat with it for a week before I realized the gap I was measuring was the wrong gap.

This is what I learned, and what we did about it.

## The setting

We run an agent pipeline that submits job applications to whatever applicant tracking system the employer is using. Greenhouse, Lever, Ashby, Workday, SmartRecruiters, iCIMS, plus the long tail of bespoke career portals that random startups bolt onto their websites.

Two paths in the pipeline:

Direct API. Where the platform exposes a documented JSON API, we POST the structured payload. Greenhouse and Lever both ship clean public APIs. Ashby has one if you ask politely. Workday has one if you have a Workday consultant friend.

Browser agent. Where the platform doesn't expose an API, or exposes one but locks it behind partner credentials, we run a Chromium session in the cloud and have an LLM-driven agent fill the form, click submit, and screenshot the confirmation. We use Browserbase plus Stagehand for the browser orchestration; the form-filling logic is a tool-use loop on a small model.

Both paths produce the same downstream artifact: a verified submission record with a confirmation hash. Same outcome. Wildly different cost.

## The naive cost framing

Direct API path: roughly $0.001 per submission. JSON serialization, one HTTPS request, retry on 429. The model never enters the loop. The dominant cost is our worker compute, which is rounding error.

Browser agent path: roughly $0.20 to $0.40 per submission, depending on form complexity. Browserbase session minutes, a few rounds of LLM calls to interpret each form section, screenshot verification, occasional retries on captcha walls.

That's the 20-40x gap I started with. It tracks the recent benchmarks circulating on Hacker News claiming "computer use" is around 45x more expensive than structured APIs at scale. Yeah. We see the same shape.

The first instinct is the obvious one: route everything through APIs where you can, fall back to browser only when you must. Cheaper. Cleaner. Done.

That's what we tried. And it wasn't actually cheaper.

## What's missing from that framing

APIs break. Specifically, smaller-ATS APIs break.

The big public APIs are stable. Greenhouse hasn't changed its job board API shape in a year. Lever ships clean migrations and announces them weeks ahead. Ashby is small but disciplined. Workday is a giant ball of XSDs but they don't move much.

The long tail is different. The smaller ATS integrations and the bespoke career portals change their JSON shape every two to six months. Sometimes they add a required field. Sometimes they rename `application_data` to `applicationPayload` and break every consumer that wasn't reading their changelog. Sometimes they don't have a changelog and the only way you find out is when your success rate quietly drops.

The failure mode here is the part that costs money. Not the fix itself, but how long it takes to discover the fix is needed.

When a small-ATS API silently changes its required-field set, our submissions return HTTP 200 with empty bodies, or 200 with a generic "queued" response that never converts to a real application. The platform side throws away the malformed payload. The candidate sees a successful submission. We see a successful submission. The recruiter sees nothing.

Median discovery latency in our pre-routing setup: about two weeks. Two weeks of submissions getting silently swallowed. After discovery, repair cycle: three to five engineer-days, because each platform integration has its own schema, its own auth dance, its own quirks, and the engineer has to rebuild context every time.

If you fold that into the cost-per-submission math, the picture changes. The API path looks cheaper at runtime, but every six months it costs you a week of engineer time per platform you're routing through. With ten platforms, that's basically a half-time engineer dedicated to keeping the cheap path cheap.

The browser path doesn't have this problem. Browser agents shrug at schema changes. They re-observe the rendered DOM, find the new field labels, fill them. They don't shrug at captcha walls or auth flows that change shape, but those have monitoring and they break in obvious ways, not silent ones.

Cost-with-breakage formula, oversimplified:

```
true_cost_per_submission = runtime_cost +
  (engineer_repair_cost / submissions_between_breakages)
```

Once we put the second term in, the API path's lead shrank from 30x to something closer to 4x on average, and on the worst-behaving platforms it inverted entirely: the browser was cheaper, all-in, on the platforms with the flakiest APIs.
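To make the second term concrete, here's the formula as a tiny TypeScript helper, run against hypothetical numbers in the same ballpark as the figures quoted in this post (they're illustrative, not our billing data): a flaky small-ATS API at $0.001 runtime, with a roughly four-engineer-day repair (call it $4,000) every ~20,000 submissions.

```typescript
// Sketch of the cost-with-breakage formula above. All dollar figures
// below are hypothetical, chosen to match the rough shape in the post.
function trueCostPerSubmission(
  runtimeCost: number,            // $ per submission at runtime
  engineerRepairCost: number,     // $ per breakage incident (engineer time)
  submissionsBetweenBreakages: number,
): number {
  return runtimeCost + engineerRepairCost / submissionsBetweenBreakages;
}

// Hypothetical flaky small ATS: $0.001 runtime, ~$4,000 repair
// every ~20,000 submissions.
const apiAllIn = trueCostPerSubmission(0.001, 4_000, 20_000); // ≈ $0.20

// A $0.30 browser submission has no schema-breakage term, so it stays
// ≈ $0.30 — which is how the "30x cheaper" path ends up merely comparable.
const browserAllIn = trueCostPerSubmission(0.3, 0, 1);
```

The interesting property is that the second term is entirely invisible in runtime billing; it only shows up if you amortize engineer time over submission volume.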

## The architectural fix

Cost-aware routing rather than mode-as-policy.

Every submission picks its mode at runtime based on three signals:

  1. Is the API for this platform healthy in the last 24 hours? (Health check based on success rate of recent submissions.)
  2. Has the JSON schema changed since our last successful submission to this platform? (We hash the response shape after each successful POST and compare.)
  3. What's the current per-submission cost on each path? (Rough billing pull, refreshed hourly.)

If the API is healthy and the schema hasn't drifted and it's cheaper, we use the API. Otherwise we fall back to the browser.

Plus a schema-drift alarm. If the API path's success rate drops below 90 percent over a six-hour window, page the on-call. Discovery latency went from "median two weeks" to "median six hours." That alone paid for the routing infrastructure within the first month.
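The alarm condition itself is simple enough to sketch in a few lines. This is a minimal version assuming an outcome log and a paging hook exist elsewhere; the interface and names are illustrative, not our production code.

```typescript
// Rolling-window success-rate alarm, as described above: page the
// on-call when the API path's success rate over the last six hours
// drops below 90 percent.
interface SubmissionOutcome {
  timestampMs: number;
  confirmed: boolean; // did the submission convert to a real application?
}

const SIX_HOURS_MS = 6 * 60 * 60 * 1000;
const SUCCESS_THRESHOLD = 0.9;

function shouldPage(outcomes: SubmissionOutcome[], nowMs: number): boolean {
  const recent = outcomes.filter(o => nowMs - o.timestampMs <= SIX_HOURS_MS);
  if (recent.length === 0) return false; // no data, no alarm
  const successRate =
    recent.filter(o => o.confirmed).length / recent.length;
  return successRate < SUCCESS_THRESHOLD;
}
```

The one subtlety worth calling out: "confirmed" has to mean a submission that actually converted, not an HTTP 200, or the alarm sleeps through exactly the silent-swallow failure mode it exists to catch.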

Code shape, simplified:

```typescript
type SubmissionMode = "api" | "browser";

interface RoutingDecision {
  mode: SubmissionMode;
  reason: string;
}

async function pickMode(
  platform: string,
  jdId: string,
): Promise<RoutingDecision> {
  const apiHealth = await getApiHealthLast24h(platform);
  if (!apiHealth.healthy) {
    return { mode: "browser", reason: "api_unhealthy" };
  }

  const schemaOK = await schemaFingerprintMatches(platform);
  if (!schemaOK) {
    pageOncall(`schema drift detected on ${platform}`);
    return { mode: "browser", reason: "schema_drift" };
  }

  const apiCost = await currentApiCost(platform);
  const browserCost = await currentBrowserCost(platform);

  return apiCost < browserCost
    ? { mode: "api", reason: "api_cheaper" }
    : { mode: "browser", reason: "browser_cheaper" };
}
```

Real production code is gnarlier (caching layer on the health check, exponential backoff on the schema fingerprint comparison, telemetry per decision so we can audit later). But that's the shape.
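For the schema-fingerprint check in particular, one plausible implementation (a sketch under my own assumptions, not our exact production code) is to hash the *shape* of the response rather than its values: sorted keys plus value types, recursively. Renamed or added fields change the hash; ordinary value changes don't.

```typescript
import { createHash } from "node:crypto";

// Reduce a JSON value to its structural shape: objects become
// sorted-key maps of child shapes, arrays are represented by the
// shape of their first element, primitives by their typeof string.
function shapeOf(value: unknown): unknown {
  if (Array.isArray(value)) {
    return [value.length > 0 ? shapeOf(value[0]) : "empty"];
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value as object)
        .sort()
        .map(k => [k, shapeOf((value as Record<string, unknown>)[k])]),
    );
  }
  return typeof value; // "string", "number", "boolean", ...
}

function schemaFingerprint(responseBody: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(shapeOf(responseBody)))
    .digest("hex");
}
```

Stored per platform after each successful POST, two fingerprints comparing unequal is exactly the `schema_drift` signal, and it catches the renamed-field case without any per-platform schema definitions.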

## The numbers, before and after

Pre-routing baseline, four months of pure browser-first:

  • Mode mix: ~70% browser, ~30% API (the latter were forced through API where the platform offered no browser path)
  • Blended cost per submission: about $0.18
  • Schema-breakage discovery latency: median 14 days
  • Submissions silently swallowed during breakage windows: not pleasant to quote

Post-routing, after three months of cost-aware routing:

  • Mode mix: ~20% browser, ~80% API on stable platforms; routing flips per-submission
  • Blended cost per submission: about $0.07 (roughly 60 percent reduction)
  • Schema-breakage discovery latency: median six hours
  • Engineer-week loss per quarter to schema breakage: 1.5 weeks total, down from about 6

Worth noting what we didn't measure carefully enough at first: per-platform variance. We'd been reporting blended cost across platforms, which masked the fact that one specific small ATS was driving 40 percent of the total breakage cost. Once we broke out per-platform numbers, the routing logic could weight that platform's API at lower priority and we saved another chunk.

If your routing layer averages costs across heterogeneous backends, you're hiding the failure modes that matter most.
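The fix for that is mechanical: aggregate breakage cost per platform instead of blending, then look at each platform's share of the total. A sketch, with illustrative field names:

```typescript
// Per-platform share of total breakage cost. A blended average hides
// the outlier; this surfaces it directly.
interface BreakageIncident {
  platform: string;
  repairCostUsd: number;
}

function breakageShareByPlatform(
  incidents: BreakageIncident[],
): Map<string, number> {
  const total = incidents.reduce((sum, i) => sum + i.repairCostUsd, 0);
  const byPlatform = new Map<string, number>();
  for (const i of incidents) {
    byPlatform.set(
      i.platform,
      (byPlatform.get(i.platform) ?? 0) + i.repairCostUsd,
    );
  }
  if (total === 0) return byPlatform; // no incidents, nothing to normalize
  for (const [platform, cost] of byPlatform) {
    byPlatform.set(platform, cost / total);
  }
  return byPlatform;
}
```

Feeding that share back into the router as a per-platform penalty is a one-line change once the breakdown exists; the hard part is collecting the numbers per platform in the first place.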

## What I'd do differently

Spend the engineer-week to build the cost-aware router on day one, instead of running pure-browser for four months and then pure-API for two months while the breakage cost piled up invisibly.

Build the schema-drift alarm before the first 200-OK-empty-body incident, not after. The alarm is more important than the routing logic. Even a stupid all-API setup with a working alarm beats a clever router with no monitoring.

Track per-platform cost separately from the start. Blended numbers across platforms will lie to you. Specifically, they'll let your worst-behaving platform hide inside the average until it becomes the dominant cost.

Stop reasoning about "API vs browser" as a strategic choice. It's a per-request runtime choice. The strategic choice is whether to invest in the routing infrastructure that makes per-request choice cheap. Once that exists, the dichotomy dissolves.

## A quieter point

The HN thread that prompted this writeup framed the cost gap as "computer use is 45x more expensive than structured APIs." That's true if you measure the wrong cost.

It's also true that APIs are 45x cheaper at runtime. It's also true that a meaningful fraction of those APIs in the long tail will silently break in ways that take you weeks to notice. The right question isn't "which path is cheaper?" The right question is "what does cheaper mean over the lifetime of this integration, including the breakage tax?"

We ship cost-aware routing in production at aiapplyd.com today, and the router has saved us roughly four and a half engineer-weeks per quarter on top of the runtime cost reduction. Worth more than the runtime savings, honestly.

If you're routing AI agents in production and you're treating mode selection as a policy decision rather than a per-request decision, you're probably leaving the same money on the table that I was leaving four months ago. What's the failure mode you wish you'd predicted earlier? I'm curious what other teams are seeing on the schema-drift side.