Test AgentMail multi-tenant webhooks before they leak across tenants

#agentmail #webhooks #api #testing
FetchSandbox

AgentMail webhook creation can look done after a 200 OK. The real test is whether the persisted webhook is still scoped to the inbox IDs you sent.

The dangerous AgentMail webhook bug starts with a response that looks completely fine.

POST /v0/webhooks -> 200 OK

The response body contains the webhook object. It includes your URL, event types, and usually enough fields to make the integration feel done.

But in a multi-tenant app, the field that matters is inbox_ids. If that array is silently dropped, your webhook is no longer scoped to the tenant inbox you meant to subscribe. It is subscribed broadly, and every tenant's message events can start flooding the same endpoint.

That is not a noisy failure. It is worse. The subscribe call worked. Your endpoint receives events. The logs look active. The leak is in the scope.

If your handler routes by webhook ID before it checks the inbox ID, another customer's event can look like a valid one. That is the kind of bug that passes observability and fails privacy.
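A receiving-side guard makes the point concrete. This is a minimal sketch, not AgentMail's documented payload schema: the `WebhookEvent` shape, the `inbox_id` field location, and the `tenantInboxesByWebhook` map are all assumptions made for illustration.

```typescript
// Hypothetical event shape; the real AgentMail payload may nest these fields differently.
interface WebhookEvent {
  webhook_id: string;
  inbox_id: string;
  event_type: string;
}

// Hypothetical app-side record of which inboxes each webhook was meant to cover.
const tenantInboxesByWebhook = new Map<string, Set<string>>([
  ["wh_tenant_123", new Set(["inbox_tenant_123"])],
]);

// Returns true only when the event's inbox is inside the webhook's intended scope.
// Unknown webhook IDs are rejected rather than routed.
function isInScope(incoming: WebhookEvent): boolean {
  const allowed = tenantInboxesByWebhook.get(incoming.webhook_id);
  return allowed !== undefined && allowed.has(incoming.inbox_id);
}
```

A guard like this limits the damage at the endpoint, but it does not fix the subscription itself. The provider is still sending events it should not be sending.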

Why unit tests miss it

Most unit tests stub the webhook create call by echoing back exactly what the app sent.

request.inbox_ids -> response.inbox_ids

That proves your client serialized the payload. It does not prove AgentMail persisted the scope.

The test passes. CI passes. The code ships. Then production receives events for inboxes that do not belong to the tenant that configured the webhook.

The annoying part is that the local test is not obviously wrong. It has the right endpoint, the right payload shape, and the right assertion against the create response. It just trusts the wrong copy of the object.
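To see why the echo assertion proves so little, here is a small in-memory sketch. `LossyFakeProvider` is a hypothetical stand-in written for this example, not AgentMail's behavior or client: it echoes the request in the create response but drops `inbox_ids` from what it keeps.

```typescript
interface WebhookInput {
  url: string;
  event_types: string[];
  inbox_ids?: string[];
}

interface Webhook extends WebhookInput {
  id: string;
}

// Hypothetical provider that echoes the request but persists it without inbox_ids.
class LossyFakeProvider {
  private records = new Map<string, Webhook>();
  private nextId = 1;

  create(input: WebhookInput): Webhook {
    const id = `wh_${this.nextId++}`;
    // Persisted copy: the scope field is silently left out.
    this.records.set(id, { id, url: input.url, event_types: input.event_types });
    // Response copy: the request echoed back, scope included.
    return { id, ...input };
  }

  get(id: string): Webhook | undefined {
    return this.records.get(id);
  }
}

const provider = new LossyFakeProvider();
const sentIds = ["inbox_tenant_123"];
const createdHook = provider.create({
  url: "https://app.example.com/webhooks/agentmail",
  event_types: ["message.delivered"],
  inbox_ids: sentIds,
});

// The assertion most unit tests make: compare against the create response.
const echoMatches = JSON.stringify(createdHook.inbox_ids) === JSON.stringify(sentIds);
// The assertion that matters: compare against what was persisted.
const storedIds = provider.get(createdHook.id)?.inbox_ids ?? [];
const storedMatches = JSON.stringify(storedIds) === JSON.stringify(sentIds);
```

Against this fake, `echoMatches` is true and `storedMatches` is false. The first check is the one that usually ends up in CI.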

The pattern is reconcile-after-write

The fix is small enough to name:

POST -> GET -> diff

Treat the POST response as untrusted. It may be your request echoed back. Treat the later GET as the provider state. That is what AgentMail actually stored.

The diff is the bug.

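// `agentmail` is assumed to be a thin HTTP client that returns parsed JSON bodies.
const sentInboxIds = ["inbox_tenant_123"];

// Write: the response may simply echo the request.
const created = await agentmail.post("/v0/webhooks", {
  url: "https://app.example.com/webhooks/agentmail",
  event_types: ["message.delivered", "message.bounced"],
  inbox_ids: sentInboxIds,
});

// Read: this is the state the provider actually persisted.
const stored = await agentmail.get(`/v0/webhooks/${created.id}`);

// Diff: compare as sorted copies so ordering does not cause false failures.
const got = [...(stored.inbox_ids ?? [])].sort();
const want = [...sentInboxIds].sort();
if (JSON.stringify(got) !== JSON.stringify(want)) {
  throw new Error("AgentMail webhook scope drifted");
}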

That is the whole check. It is not a bigger mock. It is a read after the write.
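If you want the failure message to say what went wrong, the comparison can be split into what is missing and what is extra. A minimal sketch, with `diffScope` and `ScopeDiff` as hypothetical helper names introduced here:

```typescript
interface ScopeDiff {
  missing: string[]; // sent, but not in the stored webhook
  extra: string[]; // in the stored webhook, but never sent
}

// Order-insensitive comparison of the inbox IDs sent against the inbox IDs stored.
// A missing or null stored list is treated as empty.
function diffScope(
  sent: readonly string[],
  stored: readonly string[] | null | undefined,
): ScopeDiff {
  const storedList = stored ?? [];
  const sentSet = new Set(sent);
  const storedSet = new Set(storedList);
  return {
    missing: sent.filter((id) => !storedSet.has(id)).sort(),
    extra: storedList.filter((id) => !sentSet.has(id)).sort(),
  };
}

// True when the stored scope is exactly the scope that was sent.
function scopeKept(diff: ScopeDiff): boolean {
  return diff.missing.length === 0 && diff.extra.length === 0;
}
```

A dropped `inbox_ids` array shows up as every sent ID listed under `missing`, which is easy to spot in a test log.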

Why the agent caught it

The reason a coding agent caught this where the unit test did not is not that it was smarter about webhooks. It ran a better-shaped test.

AgentMail has a curated webhook_lifecycle_create_read_delete workflow in FetchSandbox. When the agent ran it through the FetchSandbox MCP server, the workflow forced the boring step people skip:

create inbox
subscribe webhook scoped to that inbox
read webhook back
compare inbox_ids
delete webhook

The workflow shape matches the bug shape. A stubbed unit test checks what your app meant to send. The workflow checks what the provider kept.
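The same lifecycle can be sketched as a single function. This is not FetchSandbox's implementation. `LifecycleClient` is a hypothetical interface, kept synchronous so the example stays small. A real client would be async and return promises.

```typescript
// Hypothetical minimal client surface covering the steps listed above.
interface LifecycleClient {
  createInbox(): { id: string };
  createWebhook(input: { url: string; inbox_ids: string[] }): { id: string };
  getWebhook(id: string): { id: string; inbox_ids?: string[] };
  deleteWebhook(id: string): void;
}

// Runs create -> read -> compare, and always deletes the webhook afterwards.
function runWebhookLifecycle(client: LifecycleClient, url: string): void {
  const inbox = client.createInbox();
  const webhook = client.createWebhook({ url, inbox_ids: [inbox.id] });
  try {
    const stored = client.getWebhook(webhook.id);
    const kept = stored.inbox_ids ?? [];
    if (kept.length !== 1 || kept[0] !== inbox.id) {
      throw new Error(
        `webhook ${webhook.id} scope drifted: stored [${kept.join(", ")}]`,
      );
    }
  } finally {
    // Teardown runs even when the comparison fails, so a failed run
    // does not leave a broadly scoped webhook behind.
    client.deleteWebhook(webhook.id);
  }
}
```

Putting the delete in `finally` is a deliberate choice here: the run that detects a leak is exactly the run where leaving the webhook in place would be most harmful.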

This is not only AgentMail

The same bug appears anywhere a control-plane API returns an object that looks like the request you sent:

  • IAM policies
  • ACL rules
  • draft resources
  • webhook subscriptions
  • tenant-scoped notification settings

If a create call returns 200 and the object defines a security or tenancy boundary, do not stop at the create response.

Read it back. Diff the fields that protect the boundary. Fail the test before the provider starts sending another tenant's events to your endpoint.
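For resources other than webhooks, the same read-back can be expressed as a diff over a named list of boundary fields. A minimal sketch, where `driftedFields` and `normalize` are hypothetical helpers and the field names in the usage note are made up for illustration:

```typescript
// Serializes a value for comparison. Arrays are compared regardless of order.
// A missing value is treated as null, which is distinct from an empty array.
function normalize(value: unknown): string {
  if (Array.isArray(value)) {
    return JSON.stringify(value.map((item) => JSON.stringify(item)).sort());
  }
  return JSON.stringify(value ?? null);
}

// Returns the names of boundary fields whose stored value differs from what was sent.
function driftedFields(
  sent: Record<string, unknown>,
  stored: Record<string, unknown>,
  boundaryFields: readonly string[],
): string[] {
  return boundaryFields.filter(
    (field) => normalize(sent[field]) !== normalize(stored[field]),
  );
}
```

Called with a sent object, the object read back from the provider, and a list such as `["inbox_ids"]` or `["principals", "effect"]`, it returns an empty array when the boundary held. Fields outside the list, such as server-generated timestamps, are ignored on purpose.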

Run the brownfield demo

The brownfield-agentmail-demo repo shows this end to end if you want to run it against the workflow yourself.

The important part is not AgentMail-specific. The habit is: after a write that configures scope, reconcile what the provider persisted.

A webhook that passes create tests can still be the webhook that leaks across tenants.

FetchSandbox's AgentMail workflow docs include the create, read, and teardown path.