
AgentMail webhook creation can look done after a 200 OK. The real test is whether the persisted webhook is still scoped to the inboxes you sent.
The dangerous AgentMail webhook bug starts with a response that looks completely fine.
POST /v0/webhooks -> 200 OK
The response body contains the webhook object. It includes your URL, event types, and usually enough fields to make the integration feel done.
But in a multi-tenant app, the field that matters is inbox_ids. If that array is silently dropped, your webhook is no longer scoped to the tenant inbox you meant to subscribe. It is subscribed broadly, and every tenant's message events can start flooding the same endpoint.
That is not a noisy failure. It is worse. The subscribe call worked. Your endpoint receives events. The logs look active. The leak is in the scope.
If your handler routes by webhook ID before it checks the inbox ID, the wrong customer can look like a valid event. That is the kind of bug that passes observability and fails privacy.
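One way to guard against this in the handler is to resolve the tenant from the inbox ID on the event and confirm it matches the tenant that owns the webhook. The sketch below is a minimal illustration, not AgentMail's documented payload: the event fields (`webhook_id`, `inbox_id`), the two lookup maps, and `resolveTenant` are all hypothetical.

```javascript
// Hypothetical lookup tables; a real app would read these from its own database.
const tenantByWebhookId = new Map([["wh_1", "tenant_123"]]);
const tenantByInboxId = new Map([
  ["inbox_tenant_123", "tenant_123"],
  ["inbox_tenant_456", "tenant_456"],
]);

// Returns the tenant to route to, or null when the event must be rejected.
// The event field names are assumptions for illustration.
function resolveTenant(event) {
  const webhookTenant = tenantByWebhookId.get(event.webhook_id);
  const inboxTenant = tenantByInboxId.get(event.inbox_id);
  // Reject unknown webhooks or inboxes, and any event whose inbox
  // belongs to a different tenant than the webhook that delivered it.
  if (!webhookTenant || !inboxTenant || webhookTenant !== inboxTenant) {
    return null;
  }
  return inboxTenant;
}
```

With this shape, an event for `inbox_tenant_456` arriving on `wh_1` is dropped instead of being treated as tenant 123's traffic.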
Most unit tests stub the webhook create call by echoing back exactly what the app sent.
request.inbox_ids -> response.inbox_ids
That proves your client serialized the payload. It does not prove AgentMail persisted the scope.
The test passes. CI passes. The code ships. Then production receives events for inboxes that do not belong to the tenant that configured the webhook.
The annoying part is that the local test is not obviously wrong. It has the right endpoint, the right payload shape, and the right assertion against the create response. It just trusts the wrong copy of the object.
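To see why that assertion is too weak, compare an echoing stub with a fake provider that drops the scope. This is a self-contained sketch: `echoStub` and `droppingProvider` are hypothetical in-memory stand-ins written for illustration, not a reproduction of AgentMail's actual behavior.

```javascript
// Stub typical of unit tests: the create response is the request plus an ID.
function echoStub() {
  return {
    post: (_path, body) => ({ id: "wh_1", ...body }),
  };
}

// Fake provider that echoes on create but persists the webhook without inbox_ids.
function droppingProvider() {
  const store = new Map();
  return {
    post: (_path, body) => {
      const { inbox_ids, ...persisted } = body; // scope is silently dropped
      store.set("wh_1", { id: "wh_1", ...persisted });
      return { id: "wh_1", ...body }; // response still looks scoped
    },
    get: (path) => store.get(path.split("/").pop()),
  };
}

const payload = {
  url: "https://app.example.com/webhooks/agentmail",
  inbox_ids: ["inbox_tenant_123"],
};

// The create-response assertion passes against both clients.
const fromStub = echoStub().post("/v0/webhooks", payload);
const provider = droppingProvider();
const fromProvider = provider.post("/v0/webhooks", payload);
const createLooksScoped = (res) =>
  JSON.stringify(res.inbox_ids) === JSON.stringify(payload.inbox_ids);

// Only the read-back exposes the difference: the stored object has no inbox_ids.
const stored = provider.get(`/v0/webhooks/${fromProvider.id}`);
```

Both create responses satisfy `createLooksScoped`, so a test that stops there cannot tell the two clients apart.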
The fix is small enough to name:
POST -> GET -> diff
Treat the POST response as untrusted. It may be your request echoed back. Treat the later GET as the provider state. That is what AgentMail actually stored.
The diff is the bug.
const sentInboxIds = ["inbox_tenant_123"];
const created = await agentmail.post("/v0/webhooks", {
  url: "https://app.example.com/webhooks/agentmail",
  event_types: ["message.delivered", "message.bounced"],
  inbox_ids: sentInboxIds,
});
// Read the webhook back: this is the provider state, not the echoed request.
const stored = await agentmail.get(`/v0/webhooks/${created.id}`);
// Sort copies so the comparison ignores ordering; a missing array counts as empty.
const got = [...(stored.inbox_ids ?? [])].sort();
const want = [...sentInboxIds].sort();
if (JSON.stringify(got) !== JSON.stringify(want)) {
  throw new Error("AgentMail webhook scope drifted");
}
That is the whole check. It is not a bigger mock. It is a read after the write.
A coding agent caught this where the unit test did not, and not because it was smarter about webhooks. It ran a better-shaped test.
AgentMail has a curated webhook_lifecycle_create_read_delete workflow in FetchSandbox. When the agent ran it through the FetchSandbox MCP server, the workflow forced the boring step people skip:
create inbox
subscribe webhook scoped to that inbox
read webhook back
compare inbox_ids
delete webhook
The workflow shape matches the bug shape. A stubbed unit test checks what your app meant to send. The workflow checks what the provider kept.
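The same sequence can be sketched as a single test helper. This is not the FetchSandbox workflow's implementation, only an illustration of the steps above. It assumes a hypothetical `client` with `post`, `get`, and `delete` methods; the `/v0/inboxes` create path and the webhook delete path are assumptions not confirmed by this article.

```javascript
// Runs create -> read -> compare -> delete against any client exposing
// post/get/delete. The client shape and the /v0/inboxes path are assumptions.
async function webhookLifecycleCheck(client, url) {
  const inbox = await client.post("/v0/inboxes", {});
  const sent = [inbox.id];
  const created = await client.post("/v0/webhooks", { url, inbox_ids: sent });
  try {
    // Read back what the provider stored rather than trusting the create response.
    const stored = await client.get(`/v0/webhooks/${created.id}`);
    const got = [...(stored.inbox_ids ?? [])].sort();
    const want = [...sent].sort();
    return { ok: JSON.stringify(got) === JSON.stringify(want), got, want };
  } finally {
    // Tear down even when the read fails, so test runs do not leak webhooks.
    await client.delete(`/v0/webhooks/${created.id}`);
  }
}
```

Putting the delete in `finally` keeps the teardown step from being skipped when the comparison or the read throws.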
The same bug appears anywhere a control-plane API returns an object that looks like the request you sent.
If a create call returns 200, but the security or tenancy boundary matters, do not stop at the create response.
Read it back. Diff the fields that protect the boundary. Fail the test before the provider starts sending another tenant's events to your endpoint.
The brownfield-agentmail-demo repo shows this end to end if you want to run it against the workflow yourself.
The important part is not AgentMail-specific. The habit is: after a write that configures scope, reconcile what the provider persisted.
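As a rough generalization, the comparison can be written once and pointed at whichever fields guard the boundary. The helper below is a sketch under that assumption; `normalize`, `driftedFields`, and the objects in the usage lines are hypothetical names chosen for illustration.

```javascript
// Serializes a value for comparison; arrays of primitives are compared
// order-insensitively by sorting a copy first.
function normalize(value) {
  return Array.isArray(value)
    ? JSON.stringify([...value].sort())
    : JSON.stringify(value);
}

// Returns the names of boundary fields whose stored value differs from what was sent.
function driftedFields(sent, stored, fields) {
  return fields.filter((f) => normalize(sent[f]) !== normalize(stored[f]));
}

// Usage: the stored object kept the URL but lost the scope.
const sent = { url: "https://app.example.com/hook", inbox_ids: ["b", "a"] };
const stored = { url: "https://app.example.com/hook" };
const drift = driftedFields(sent, stored, ["inbox_ids", "url"]);
// drift is ["inbox_ids"]
```

Returning the field names rather than a boolean lets the test failure say which part of the boundary the provider did not keep.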
A webhook that passes create tests can still be the webhook that leaks across tenants.
FetchSandbox's AgentMail workflow docs include the create, read, and teardown path.