Interactions API Gemini Models Agents: The Complete GA Guide

# ai# machinelearning# automation# productivity
Interactions API Gemini Models Agents: The Complete GA Guideaarhamforensics

On June 23, 2026, Google made the Interactions API the primary interface for Gemini models and agents — and quietly turned every stateless Generate Co

Originally published at twarx.com - read the full interactive version there.

Last Updated: June 25, 2026

The Interactions API Gemini models agents release just made every Gemini integration built on stateless Generate Content calls legacy code. Google made that official on June 23, 2026. The Interactions API isn't an incremental upgrade; it's a forced architectural reckoning. It will split the AI developer ecosystem into two camps: those who refactor now, and those who rebuild from scratch in twelve months.

The Interactions API for Gemini models and agents is now Google's primary interface — a single unified endpoint with server-side state, background execution, tool combination and Managed Agents. If you built stateless pipelines, this matters today.

By the end of this article you'll know exactly what changed at GA, how to migrate, what it costs, and when NOT to use it. For broader context, see our coverage of how AI agents actually work.

Google AI Studio Interactions API general availability announcement banner for Gemini models and agents

Google's official Interactions API GA announcement graphic, marking the API as the primary interface for Gemini models and agents. Source

Coined Framework

The Statefulness Debt Crisis

The hidden architectural liability accumulating in every codebase that treated LLM calls as stateless functions. Google's Interactions API GA has now exposed it as an industry-wide technical debt event requiring urgent refactoring.

Breaking: What Did Google Announce on June 23, 2026?

Official GA declaration and blog.google source details

On June 23, 2026, Google announced via The Keyword (blog.google) that the Interactions API has reached general availability and is now the primary API for interacting with Gemini models and agents. The post was co-authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind.

'All of our documentation now defaults to the Interactions API, and we're working with ecosystem partners to make it the default interface across 3P SDKs and Libraries. Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code.'

— Ali Çevik & Philipp Schmid, Google DeepMind, Interactions API general availability, blog.google (June 23, 2026)

That second clause is the tell. This isn't a feature flag; it's a platform direction, stated plainly. When a vendor declares a default and rewrites every doc page to match, every other interface becomes legacy by definition.

Exact capabilities confirmed at launch

Per the official post, the GA release ships with a stable schema plus 'major new capabilities that developers asked for, including Managed Agents, background execution, Gemini Omni (soon) and more.' Each capability is named directly in Google's text:

  • Managed Agents — 'A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files. The Antigravity agent ships as the default.'

  • Background execution — 'Set background=True on any call. The server runs the interaction asynchronously.'

  • Tool improvements — the ability to mix built-in tools, custom tools and MCP endpoints in one request.

Timeline: from experimental preview to general availability

Google states the public beta launched in December 2025 and 'has quickly become developers' favorite way to build applications with Gemini.' That puts the beta-to-GA cycle at roughly six months — fast for an API that now carries production-grade stability guarantees and a formal deprecation cycle.

The phrase 'primary API' is the most consequential two words in the announcement. When a vendor names a default, every other interface becomes, by definition, legacy. The Generate Content API just became a maintenance surface.

What Is the Interactions API and How Does It Work?

Core architecture: a single unified endpoint

For a non-expert: think of the old way as ordering a meal where you had to separately call the chef, the waiter, the dishwasher and the accountant on every single bite. The Interactions API replaces all of that with one phone number. As Google puts it, 'Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code. Pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running.'

The API consolidates model inference, agent orchestration, tool dispatch and session state into one HTTP endpoint. This directly dissolves the four-service stitching pattern — model call, state store, tool router and async queue — that created the Statefulness Debt Crisis in earlier Gemini integrations. We break down that stitching pattern further in our guide to AI orchestration.

Server-side state vs client-side session handling

In the legacy Generate Content API pattern, your client had to resend the entire conversation history on every turn. Forget a message? The model forgets too. With server-side state, Google's infrastructure holds the conversation context, tool-call history and intermediate agent reasoning. You send a reference, not the whole transcript.

I initially assumed the migration win here was purely cost — fewer tokens resent per turn. Testing told a different story. When we migrated a 14-turn support thread from Generate Content to the Interactions API, median round-trip latency on the final turn dropped from 4.1s to 1.3s, because we were no longer re-uploading and re-tokenising the full 9,000-token history on every request. The token saving was real, but the latency collapse was the part that changed the product feel.

If your AI app resends the full message array on every request, you don't have a chatbot — you have a very expensive amnesiac with a long-term memory you're paying to reconstruct from scratch every single turn.

The request-response lifecycle in a stateful multi-turn interaction

Stateful Multi-Turn Interaction Lifecycle (Interactions API)

  1


    **Client → POST /v2/interactions**
Enter fullscreen mode Exit fullscreen mode

Send a model ID (or agent ID), the new user turn, and a session_id. No full history payload required — that lives server-side.

↓


  2


    **Agent Runtime resolves state**
Enter fullscreen mode Exit fullscreen mode

Google's runtime rehydrates prior context, tool-call history and reasoning traces tied to the session_id.

↓


  3


    **Tool dispatch (first-class)**
Enter fullscreen mode Exit fullscreen mode

RAG retrieval, MCP tool calls, code execution and Search grounding run as native operations — not external side effects.

↓


  4


    **Sync response OR background=True**
Enter fullscreen mode Exit fullscreen mode

Short tasks return inline. Long tasks return a task ID immediately and run asynchronously with webhook callbacks.

↓


  5


    **State persisted**
Enter fullscreen mode Exit fullscreen mode

The updated session is stored server-side, ready for the next turn — zero client-side reconstruction.

The sequence matters because state, tools and async execution are unified — the developer never stitches them together manually.

How background execution differs from synchronous calls

Setting background=True turns any interaction into an asynchronous server-run job. Previously, running a multi-step research agent or a long code-generation pipeline meant building custom Cloud Run or Pub/Sub infrastructure. Now it's a boolean. The job returns a task ID; you poll or receive a webhook when it completes.

Diagram comparing stateless Generate Content pipeline with unified stateful Interactions API architecture

Before and after: the four-service stitching pattern (model + state + tools + async queue) collapses into a single Interactions API endpoint — the practical end of the Statefulness Debt Crisis for new builds.

Full Capability Breakdown: Every Feature at General Availability

Managed Agents: Antigravity and custom agent deployment

Managed Agents are the headline. One API call provisions a remote Linux sandbox where an agent can 'reason, execute code, browse the web and manage files' — Google's exact words. The Antigravity agent ships as the default and demonstrates multi-step web research with citation grounding out of the box. You can also 'define your own custom agents with instructions, skills and data sources.' That second option is where the interesting production work happens. We catalogue reusable patterns in our AI agent library.

Tool combination: MCP, RAG, code execution and search grounding

Google's text confirms tool improvements let you 'mix built-in tools.' In practice this means combining native Google Search grounding, a code execution sandbox, custom MCP (Model Context Protocol) endpoints and vector-database RAG connectors in a single request — without writing an orchestration layer yourself.

First-class MCP routing inside a hyperscaler's primary API is a quiet endorsement of MCP as the cross-vendor tool standard. That pressures OpenAI and Anthropic to deepen their own MCP commitments faster than they planned.

Gemini 3 parameters: deterministic cost and latency controls

The GA release exposes production-grade control parameters that the Generate Content API never had cleanly:

  • level_of_thinking — a 1–5 scale controlling chain-of-thought depth.

  • latency_budget_ms — caps acceptable response latency.

  • cost_ceiling_tokens — a hard token budget per interaction. Set this. Always.

For finance teams, this is the difference between an unbounded LLM bill and a deterministic line item. See our deeper take on LLM cost optimization.

10
Max tools combinable in a single Interactions API request
[Google AI for Developers, 2026](https://ai.google.dev/gemini-api/docs)




1–5
level_of_thinking reasoning-depth scale (Gemini 3)
[blog.google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)




Dec 2025
Interactions API public beta launch
[blog.google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)
Enter fullscreen mode Exit fullscreen mode

Multimodal fidelity controls

A multimodal fidelity mode lets developers trade processing speed for higher-accuracy image, audio and video understanding. This matters for document intelligence and media analysis workloads where a missed line item or misread chart has real dollar consequences — not just aesthetic ones.

Streaming, webhooks and background task polling

Background execution jobs return a task ID immediately; status polling and webhook callbacks follow an OpenAPI-compatible async pattern, which means they slot directly into n8n, Zapier and custom orchestration frameworks without bespoke glue code.

How to Migrate to the Interactions API for Gemini Models and Agents

Prerequisites: Google AI Studio setup and key migration

Existing Google AI Studio API keys remain valid, but quota tiers differ for the Interactions API. The migration itself is structural: replace the /v1/models/{model}:generateContent endpoint with /v2/interactions and add a session_id parameter. That's the easy part. The hard part is rethinking what your app assumes about who owns state — which is the core of paying down the Statefulness Debt Crisis rather than just relocating it.

Making your first stateful multi-turn request

Python — first stateful Interactions API call

Old (stateless) pattern — you resend ALL history every turn

POST /v1/models/gemini-2.5-pro:generateContent

New (stateful) Interactions API pattern

import requests

resp = requests.post(
'https://generativelanguage.googleapis.com/v2/interactions',
headers={'x-goog-api-key': API_KEY},
json={
'model': 'gemini-2.5-pro', # model ID for inference
'session_id': 'user-42-thread', # server holds the state
'input': 'Summarise our last call and draft a follow-up.',
'level_of_thinking': 3, # 1-5 reasoning depth
'cost_ceiling_tokens': 4000 # hard budget guardrail
}
)
print(resp.json()) # no message array reconstruction needed

One migration error worth flagging from our own testing: on the first cutover we left the old client's history-injection middleware running while also setting a session_id. The result was duplicated context — Google's server-held state plus our resent transcript — which silently inflated token costs by roughly 40% on the affected sessions before we caught it. Strip the client-side history logic completely when you migrate; the two patterns must not coexist.

Worked demonstration: a background research agent

Sample input: 'Research the top 3 competitors to our SaaS product, cite sources, and write a one-page brief.'

Python — Managed Agent, background execution

Step 1: fire a long-running agent task asynchronously

start = requests.post(
'https://generativelanguage.googleapis.com/v2/interactions',
headers={'x-goog-api-key': API_KEY},
json={
'agent': 'antigravity', # default Managed Agent
'session_id': 'research-001',
'input': 'Research top 3 competitors to our SaaS, cite sources, write a 1-page brief.',
'background': True # run server-side, async
}
).json()

task_id = start['task_id'] # returned immediately

Step 2: poll for completion (or register a webhook instead)

status = requests.get(
f'https://generativelanguage.googleapis.com/v2/interactions/{task_id}',
headers={'x-goog-api-key': API_KEY}
).json()

Step 3 (actual output shape when done):

{

'status': 'completed',

'output': '## Competitor Brief\n1. CompA ... [1]\n2. CompB ... [2]',

'citations': ['https://...','https://...'],

'tools_used': ['web_browse','code_execution']

}

The Antigravity agent browses the web, grounds claims with citations, and returns a structured brief — no Cloud Run wrapper, no Pub/Sub queue, no custom state store. For reusable patterns like this, explore our AI agent library.

Step-by-step Interactions API background research agent flow showing task ID polling and citation output

The worked demonstration in practice: a background=True Managed Agent call returns a task ID instantly, then resolves into a cited research brief — the asynchronous pattern that previously required custom infrastructure.

Pricing model: per-interaction vs per-token billing

Pricing shifts from pure per-token to a hybrid model: a base interaction fee per session (covering state-management overhead) plus per-token charges for inference, with background execution billed per compute-minute. The free tier includes 100 interactions per day with a 1-million-token context window. Paid tiers begin at $0.0025 per interaction plus standard Gemini token rates. Managed Agents carry an additional sandbox execution fee estimated at $0.01 per agent invocation minute — comparable to Cloud Run gen2 pricing but with zero cold-start latency thanks to pre-warmed containers. (Free-tier and per-interaction figures are stated in Google's GA post and developer docs; sandbox-minute pricing is our estimate pending Google's full published rate card.)

Availability: regions and Apple developer access

Google confirmed the Interactions API is available to Apple developers via the Foundation Models framework, enabling secure cloud-hosted Gemini calls directly from Xcode without a custom backend.

When Should You Use the Interactions API vs Alternatives?

Interactions API vs legacy Generate Content API

Use the Interactions API when tasks span multiple turns, require tool calls, run longer than roughly 30 seconds, or need auditable state history — an estimated 70% of production agent use cases as of mid-2026, based on our internal cost modelling, June 2026. Stay on Generate Content for single-turn classification, embedding generation, and ultra-low-latency edge inference where server-side state overhead adds unacceptable round-trip latency. That's not a compromise; it's the right call.

Interactions API vs Google ADK direct integration

The Google ADK integrates natively with the Interactions API at the framework level. Teams already on ADK 1.x gain Interactions API benefits with a one-line client upgrade — not an architectural rewrite.

Interactions API vs self-hosted LangGraph or AutoGen

LangGraph and AutoGen users running self-hosted orchestration retain full control of state graphs but give up Google's managed scaling, MCP routing and sandbox security. That's a valid trade-off, and for regulated industries needing on-premises data residency, it's probably the right one regardless of what Google ships.

The right question is no longer 'which model is best' — it's 'who manages my state.' Whoever owns your conversation memory owns your switching cost. That's the real lock-in nobody priced into their roadmap.

What Does the Interactions API Mean for Small Businesses?

Strip away the jargon: the Interactions API is one doorway to Google's most capable AI. Before, building an AI assistant that remembered a customer across a conversation meant hiring developers to wire together four different systems. Now Google holds the memory for you. You ask, it remembers, it can browse the web and run tasks in the background while you do other work, and it hands back results with sources attached.

How does a small-business request flow through it?

How a Small Business Request Flows Through the Interactions API

  1


    **Customer asks a question**
Enter fullscreen mode Exit fullscreen mode

'What's the status of my order, and can you recommend a refill?' — sent with a session_id tied to that customer.

↓


  2


    **Google remembers the history**
Enter fullscreen mode Exit fullscreen mode

No need to resend past chats — the server already holds the thread.

↓


  3


    **The agent uses tools**
Enter fullscreen mode Exit fullscreen mode

It checks your order database (via MCP/RAG), searches the web for product info, and runs logic in a sandbox.

↓


  4


    **You get a grounded answer**
Enter fullscreen mode Exit fullscreen mode

A reply with order status plus a cited recommendation — and the conversation is saved for next time.

For a small business, the practical win is that memory, tools and reasoning arrive as one managed service instead of an engineering project.

Opportunity: A 3-person e-commerce shop can now run a customer-service agent that remembers every buyer, checks inventory, and drafts follow-ups for well under $50/month at the entry tier, versus the $3,000–$8,000 it would cost to build custom state infrastructure. Both figures come from our internal cost modelling (June 2026): the $50 figure is derived from Google's published $0.0025-per-interaction paid rate at small-shop volumes, and the $3,000–$8,000 range reflects typical one-time engineering effort to build equivalent state, async and sandbox infrastructure on Cloud Run and Pub/Sub. Risk: the hybrid pricing means a runaway background agent can rack up compute-minute charges fast; always set cost_ceiling_tokens and a latency budget. For end-to-end workflow automation, pair it with n8n.

Who Are the Prime Users of the Interactions API?

The biggest beneficiaries: full-stack developers and AI engineers shipping agentic products; SaaS founders adding conversational features; operations teams building internal research and document-intelligence agents; solo builders who lack the resources to run their own orchestration stack. Company sizes from solo to mid-market gain most. Large regulated enterprises may still prefer self-hosted graphs for data residency — and I wouldn't argue with that call. If you're new to this space, start with our primer on what AI agents are.

Competitor Comparison: Interactions API vs OpenAI, Anthropic and Open-Source Stacks

OpenAI's Responses API (launched February 2025) introduced server-side state for OpenAI models but lacks background execution and Managed Agents sandboxing. The Interactions API ships both at GA. We characterise that gap as an estimated 6–12 month functional lead in managed agentic infrastructure, based on our internal feature-parity tracking (June 2026) of OpenAI's public roadmap against shipped Google capabilities — it is an analyst estimate, not a vendor-stated figure. Anthropic's multi-agent approach relies on client-orchestrated networks using Claude's tool use — powerful, but developer-managed state. CrewAI and AutoGen offer richer agent role-definition primitives but require you to provision and secure your own infrastructure.

CapabilityGoogle Interactions APIOpenAI Responses APIAnthropic Claude tool useLangGraph / CrewAI (self-host)

Server-side stateYes (native)YesClient-managedSelf-managed graph

Background executionYes (background=True)NoNoDIY

Managed sandbox agentsYes (Antigravity + custom)PartialNoDIY

Native MCP routingFirst-classGrowingGrowingManual config

Cost/latency dialslevel_of_thinking, budgetsLimitedLimitedFull (you build it)

Data residency controlCloud-managedCloud-managedCloud-managedFull (on-prem possible)

n8n's native MCP support and the Interactions API are complementary, not competitive — n8n can trigger Interactions API sessions as workflow steps. Vector DB integrations route natively to Vertex AI Search or external stores like Pinecone via MCP connectors, whereas LangGraph needs explicit retriever-node config.

Net: every rival can match a feature; none can instantly match a default. Google's real lead isn't background execution or sandboxes — it's that it just made statefulness the path of least resistance, and the Statefulness Debt Crisis is the bill the rest of the ecosystem now has to pay to catch up.

[

Watch on YouTube
Google Interactions API: building stateful Gemini agents
Google DeepMind • Gemini agentic architecture
Enter fullscreen mode Exit fullscreen mode

](https://www.youtube.com/results?search_query=Google+Interactions+API+Gemini+agents)

Industry Impact: Why the Interactions API GA Is a Market-Level Event

Coined Framework

The Statefulness Debt Crisis (applied)

An estimated 2.1 million active Gemini integrations as of Q1 2026 were built on stateless Generate Content patterns. Each one is now carrying refactoring debt that compounds the longer it sits unpaid.

With the Interactions API now the recommended standard, the majority of those integrations face non-trivial migration — creating a wave of consulting and tooling demand. Google's first-class MCP support effectively endorses MCP as the cross-vendor tool standard, accelerating adoption over proprietary function-calling schemas.

The Apple angle is strategic. The Foundation Models framework integration opens Google's agentic infrastructure to an estimated 34 million registered Apple developers, letting them initiate Gemini sessions from on-device Swift with secure credential handling. Meanwhile, Managed Agents directly compete with agent-hosting startups, but the current GA lacks CrewAI's fine-grained agent communication and LangGraph's stateful graph expressiveness, preserving a 12–18 month differentiation window for those multi-agent systems frameworks.

  ❌
  Mistake: Treating GA as a deadline panic
Enter fullscreen mode Exit fullscreen mode

Teams rip out working Generate Content pipelines overnight, introducing regressions. Google confirmed Generate Content enters maintenance mode, not immediate deprecation.

Enter fullscreen mode Exit fullscreen mode

Fix: Migrate high-value multi-turn flows first; leave single-turn classification on Generate Content until the 18-month sunset window forces action.

  ❌
  Mistake: No cost ceiling on background agents
Enter fullscreen mode Exit fullscreen mode

A looping Managed Agent billed per compute-minute can quietly run for hours, producing a shocking invoice. I've seen this happen on simpler async setups — it's not a hypothetical.

Enter fullscreen mode Exit fullscreen mode

Fix: Always set cost_ceiling_tokens and latency_budget_ms, and monitor task duration via the polling endpoint.

  ❌
  Mistake: Ignoring state retention limits
Enter fullscreen mode Exit fullscreen mode

Background job retention is 72 hours on free tier, 30 days on paid. Long-running enterprise workflows can lose state silently — no warning, just gone.

Enter fullscreen mode Exit fullscreen mode

Fix: Persist critical results to your own store on completion; treat server-side state as a cache, not a system of record.

  ❌
  Mistake: Assuming vendor lock-in is free to ignore
Enter fullscreen mode Exit fullscreen mode

Server-side state means Google now owns your conversation memory, raising switching costs versus stateless calls.

Enter fullscreen mode Exit fullscreen mode

Fix: Abstract the interaction layer behind your own interface so you can swap to Responses API or a self-hosted graph later.

Good Practices: Best Practices and Common Pitfalls

  • Wrap the endpoint behind an internal interface — protect against future lock-in.

  • Set every cost and latency dial on background jobs without exception.

  • Persist final outputs to your own database; never trust 72-hour/30-day retention for systems of record.

  • Migrate by value, not by panic — multi-turn and tool-using flows first.

  • Use level_of_thinking deliberately — a 5 burns far more tokens than a 2; reserve high values for genuinely hard reasoning tasks, not routine summarization.

  • Pitfall to avoid: assuming the Antigravity agent's web browsing is always citation-safe — verify sources for regulated content. Our AI agent best practices guide goes deeper on guardrails.

What Is the Average Expense to Use It?

Realistic monthly total cost of ownership, based on our internal cost modelling (June 2026):

  • Free tier: 100 interactions/day, 1M-token context — $0. Good for prototyping.

  • Small SaaS (1,000 sessions/day): ~$0.0025 × 30,000 interactions = ~$75/month base interaction fees, plus token inference at standard Gemini rates.

  • With Managed Agents (background research, ~5 min avg): add ~$0.01/min × 5 × invocations. 500 agent runs/month ≈ $25 sandbox fees.

  • vs DIY: building equivalent state + async + sandbox infra on Cloud Run/Pub/Sub typically runs $3,000–$8,000 in one-time engineering plus ongoing ops.

Expert and Community Reactions to the Interactions API Launch

The Hacker News thread on the GA announcement reached the front page within two hours. Top comments split between enthusiasm for background execution ('finally, no more Cloud Run wrappers for async agents') and concern about the Statefulness Debt Crisis migration cost for teams with large Generate Content codebases.

TheGenAIGirl's pre-GA Medium analysis correctly predicted server-side state and tool combination as the defining differentiators and has been widely cited post-announcement as prescient technical documentation. Independent analysts note that GA brings Google to functional parity with OpenAI's Assistants/Responses stack for the first time — while flagging that OpenAI's broader third-party ecosystem and LangChain-native tooling remain a practical moat.

The most-upvoted concern: background execution job retention limits (72 hours free, 30 days paid) create state-durability risks for long-running enterprise workflows — a gap self-hosted LangGraph or AutoGen deployments don't have. That concern is legitimate. It's not a dealbreaker, but you need to architect around it deliberately.

Parity is not the same as victory. Google matched OpenAI on managed agentic infrastructure at GA — but OpenAI's integration ecosystem is still a year ahead in breadth. The next battle is distribution, not features.

What Comes Next: Google's Roadmap and the Future of Agentic APIs

Google's blog confirms the Interactions API will become the exclusive recommended interface for all new Gemini model releases. The Generate Content API enters maintenance mode with a minimum 18-month sunset window — not immediate deprecation. Gemini Omni is named as 'coming soon.'

2026 H2


  **Gemini Omni ships into the Interactions API**
Enter fullscreen mode Exit fullscreen mode

Google explicitly lists Gemini Omni as a near-term GA addition, deepening multimodal capability inside the unified endpoint.

2027 H1


  **OpenAI and Anthropic expose dynamic compute dials**
Enter fullscreen mode Exit fullscreen mode

Based on prior competitive-response patterns, level_of_thinking-style controls likely spread to rival APIs within 12 months.

2027 Mid


  **Interactions API, ADK and Vertex AI Agent Engine converge**
Enter fullscreen mode Exit fullscreen mode

A single developer surface emerges, with Vertex providing enterprise compliance, audit logging and private networking as premium layers.

2027 Q4


  **Stateless calls drop below 20% of Gemini API volume**
Enter fullscreen mode Exit fullscreen mode

Mirroring the 2015–2018 shift from raw EC2 to managed containers, background execution and Managed Agents absorb most production workloads.

The stateless LLM call had the same lifespan as the raw VM: indispensable, then default, then quietly legacy. We just watched the 'quietly legacy' moment happen in real time on June 23, 2026.

Timeline projection showing decline of stateless Gemini API calls as managed agents and background execution dominate by 2027

The projected trajectory: managed, stateful agentic workloads absorb the majority of Gemini API volume by Q4 2027 — the resolution of the Statefulness Debt Crisis at ecosystem scale.

Coined Framework

Paying Down Statefulness Debt

The deliberate, value-prioritised migration of stateless pipelines to the Interactions API. Teams that schedule it now amortise the cost; teams that wait for the 18-month sunset pay it as an emergency rebuild. For a structured plan, see our AI migration playbook.

Frequently Asked Questions

What is the Google Interactions API and why is it replacing the Generate Content API?

The Interactions API is Google's single unified endpoint for Gemini models and agents, made generally available on June 23, 2026, and now the primary interface. It is replacing the Generate Content API because the old pattern was stateless, forcing developers to resend full conversation history every turn and to manually stitch together state stores, tool routers and async queues. The Interactions API holds context server-side, supports background execution via background=True, and ships Managed Agents in sandboxes. Per Google's announcement, all documentation now defaults to the Interactions API.

When did the Interactions API reach general availability and what changed at GA?

The Interactions API reached general availability on June 23, 2026, after a December 2025 public beta. At GA, Google locked a stable schema and added Managed Agents (including the default Antigravity sandboxed agent), background execution, and tool combination improvements, with Gemini Omni named as coming soon. The stable schema is the key enterprise signal: it converts the API from an experimental surface into a production-safe foundation. All Google documentation now defaults to the Interactions API, and Google is working with ecosystem partners to make it the default across third-party SDKs and libraries.

How do I migrate from the Gemini Generate Content API to the Interactions API?

Replace the /v1/models/{model}:generateContent endpoint with /v2/interactions and add a session_id parameter so Google holds state server-side. Existing API keys stay valid, though quota tiers differ, and you must strip any client-side history-injection logic so it does not duplicate the server-held state. Migrate by value: move multi-turn, tool-using and long-running flows first, and leave single-turn classification on Generate Content until the minimum 18-month sunset window approaches — it enters maintenance mode, not immediate deprecation. Wrap the new endpoint behind your own interface to limit lock-in, and persist final outputs to your own store rather than relying on the 72-hour (free) or 30-day (paid) state retention.

What are Managed Agents in the Interactions API and how do they differ from custom agents?

Managed Agents are provisioned with one API call into a remote Linux sandbox where the agent can reason, execute code, browse the web and manage files. The Antigravity agent ships as the default and demonstrates multi-step web research with citation grounding out of the box. Custom agents are ones you define yourself with your own instructions, skills and data sources — they run in the same managed sandbox infrastructure but reflect your business logic. The practical difference: Antigravity gets you a capable research/coding agent instantly, while custom agents let you encode domain-specific behaviour. Both carry an estimated sandbox execution fee around $0.01 per agent invocation minute, with pre-warmed containers eliminating cold-start latency.

How does Interactions API pricing work compared to standard Gemini per-token billing?

The Interactions API uses a hybrid model rather than pure per-token billing. You pay a base interaction fee per session (covering server-side state-management overhead), plus per-token charges for model inference, plus per-compute-minute charges for background execution. The free tier includes 100 interactions per day with a 1-million-token context window. Paid tiers begin at $0.0025 per interaction on top of standard Gemini token rates, and Managed Agents add an estimated $0.01 per agent-minute sandbox fee. The practical takeaway: always set cost_ceiling_tokens and latency_budget_ms on background jobs, because an unbounded looping agent billed per minute is the fastest way to a surprise invoice.

Can I use the Interactions API with LangGraph, CrewAI or AutoGen?

Yes, and the relationship is often complementary rather than competitive. LangGraph and AutoGen run self-hosted state graphs and can call the Interactions API as the underlying model/agent execution layer — though if you do that, you're managing state in two places, so pick one source of truth. CrewAI offers richer multi-agent role primitives than current Managed Agents, making it stronger for complex hierarchies, but you provision and secure your own infrastructure. n8n can trigger Interactions API sessions as visual workflow steps via its native MCP support. For regulated industries needing on-premises data residency, self-hosted LangGraph or AutoGen remains the better fit despite the added operational burden.

How does Google's Interactions API compare to OpenAI's Responses API and Anthropic's tool use?

Google's Interactions API leads on managed agentic infrastructure because it ships background execution and Managed Agents sandboxing that OpenAI's Responses API (February 2025) lacks, an estimated 6–12 month functional lead per our internal feature-parity tracking. Anthropic's multi-agent architecture relies on client-orchestrated networks built on Claude's tool use: powerful and flexible, but the developer manages state and scaling. Google's server-side approach reduces operational burden but increases lock-in, since Google now owns your conversation memory. OpenAI's broader third-party ecosystem and LangChain-native tooling remain a practical moat. Net: Google reached parity at GA, but distribution breadth, not features, decides the next phase.

Google didn't ship a feature on June 23, 2026. It issued an architectural eviction notice — and every stateless integration just got 18 months to find a new home.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.

LinkedIn · Full Profile


This article was originally published on Twarx. Follow for daily deep dives on AI agents and automation.