How I Built ClaimSight: A 6-Agent AI Claims System with Google ADK and Gemini

By pramodmisra
Insurance claims are broken. The average property claim takes 25 minutes on the phone, costs insurers $40 to process, and the industry loses $80 billion annually to fraud — now supercharged by AI-generated deepfakes. I built ClaimSight to fix this: a multi-agent AI system that handles claims from first contact to submission in one live conversation using voice, vision, and real-time fraud detection.

Here's how I built it with Google AI models and Google Cloud.

The Architecture: 6 Agents, 24 Tools

ClaimSight uses Google ADK (Agent Development Kit) to orchestrate 6 specialized agents, each powered by Gemini 2.5 Flash:

  1. Triage Agent — Greets the caller, verifies their policy, identifies the claim type, and routes to the right specialist
  2. Maya (Property) — Handles homeowner claims with 18 tools including damage assessment, photo capture, AI annotation, and cost estimation
  3. Alex (Auto) — Vehicle collision, theft, and comprehensive claims
  4. Jordan (Liability) — Bodily injury, premises liability, professional liability
  5. Fraud Sentinel — Runs 11 fraud detection tools in parallel behind every claim
  6. Weather Verifier — Cross-references storm claims against historical weather data

ADK made multi-agent orchestration surprisingly straightforward. Each agent is defined with its own system instructions, tools, and personality. The ADK runner handles agent-to-agent transfers, tool execution routing, and session state — I just had to define the graph.

```python
# Agent definition with ADK
triage_agent = Agent(
    name="claimsight_triage",
    model="gemini-2.5-flash",
    instruction=TRIAGE_INSTRUCTIONS,
    tools=[policy_lookup],
    sub_agents=[maya_property, alex_auto, jordan_liability, fraud_sentinel, weather_verifier],
)
```

See, Hear, Speak: True Multimodal UX

ClaimSight isn't a chatbot with extras — it's a fundamentally different interaction paradigm.

See — Live Camera Vision

The frontend captures frames from the user's camera at 1 FPS and sends them to the backend via WebSocket. When an agent needs to document damage, it calls photo_capture which grabs the latest frame — complete with a camera shutter flash effect. The captured photo is then sent to Gemini's image generation for AI damage annotation with severity-coded markers.
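On the backend, this pattern reduces to a per-session buffer that always holds the most recent frame, which the capture tool then reads. A minimal sketch, assuming a `FrameBuffer` class of my own naming rather than ClaimSight's actual code:

```python
import base64
import time

class FrameBuffer:
    """Holds the most recent ~1 FPS camera frame received over the WebSocket."""

    def __init__(self):
        self._frame = None       # raw JPEG bytes of the latest frame
        self._captured_at = None # when it arrived

    def push(self, jpeg_bytes: bytes) -> None:
        """Called for every frame message from the client; overwrites the last one."""
        self._frame = jpeg_bytes
        self._captured_at = time.time()

    def photo_capture(self) -> dict:
        """Tool entry point: return the latest frame for damage documentation."""
        if self._frame is None:
            return {"status": "error", "message": "no camera frame available yet"}
        return {
            "status": "ok",
            "captured_at": self._captured_at,
            "image_b64": base64.b64encode(self._frame).decode("ascii"),
        }

buffer = FrameBuffer()
buffer.push(b"\xff\xd8 fake jpeg bytes")  # stand-in for a real camera frame
photo = buffer.photo_capture()
```

Because only the latest frame is kept, the agent always documents what the camera sees *now*, and memory stays constant no matter how long the call runs.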

Hear — Natural Voice Input

Browser Speech Recognition API provides continuous speech-to-text with interim results. The user sees their words appear in real-time as they speak, and can seamlessly switch between voice and text input.

Speak — Distinct Agent Voices

Each agent has its own natural voice via Google Cloud Text-to-Speech, using the Journey and Neural2 voice models. Maya sounds different from Alex, who sounds different from Jordan. Text is split into sentences and spoken progressively — the chat text reveals in sync with the speech.
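The per-agent voice mapping and sentence-level chunking can be sketched in a few lines. The voice names below are real Cloud TTS Journey/Neural2 voices, but the mapping and the naive splitter are illustrative, not ClaimSight's actual code:

```python
import re

# Map each agent to a distinct Cloud TTS voice (assumed mapping).
AGENT_VOICES = {
    "maya_property": "en-US-Journey-F",
    "alex_auto": "en-US-Journey-D",
    "jordan_liability": "en-US-Neural2-C",
}

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

reply = ("I'm so sorry to hear about the storm damage. "
         "Let's document it together. Can you point your camera at the roof?")
voice = AGENT_VOICES["maya_property"]
for sentence in split_sentences(reply):
    # synthesize(sentence, voice) would call Cloud TTS here; the chat UI
    # reveals each sentence as its audio begins playing.
    pass
```

Synthesizing sentence-by-sentence is what makes the response feel live: the first sentence starts playing while the rest is still being generated.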

The hardest part was echo prevention: the agent's TTS voice would get picked up by the user's microphone and fed back as input. I solved this by pausing speech recognition during TTS playback and resuming with a delay after playback ends.
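The pause/resume logic boils down to a small gate around the recognizer. ClaimSight does this in the browser; here is the same state machine sketched in Python, with the delay value and method names invented for illustration:

```python
RESUME_DELAY_S = 0.5  # grace period before re-listening (assumed value)

class EchoGate:
    """Pauses speech recognition while the agent's TTS audio is playing."""

    def __init__(self):
        self.recognizing = True   # is the speech recognizer listening?
        self.tts_playing = False  # is agent audio currently playing?

    def on_tts_start(self):
        self.tts_playing = True
        self.recognizing = False  # mute the mic: it would hear the agent's voice

    def on_tts_end(self, sleep=None):
        self.tts_playing = False
        if sleep is not None:
            sleep(RESUME_DELAY_S)  # let trailing room echo die down
        self.recognizing = True   # safe to listen again
```

Without the delay, the tail end of the TTS audio still in the room gets transcribed as if the user said it — the resume must lag the playback end, not coincide with it.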

3-Layer Fraud Detection

This is the innovation I'm most proud of. The Fraud Sentinel runs 11 tools in parallel behind every claim:

Layer 1 — Visual Analysis: Detect AI-generated and manipulated damage photos. Checks for GAN artifacts, inconsistent lighting, and synthetic patterns using detect_ai_generated_image.

Layer 2 — Content Provenance: Verify image origin through C2PA content credentials. Check for stripped or forged metadata, analyze narrative consistency across the claimant's statement.

Layer 3 — Financial Verification: Cross-reference claim details with financial records through Plaid API integration. Detect inflated claims, staged losses, and suspicious patterns.

All three layers feed into a unified calculate_fraud_risk_score that the system uses for triage decisions.

Real-Time Architecture

The backend is a Python FastAPI server with a persistent WebSocket connection per session. Every message flows through a typed protocol:

  • transcript — Chat messages (user and agent)
  • tool_call / tool_result — Real-time tool execution visibility
  • image — AI-generated images (annotations, visualizations, infographics)
  • agent_transfer — Agent handoff animations
  • thinking — Processing indicators
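The message types above can be pinned down with a few typed builders so frontend and backend agree on shapes. A sketch using `TypedDict` — every field name besides `type` is an assumption about the actual wire format:

```python
from typing import Literal, TypedDict

class WsMessage(TypedDict):
    type: Literal["transcript", "tool_call", "tool_result",
                  "image", "agent_transfer", "thinking"]
    payload: dict

def make_transcript(role: str, text: str) -> WsMessage:
    """Build a chat message for either the user or the current agent."""
    return {"type": "transcript", "payload": {"role": role, "text": text}}

def make_agent_transfer(from_agent: str, to_agent: str) -> WsMessage:
    """Build the handoff event that drives the frontend transfer animation."""
    return {"type": "agent_transfer",
            "payload": {"from": from_agent, "to": to_agent}}

msg = make_agent_transfer("claimsight_triage", "maya_property")
```

Keeping the envelope to a single `type` discriminator plus a payload lets the frontend switch on one field and render each event kind — chat bubble, tool card, image, handoff animation — without parsing ambiguity.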

The frontend Agent Brain panel shows every tool call as it happens, with a 9-step progress tracker that fills in as the claim advances. Users can watch the AI think in real-time — it's both a demo feature and a trust mechanism.

Deployment on Google Cloud

The entire system runs on Google Cloud Run with a multi-stage Docker build:

  1. Stage 1: Node.js builds the React frontend
  2. Stage 2: Python serves the FastAPI backend + static frontend files

Infrastructure is managed with Terraform — Cloud Run, Firestore, Artifact Registry, and Cloud Storage are all defined as code and reproducible from the repo.

```shell
gcloud run deploy claimsight --source . --region us-central1
```

Google Cloud services used:

  • Cloud Run — Serverless container hosting
  • Cloud Text-to-Speech — Natural agent voices (Journey + Neural2)
  • Artifact Registry — Docker image storage
  • Cloud Storage — Media and document storage
  • Firestore — Claims database

13 Battle-Tested Scenarios

I didn't just build a demo — I built 13 comprehensive test scenarios covering the most complex claim disputes in US insurance:

  • Hurricane + tornado combo with mixed damage causes
  • Kitchen fire with smoke damage spreading to multiple rooms
  • Multi-vehicle pile-up with disputed fault
  • Hit-and-run with only partial plate number
  • Suspected deepfake damage photos
  • Contractor injury on residential property

Each scenario tests different agent capabilities, tool combinations, and edge cases.

Key Takeaways

  1. Google ADK abstracts away orchestration complexity. I spent my time on agent behavior and tools, not on routing logic.
  2. Gemini 2.5 Flash is fast enough for real-time conversation. Tool calls return in seconds, making the experience feel live.
  3. Cloud TTS Journey voices are transformative. The difference between browser SpeechSynthesis and Cloud TTS is the difference between a toy and a product.
  4. Voice-first UX requires different thinking. Prompts that read well as text sound terrible when spoken. I rewrote agent instructions multiple times for natural speech patterns.
  5. Fraud detection needs depth, not breadth. One detection method is trivially bypassed. Three independent layers make it genuinely hard to game.

Try It


Built for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge