I Built a Multi-Agent RAG System That Fact-Checks Its Own Answers — Here's How

Toheed Asghar

DocForge is an open-source multi-agent RAG system built with LangGraph that routes queries, retrieves documents, synthesizes answers, and validates them — all automatically. Learn how four AI agents work together to catch and correct hallucinations.

Every RAG system has the same Achilles' heel: hallucination. You ask a question, it retrieves some documents, and the LLM confidently generates an answer that sounds right but is subtly wrong. No warning, no citation, no second opinion.

I spent weeks building a system that fixes this. DocForge is an open-source multi-agent RAG pipeline where four specialized AI agents collaborate — and one of them exists solely to fact-check the others.

In this post, I'll walk you through the architecture, the problems it solves, and how you can run it yourself.

GitHub: ToheedAsghar / DocForge

A RAG pipeline that doesn't trust its own answers. 4 AI agents collaborate to route queries, retrieve docs, synthesize answers, and catch hallucinations automatically.

DocForge

A Multi-Agent Retrieval-Augmented Generation (RAG) system built with LangGraph, featuring intelligent query routing, adaptive retrieval, fact-checking with automatic retry logic, and a FastAPI backend


Key Features

Multi-Agent Architecture

  • Routing Agent — Classifies query complexity (simple lookup / complex reasoning / multi-hop) and generates an optimized search query for the vector database
  • Retrieval Agent — Adaptive document retrieval (3-10 docs based on complexity, with relaxed thresholds on retries)
  • Analysis Agent — Synthesizes coherent, cited answers from multiple sources using chain-of-thought reasoning
  • Validation Agent — Fact-checks every claim against source documents, identifies hallucinations, and corrects the answer if needed

Intelligent Workflow

  • Confidence-based validation skip — When retrieval scores are high, sources are sufficient, and no information gaps exist, validation is skipped entirely for faster responses
  • Automatic retry with adaptive strategy — On validation failure, the system retries retrieval with 50% more documents and a relaxed relevance threshold (up to 3 attempts)

Why Traditional RAG Falls Short

A standard RAG pipeline is straightforward: embed a query, retrieve similar chunks from a vector database, and pass them to an LLM to generate an answer. It works — until it doesn't.
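
For reference, a bare-bones pipeline of that kind looks roughly like the sketch below. This is a generic baseline, not DocForge's code; the index name and model names are borrowed from the configuration shown later in the post.

# A minimal "naive RAG" baseline: embed, retrieve, generate. No routing,
# no citations, no verification. This is the pattern DocForge improves on.
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_google_genai import ChatGoogleGenerativeAI

store = PineconeVectorStore(
    index_name="techdoc-intelligence",  # index name taken from the config below
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
)
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

def naive_rag(question: str, k: int = 4) -> str:
    chunks = store.similarity_search(question, k=k)
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"
    # Whatever the model says is returned as-is: no fact-check, no retry
    return llm.invoke(prompt).content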

Here are the failure modes I kept hitting:

  1. No query understanding — A simple factual lookup and a complex multi-hop question both get the same retrieval strategy
  2. Fixed retrieval — Always fetching the same number of documents regardless of question complexity
  3. No verification — The LLM's answer is accepted as-is, even when it contradicts or fabricates information beyond the source documents
  4. No recovery — When retrieval fails to find relevant documents, the system has no mechanism to retry with a different strategy

DocForge addresses every one of these with a multi-agent architecture.


The Architecture: Four Agents, One Pipeline

DocForge is built on LangGraph, which orchestrates four specialized agents into a stateful workflow:

User Query
    │
    ▼
┌─────────────────┐
│   Redis Cache    │ ◄── Check cache first (SHA-256 key)
└────────┬────────┘
         │ (cache miss)
         ▼
┌─────────────────┐
│  Routing Agent   │ ◄── Classify complexity, optimize search query
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Retrieval Agent  │ ◄── Fetch 3–10 docs from Pinecone
└────────┬────────┘     (50% more on retry, relaxed threshold)
         │
         ▼
┌─────────────────┐
│ Analysis Agent   │ ◄── Synthesize cited answer (chain-of-thought)
└────────┬────────┘
         │
         ▼
    Confidence Check
    │
    ├── High confidence ──▶ Skip validation ──▶ Return & Cache
    │
    └── Otherwise:
         │
         ▼
    ┌─────────────────┐
    │ Validation Agent │ ◄── Fact-check every claim
    └────────┬────────┘
             │
             ▼
        ├── Valid               ──▶ Return & Cache
        ├── Invalid (< 3 tries) ──▶ Retry from Retrieval (adaptive)
        └── Invalid (≥ 3 tries) ──▶ Return corrected answer & Cache
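
If you haven't used LangGraph before, the sketch below shows how a workflow like this can be wired together. The node names, state fields, and stub bodies are illustrative; DocForge's real graph lives in the backend.agents.graph module imported later in the post.

from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph


class PipelineState(TypedDict):
    # Shared state passed between agents; field names here are illustrative
    query: str
    retrieved_chunks: List[str]
    draft_answer: str
    confidence_high: bool
    validation_passed: bool
    retry_count: int


# Each node is a plain function that reads the state and returns the fields it updates.
def route(state: PipelineState) -> dict:
    return {}                                     # classify the query, rewrite it for search

def retrieve(state: PipelineState) -> dict:
    return {"retrieved_chunks": []}               # query Pinecone, widening on retries

def analyze(state: PipelineState) -> dict:
    return {"draft_answer": "", "confidence_high": False}   # cited synthesis

def validate(state: PipelineState) -> dict:
    return {"validation_passed": True,            # claim-by-claim fact check
            "retry_count": state.get("retry_count", 0) + 1}


graph = StateGraph(PipelineState)
graph.add_node("router", route)
graph.add_node("retriever", retrieve)
graph.add_node("analyzer", analyze)
graph.add_node("validator", validate)

graph.add_edge(START, "router")
graph.add_edge("router", "retriever")
graph.add_edge("retriever", "analyzer")

# Confidence gate: high-confidence answers skip validation entirely
graph.add_conditional_edges(
    "analyzer",
    lambda s: "skip" if s["confidence_high"] else "check",
    {"skip": END, "check": "validator"},
)

# Failed validation loops back to retrieval, up to 3 attempts
graph.add_conditional_edges(
    "validator",
    lambda s: "done" if s["validation_passed"] or s["retry_count"] >= 3 else "retry",
    {"done": END, "retry": "retriever"},
)

app = graph.compile()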

Let me break down each agent.

1. Routing Agent — The Dispatcher

Not all questions are equal. "What is LangGraph?" is a simple lookup. "Compare the tradeoffs of LangGraph vs. CrewAI for multi-agent orchestration" requires complex reasoning across multiple sources.

The Routing Agent classifies every incoming query into one of three types:

  • Simple lookup — Direct factual questions (retrieves 3 documents)
  • Complex reasoning — Questions requiring synthesis across sources (retrieves 7 documents)
  • Multi-hop — Questions that chain multiple pieces of information (retrieves 10 documents)

It also rewrites the user's natural-language query into an optimized search query for the vector database, improving retrieval relevance.
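
As a rough sketch, a router like this can be a single structured-output LLM call. The schema and prompt wording below are mine, not DocForge's, and I'm assuming a Gemini key is configured as described later in the post.

from typing import Literal

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel, Field


class RoutingDecision(BaseModel):
    # Illustrative output schema for the router
    query_type: Literal["simple_lookup", "complex_reasoning", "multi_hop"]
    search_query: str = Field(description="Query rewritten for vector search")


# Retrieval budget per query type, matching the counts listed above
DOCS_PER_TYPE = {"simple_lookup": 3, "complex_reasoning": 7, "multi_hop": 10}

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-lite")
router = llm.with_structured_output(RoutingDecision)

decision = router.invoke(
    "Classify this query and rewrite it for vector search: "
    "Compare the tradeoffs of LangGraph vs. CrewAI for multi-agent orchestration"
)
print(decision.query_type, "->", DOCS_PER_TYPE[decision.query_type], "documents")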

2. Retrieval Agent — Adaptive Search

Based on the routing classification, the Retrieval Agent queries Pinecone with the appropriate number of documents and relevance threshold.

The key innovation here is adaptive retry. If the Validation Agent later rejects the answer, retrieval reruns with:

  • 50% more documents than the previous attempt
  • A relaxed relevance threshold to cast a wider net

This means the system self-corrects when initial retrieval wasn't sufficient.
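
The retry arithmetic is simple enough to sketch in a few lines. The 50% growth and the per-type document counts come from the post above; the base threshold and how much it relaxes per retry are assumptions on my part.

def retrieval_params(base_k: int, base_threshold: float, attempt: int) -> tuple[int, float]:
    """Widen the search on each retry: 50% more documents and a relaxed
    relevance threshold (the 0.1 decrement is an illustrative value)."""
    k = round(base_k * 1.5 ** attempt)
    threshold = max(0.0, base_threshold - 0.1 * attempt)
    return k, threshold

print(retrieval_params(7, 0.75, 0))  # first attempt for a complex query: (7, 0.75)
print(retrieval_params(7, 0.75, 1))  # after a validation failure: (10, 0.65)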

3. Analysis Agent — The Synthesizer

The Analysis Agent takes the retrieved document chunks and synthesizes a coherent, cited answer using chain-of-thought reasoning. Every claim in the answer is tied back to a specific source document.
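
I don't know the exact prompt DocForge uses, but a synthesis prompt in this style typically looks something like the sketch below, which is what the citation and chain-of-thought behavior described above implies.

ANALYSIS_PROMPT = """You are answering a question using only the numbered sources below.

Think step by step:
1. Decide which sources are relevant to the question.
2. Pull out the specific facts you need from each one.
3. Write a coherent answer, citing sources inline as [1], [2], etc.

Do not state anything that is not supported by a source. If the sources
do not contain the answer, say so and list the information gaps.

Question: {question}

Sources:
{numbered_sources}
"""

def format_sources(chunks: list[str]) -> str:
    # Number each retrieved chunk so the model can cite it as [n]
    return "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))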

4. Validation Agent — The Fact-Checker

This is the agent that makes DocForge different. The Validation Agent independently fact-checks every claim in the synthesized answer against the source documents. It:

  • Identifies unsupported claims
  • Detects hallucinated information
  • Flags contradictions with sources
  • Provides a corrected answer when issues are found

If validation fails, the system retries from retrieval with an adaptive strategy — up to 3 attempts. If it still fails after maximum retries, it returns the best corrected answer it has.
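
As a sketch, the validator's output can be modeled as a small structured report, and the branching described above reduces to a few lines. The field names are hypothetical, not DocForge's actual schema.

from typing import List, Literal

from pydantic import BaseModel, Field


class ValidationReport(BaseModel):
    # Hypothetical shape of the fact-checker's structured output
    verdict: Literal["valid", "invalid"]
    unsupported_claims: List[str] = Field(default_factory=list)
    contradictions: List[str] = Field(default_factory=list)
    corrected_answer: str = ""


def next_step(report: ValidationReport, retry_count: int, max_retries: int = 3) -> str:
    """Accept the answer, retry retrieval with a wider net, or fall back to
    the corrected answer once the retry budget is spent."""
    if report.verdict == "valid":
        return "return_answer"
    if retry_count < max_retries:
        return "retry_retrieval"
    return "return_corrected_answer"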


Smart Optimizations That Matter in Production

Building a multi-agent system that's correct is one thing. Making it fast and cost-effective is another.

Confidence-Based Validation Skip

Not every answer needs fact-checking. When all three conditions are met, DocForge skips the Validation Agent entirely:

  • Retrieval scores are above 0.85
  • At least 3 source documents were used
  • No information gaps were detected

This saves 30–40% latency on high-confidence queries.
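
In code, the gate is just a conjunction of the three checks. Whether "above 0.85" applies to the minimum or the average retrieval score is an implementation detail I'm guessing at; this sketch uses the minimum.

def can_skip_validation(retrieval_scores: list[float],
                        num_sources: int,
                        information_gaps: list[str]) -> bool:
    # Skip the Validation Agent only when all three conditions hold
    return (
        bool(retrieval_scores)
        and min(retrieval_scores) > 0.85   # every retrieved chunk scored highly
        and num_sources >= 3               # enough independent sources were used
        and not information_gaps           # the Analysis Agent reported no gaps
    )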

Redis Caching

Every query result is cached in Redis with a SHA-256 key and 1-hour TTL. Repeated queries return instantly — roughly 10x faster than a fresh pipeline run, with zero token cost.
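
A minimal sketch of that caching layer using the redis-py client; the "docforge:" key prefix is my own invention.

import hashlib
import json

import redis

client = redis.Redis.from_url("redis://localhost:6379")
TTL_SECONDS = 3600  # 1-hour TTL, as described above

def cache_key(query: str) -> str:
    # SHA-256 of the normalized query text (the key prefix is illustrative)
    return "docforge:" + hashlib.sha256(query.strip().lower().encode()).hexdigest()

def get_cached(query: str) -> dict | None:
    raw = client.get(cache_key(query))
    return json.loads(raw) if raw else None

def set_cached(query: str, result: dict) -> None:
    client.setex(cache_key(query), TTL_SECONDS, json.dumps(result))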

Task-Specific Model Selection

Different agents need different capabilities. DocForge lets you assign different models per task:

# Fast, cheap model for simple routing decisions
GEMINI_ROUTING_MODEL=gemini-2.0-flash-lite

# More capable model for complex synthesis and validation
GEMINI_ANALYSIS_MODEL=gemini-2.5-flash
GEMINI_VALIDATION_MODEL=gemini-2.5-flash

This cuts token costs by 40–50% compared to using a single expensive model for everything.
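
Since the stack lists Pydantic Settings, the per-task model names presumably flow from those environment variables into each agent. A sketch of what that mapping could look like (field names mirror the variables above):

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Field names match the environment variables shown above
    gemini_routing_model: str = "gemini-2.0-flash-lite"
    gemini_analysis_model: str = "gemini-2.5-flash"
    gemini_validation_model: str = "gemini-2.5-flash"


settings = Settings()

# Cheap model for routing, more capable models for synthesis and fact-checking
routing_llm = ChatGoogleGenerativeAI(model=settings.gemini_routing_model)
analysis_llm = ChatGoogleGenerativeAI(model=settings.gemini_analysis_model)
validation_llm = ChatGoogleGenerativeAI(model=settings.gemini_validation_model)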

Dual LLM Provider Support

DocForge supports both OpenAI GPT (via OpenRouter) and Google Gemini. Switch providers with a single environment variable:

LLM_PROVIDER=gemini  # or "gpt"
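
A provider switch like that usually boils down to a small factory. This is a sketch, not DocForge's code, and the OPENROUTER_API_KEY variable name is my assumption.

import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI


def make_llm(model: str):
    """Return a chat model for whichever provider LLM_PROVIDER selects."""
    if os.getenv("LLM_PROVIDER", "gemini") == "gemini":
        return ChatGoogleGenerativeAI(model=model)
    # GPT models go through OpenRouter's OpenAI-compatible endpoint
    return ChatOpenAI(
        model=model,
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],  # hypothetical variable name
    )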

Getting Started in 5 Minutes

Prerequisites

You'll need Python 3, a Pinecone account and API key, a Gemini API key (or an OpenRouter key if you'd rather use GPT), and a local Redis instance if you want caching enabled.

Installation

git clone https://github.com/ToheedAsghar/DocForge.git
cd DocForge
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

Create a .env file:

LLM_PROVIDER=gemini
GEMINI_API_KEY=your-gemini-key
PINECONE_API_KEY=your-pinecone-key
PINECONE_ENVIRONMENT=us-east-1
PINECONE_INDEX_NAME=techdoc-intelligence
REDIS_URL=redis://localhost:6379
CACHE_ENABLED=true

Ingest Your Documents

from backend.ingestion.pipeline import ingest_documents

stats = ingest_documents("./documents/", chunk_size=1000, chunk_overlap=200)
print(f"Ingested {stats['documents_loaded']} PDFs → {stats['chunks_created']} chunks")

Query the System

from backend.agents.graph import run_graph

result = run_graph("What is LangGraph?")

print(result["fact_checked_answer"])
print(f"Sources: {len(result['retrieved_chunks'])} documents")
print(f"Validation: {'passed' if result['validation_passed'] else 'corrected'}")
print(f"Latency: {result['latency_ms']:.0f}ms")

Or Use the REST API

uvicorn backend.main:app --host 0.0.0.0 --port 8000

curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LangGraph?"}'

Docker is also supported:

docker-compose up

The Tech Stack

Component             Technology
Agent Orchestration   LangGraph
LLM Providers         OpenAI GPT-4o-mini (via OpenRouter), Google Gemini 2.5 Flash
Embeddings            OpenAI text-embedding-3-small (1536 dims)
Vector Database       Pinecone (serverless, cosine similarity)
Caching               Redis (SHA-256 keys, 1-hour TTL)
API Framework         FastAPI
LLM Framework         LangChain
Configuration         Pydantic Settings
Containerization      Docker + Docker Compose

What I Learned Building This

A few takeaways from building a multi-agent RAG system:

1. Validation is worth the latency cost. In my testing, the Validation Agent caught hallucinated claims in roughly 15–20% of responses. That's as many as 1 in 5 answers that would have been wrong without it.

2. Adaptive retry is better than aggressive retrieval. Instead of always retrieving 10+ documents (slow, expensive, noisy), start small and retry with more only when needed. Most queries are answered well with 3–5 documents.

3. Caching is a multiplier. In any production Q&A system, users ask similar questions repeatedly. Redis caching turned repeated queries from 3–5 second operations into sub-100ms responses.

4. Different tasks need different models. Routing a query is a simple classification task — it doesn't need GPT-4. Synthesizing a multi-source answer does. Task-specific model assignment is an easy win for cost optimization.


What's Next

DocForge is actively being developed. Here's what's on the roadmap:

  • Support for more document formats (DOCX, TXT, Markdown, HTML)
  • Conversation history and multi-turn chat
  • A frontend UI for non-technical users
  • Multi-tenancy support
  • Deployment guides for AWS, Railway, and Render

Try It Out

DocForge is fully open-source under the MIT license. If you're building a RAG system and tired of hallucinated answers, give it a spin:

GitHub: github.com/ToheedAsghar/DocForge

If you found this useful, a star on the repo would mean a lot. I'm also happy to answer questions in the comments — whether it's about the architecture, LangGraph, or multi-agent systems in general.


Built by Toheed Asghar with LangGraph, LangChain, Pinecone, and FastAPI.