I Built a Multi-Agent RAG System That Fact-Checks Its Own Answers — Here's How
#ai #python #langchain #opensource
Toheed Asghar
DocForge is an open-source multi-agent RAG system built with LangGraph that routes queries, retrieves documents, synthesizes answers, and validates them — all automatically. Learn how four AI agents work together to eliminate hallucinations.
Every RAG system has the same Achilles' heel: hallucination. You ask a question, it retrieves some documents, and the LLM confidently generates an answer that sounds right but is subtly wrong. No warning, no citation, no second opinion.
I spent weeks building a system that fixes this. DocForge is an open-source multi-agent RAG pipeline where four specialized AI agents collaborate — and one of them exists solely to fact-check the others.
In this post, I'll walk you through the architecture, the problems it solves, and how you can run it yourself.
A RAG pipeline that doesn't trust its own answers. 4 AI agents collaborate to route queries, retrieve docs, synthesize answers, and catch hallucinations automatically.
DocForge
A Multi-Agent Retrieval-Augmented Generation (RAG) system built with LangGraph, featuring intelligent query routing, adaptive retrieval, fact-checking with automatic retry logic, and a FastAPI backend
Key Features
Multi-Agent Architecture
Routing Agent — Classifies query complexity (simple lookup / complex reasoning / multi-hop) and generates an optimized search query for the vector database
Retrieval Agent — Adaptive document retrieval (3-10 docs based on complexity, with relaxed thresholds on retries)
Analysis Agent — Synthesizes coherent, cited answers from multiple sources using chain-of-thought reasoning
Validation Agent — Fact-checks every claim against source documents, identifies hallucinations, and corrects the answer if needed
Intelligent Workflow
Confidence-based validation skip — When retrieval scores are high, sources are sufficient, and no information gaps exist, validation is skipped entirely for faster responses
Automatic retry with adaptive strategy — On validation failure, the system retries retrieval with 50% more documents and a relaxed relevance threshold (up to 3 attempts)
A standard RAG pipeline is straightforward: embed a query, retrieve similar chunks from a vector database, and pass them to an LLM to generate an answer. It works — until it doesn't.
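For reference, that baseline fits in one short function. Here's a minimal sketch (not DocForge code; it assumes OpenAI and Pinecone clients configured via environment variables, a hypothetical index name, and a `text` field in each chunk's metadata):

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
index = Pinecone().Index("docs")  # hypothetical index name

def naive_rag(query: str, top_k: int = 5) -> str:
    # 1. Embed the query
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding

    # 2. Retrieve a fixed number of chunks -- same strategy for every question
    matches = index.query(vector=embedding, top_k=top_k, include_metadata=True).matches
    context = "\n\n".join(m.metadata["text"] for m in matches)

    # 3. Generate an answer and accept it as-is -- no fact-checking, no retry
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return response.choices[0].message.content
```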
Here are the failure modes I kept hitting:
No query understanding — A simple factual lookup and a complex multi-hop question both get the same retrieval strategy
Fixed retrieval — Always fetching the same number of documents regardless of question complexity
No verification — The LLM's answer is accepted as-is, even when it contradicts or fabricates information beyond the source documents
No recovery — When retrieval fails to find relevant documents, the system has no mechanism to retry with a different strategy
DocForge addresses every one of these with a multi-agent architecture.
The Architecture: Four Agents, One Pipeline
DocForge is built on LangGraph, which orchestrates four specialized agents into a stateful workflow:
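At a high level, the wiring looks something like this. It's a stripped-down sketch rather than the repo's actual graph definition; the state fields and node stubs are illustrative:

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph

class RAGState(TypedDict, total=False):
    query: str
    route: str                   # "simple" | "complex" | "multi_hop"
    retrieved_chunks: List[dict]
    answer: str
    validation_passed: bool
    retry_count: int

# Stub nodes -- the real agents call an LLM and Pinecone here
def route(state: RAGState) -> dict:
    return {"route": "simple", "retry_count": 0}

def retrieve(state: RAGState) -> dict:
    return {"retrieved_chunks": []}

def analyze(state: RAGState) -> dict:
    return {"answer": "..."}

def validate(state: RAGState) -> dict:
    return {"validation_passed": True}

graph = StateGraph(RAGState)
graph.add_node("route", route)
graph.add_node("retrieve", retrieve)
graph.add_node("analyze", analyze)
graph.add_node("validate", validate)

graph.set_entry_point("route")
graph.add_edge("route", "retrieve")
graph.add_edge("retrieve", "analyze")
graph.add_edge("analyze", "validate")

# Loop back to retrieval when validation fails, up to a retry budget
def after_validation(state: RAGState) -> str:
    if not state.get("validation_passed") and state.get("retry_count", 0) < 3:
        return "retrieve"
    return END

graph.add_conditional_edges("validate", after_validation)
app = graph.compile()
```

The real nodes call an LLM or the vector database instead of returning stubs, but the control flow is the heart of the system: route, retrieve, analyze, validate, and loop back on failure.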
1. Routing Agent — The Query Classifier
Not all questions are equal. "What is LangGraph?" is a simple lookup. "Compare the tradeoffs of LangGraph vs. CrewAI for multi-agent orchestration" requires complex reasoning across multiple sources.
The Routing Agent classifies every incoming query into one of three types:
Simple lookup — Direct factual questions (retrieves 3 documents)
Complex reasoning — Questions that require weighing or synthesizing information from several sources
Multi-hop — Questions that chain multiple pieces of information (retrieves 10 documents)
It also rewrites the user's natural-language query into an optimized search query for the vector database, improving retrieval relevance.
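Structured output keeps this step reliable. A sketch of what the routing call might look like (the `RouteDecision` schema and prompt wording are my own illustration, not the exact code in the repo):

```python
from typing import Literal

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel, Field

class RouteDecision(BaseModel):
    query_type: Literal["simple_lookup", "complex_reasoning", "multi_hop"]
    search_query: str = Field(description="Query rewritten for vector search")

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-lite")
router = llm.with_structured_output(RouteDecision)

decision = router.invoke(
    "Classify this question and rewrite it for semantic search:\n"
    "Compare the tradeoffs of LangGraph vs. CrewAI for multi-agent orchestration"
)
# e.g. query_type="complex_reasoning",
#      search_query="LangGraph vs CrewAI multi-agent orchestration tradeoffs"
```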
2. Retrieval Agent — Adaptive Search
Based on the routing classification, the Retrieval Agent queries Pinecone with the appropriate number of documents and relevance threshold.
The key innovation here is adaptive retry. If the Validation Agent later rejects the answer, retrieval reruns with:
50% more documents than the previous attempt
A relaxed relevance threshold to cast a wider net
This means the system self-corrects when initial retrieval wasn't sufficient.
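The retry math itself is tiny. Here's a sketch of how the widened parameters could be computed per attempt; the 50%-more-documents rule comes from above, while the exact threshold decrement and floor are illustrative numbers:

```python
import math

def retry_params(base_k: int, base_threshold: float, attempt: int) -> tuple[int, float]:
    """Widen the net on each validation-triggered retry."""
    k = math.ceil(base_k * 1.5 ** attempt)                 # ~50% more documents each time: 3 -> 5 -> 7
    threshold = max(0.3, base_threshold - 0.1 * attempt)   # relax the relevance cutoff, with a floor
    return k, threshold

print(retry_params(base_k=3, base_threshold=0.7, attempt=2))  # (7, 0.5)
```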
3. Analysis Agent — The Synthesizer
The Analysis Agent takes the retrieved document chunks and synthesizes a coherent, cited answer using chain-of-thought reasoning. Every claim in the answer is tied back to a specific source document.
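Most of the work here is prompt design. A rough sketch of the kind of prompt this step might use (the wording is mine, not DocForge's actual prompt):

```python
ANALYSIS_PROMPT = """You are answering a question using ONLY the numbered sources below.

Think step by step:
1. Decide which sources are relevant to the question.
2. Extract the specific facts you need from each one.
3. Write a concise answer, citing sources inline as [1], [2], ...

Do not state anything that is not supported by a source. If the sources
do not contain the answer, say so explicitly.

Sources:
{numbered_sources}

Question: {question}
"""
```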
4. Validation Agent — The Fact-Checker
This is the agent that makes DocForge different. The Validation Agent independently fact-checks every claim in the synthesized answer against the source documents. It:
Identifies unsupported claims
Detects hallucinated information
Flags contradictions with sources
Provides a corrected answer when issues are found
If validation fails, the system retries from retrieval with an adaptive strategy — up to 3 attempts. If it still fails after maximum retries, it returns the best corrected answer it has.
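A structured verdict makes that retry decision mechanical. A sketch of what the validator could return, with illustrative field names:

```python
from pydantic import BaseModel, Field

class ValidationReport(BaseModel):
    passed: bool
    unsupported_claims: list[str] = Field(default_factory=list)
    hallucinations: list[str] = Field(default_factory=list)
    contradictions: list[str] = Field(default_factory=list)
    corrected_answer: str | None = None  # rewritten answer when problems were found

def should_retry(report: ValidationReport, retry_count: int, max_retries: int = 3) -> bool:
    # Loop back to retrieval only while the verdict fails and attempts remain;
    # past the budget, fall back to the best corrected answer available.
    return not report.passed and retry_count < max_retries
```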
Smart Optimizations That Matter in Production
Building a multi-agent system that's correct is one thing. Making it fast and cost-effective is another.
Confidence-Based Validation Skip
Not every answer needs fact-checking. When all three conditions are met, DocForge skips the Validation Agent entirely:
Retrieval scores are above 0.85
At least 3 source documents were used
No information gaps were detected
This saves 30–40% latency on high-confidence queries.
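In code, the skip decision is just a predicate over the retrieval results. A sketch, assuming each chunk carries a relevance `score`; whether "scores above 0.85" means every score or an average is an implementation detail, and this version checks every chunk:

```python
def can_skip_validation(
    chunks: list[dict],
    gaps_detected: bool,
    score_cutoff: float = 0.85,
    min_sources: int = 3,
) -> bool:
    """Skip the fact-checking pass only when every confidence signal is green."""
    if len(chunks) < min_sources or gaps_detected:
        return False
    return all(chunk["score"] > score_cutoff for chunk in chunks)
```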
Redis Caching
Every query result is cached in Redis with a SHA-256 key and 1-hour TTL. Repeated queries return instantly — roughly 10x faster than a fresh pipeline run, with zero token cost.
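The cache layer is a thin wrapper around the pipeline. A sketch with `redis-py`, using the SHA-256 key and one-hour TTL described above (the key prefix and JSON serialization are illustrative, and it assumes the result dict is JSON-serializable):

```python
import hashlib
import json

import redis

from backend.agents.graph import run_graph

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
TTL_SECONDS = 3600  # 1 hour

def cached_query(query: str) -> dict:
    key = "docforge:" + hashlib.sha256(query.encode("utf-8")).hexdigest()
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)      # cache hit: no LLM or vector DB calls at all
    result = run_graph(query)          # cache miss: run the full multi-agent pipeline
    r.setex(key, TTL_SECONDS, json.dumps(result))
    return result
```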
Task-Specific Model Selection
Different agents need different capabilities. DocForge lets you assign different models per task:
```bash
# Fast, cheap model for simple routing decisions
GEMINI_ROUTING_MODEL=gemini-2.0-flash-lite

# More capable model for complex synthesis and validation
GEMINI_ANALYSIS_MODEL=gemini-2.5-flash
GEMINI_VALIDATION_MODEL=gemini-2.5-flash
```
This cuts token costs by 40–50% compared to using a single expensive model for everything.
Dual LLM Provider Support
DocForge supports both OpenAI GPT (via OpenRouter) and Google Gemini. Switch providers with a single environment variable:
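The exact variable name lives in the repo's configuration; as a purely hypothetical `.env` sketch, the switch looks something like this:

```bash
# Hypothetical variable name -- check the repo's configuration for the real one
LLM_PROVIDER=gemini   # or: openai
```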
Run a Query from Python

```python
from backend.agents.graph import run_graph

result = run_graph("What is LangGraph?")

print(result["fact_checked_answer"])
print(f"Sources: {len(result['retrieved_chunks'])} documents")
print(f"Validation: {'passed' if result['validation_passed'] else 'corrected'}")
print(f"Latency: {result['latency_ms']:.0f}ms")
```
Or Use the REST API
```bash
uvicorn backend.main:app --host 0.0.0.0 --port 8000
```

```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LangGraph?"}'
```
Docker is also supported:
```bash
docker-compose up
```
The Tech Stack
| Component | Technology |
| --- | --- |
| Agent Orchestration | LangGraph |
| LLM Providers | OpenAI GPT-4o-mini (via OpenRouter), Google Gemini 2.5 Flash |
| Embeddings | OpenAI text-embedding-3-small (1536 dims) |
| Vector Database | Pinecone (serverless, cosine similarity) |
| Caching | Redis (SHA-256 keys, 1-hour TTL) |
| API Framework | FastAPI |
| LLM Framework | LangChain |
| Configuration | Pydantic Settings |
| Containerization | Docker + Docker Compose |
What I Learned Building This
A few takeaways from building a multi-agent RAG system:
1. Validation is worth the latency cost. In my testing, the Validation Agent caught hallucinated claims in roughly 15–20% of responses. That's as many as 1 in 5 answers that would have been wrong without it.
2. Adaptive retry is better than aggressive retrieval. Instead of always retrieving 10+ documents (slow, expensive, noisy), start small and retry with more only when needed. Most queries are answered well with 3–5 documents.
3. Caching is a multiplier. In any production Q&A system, users ask similar questions repeatedly. Redis caching turned repeated queries from 3–5 second operations into sub-100ms responses.
4. Different tasks need different models. Routing a query is a simple classification task — it doesn't need GPT-4. Synthesizing a multi-source answer does. Task-specific model assignment is an easy win for cost optimization.
What's Next
DocForge is actively being developed. Here's what's on the roadmap:
Support for more document formats (DOCX, TXT, Markdown, HTML)
Conversation history and multi-turn chat
A frontend UI for non-technical users
Multi-tenancy support
Deployment guides for AWS, Railway, and Render
Try It Out
DocForge is fully open-source under the MIT license. If you're building a RAG system and tired of hallucinated answers, give it a spin.
If you found this useful, a star on the repo would mean a lot. I'm also happy to answer questions in the comments — whether it's about the architecture, LangGraph, or multi-agent systems in general.
Built by Toheed Asghar with LangGraph, LangChain, Pinecone, and FastAPI.