## The Problem Nobody Talks About
Ask your RAG system: "What award did the director of Inception win?"
This requires two hops:

1. Find the director of Inception.
2. Find an award that director won.
Your retrieval engine does hop 1 fine. But hop 2? The embedding of the original query is nowhere near "Academy Award" in vector space. The answer sits at rank 665. Your top-20 retrieval window never sees it.
We tested this systematically on HotpotQA fullwiki — 5.2M Wikipedia articles, 500 multi-hop questions.
Every traditional method scored 0% Hit@20. BM25. Dense retrieval. Rerankers. All of them.
In 1958, Daniel Koshland proposed the induced-fit model of enzyme binding. Unlike the rigid "lock and key" model, enzymes change their shape to fit the substrate.
We applied the same principle to retrieval.
At each hop, IFR mutates the query embedding based on what it just found. The query literally reshapes itself to reach the next piece of evidence.
Query → [hop 1: find Film X] → mutate → [hop 2: find director] → mutate → [hop 3: find award] → found
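The hop loop in that diagram can be sketched as follows. This is a minimal illustration, not the actual IFR implementation: `retrieve`, `embed_doc`, and the linear mutation rule are all assumptions standing in for whatever vector store, encoder, and update the real system uses.

```python
import numpy as np

def ifr_search(query_vec, retrieve, embed_doc, hops=3, alpha=0.5, k=5):
    """Sketch of induced-fit retrieval (hypothetical interfaces).

    retrieve(vec, k) -> top-k document ids for a query vector.
    embed_doc(doc_id) -> embedding of that document.
    alpha controls how far the query moves toward new evidence each hop.
    """
    evidence = []
    for _ in range(hops):
        docs = retrieve(query_vec, k)
        evidence.extend(d for d in docs if d not in evidence)
        # Induced fit: pull the query embedding toward the top hit so the
        # next hop can reach documents the original query never could.
        query_vec = (1 - alpha) * query_vec + alpha * embed_doc(docs[0])
    return query_vec, evidence
```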
This sounds elegant on paper. In practice, v1 was a disaster.
67% of failures came from catastrophic drift — the query mutated so aggressively that by hop 3, it had lost >80% of its original meaning. It was finding documents, but completely wrong ones.
We tested eight drift-correction approaches. Most made things worse. The winner was embarrassingly simple:
```python
# Blend 50% of the original query at every hop
query_vector = 0.5 * mutated + 0.5 * original

# Hard reset if drift exceeds the threshold
if cosine_sim(query_vector, original) < 0.5:
    query_vector = original
```
Two lines of code. nDCG went from 0.197 to 0.317 (+61%).
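For completeness, here is a self-contained version of that fix, assuming NumPy embedding vectors; `cosine_sim` is not defined in the snippet above, so it is spelled out here as plain cosine similarity.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Plain cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def anchor_blend(mutated: np.ndarray, original: np.ndarray,
                 alpha: float = 0.5, threshold: float = 0.5) -> np.ndarray:
    """Blend the mutated query with the original anchor; hard-reset on drift."""
    query_vector = alpha * mutated + (1 - alpha) * original
    if cosine_sim(query_vector, original) < threshold:
        query_vector = original  # catastrophic drift: snap back to the anchor
    return query_vector
```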
Tested on HotpotQA fullwiki: 5.2M Wikipedia articles, 500 questions, 3 random seeds, single RTX 3060.
| Method | R@5 | R@10 | MRR |
|---|---|---|---|
| RAG-rerank baseline | 0.337 | 0.337 | 0.548 |
| IFR-hybrid+CE | 0.366 | 0.366 | 0.554 |
| Delta | +2.9 pts (p=0.0002) | +2.9 pts | +0.6 pts |
R@5 = R@10 because IFR surfaces all retrievable targets within the top 5 — ranks 6–10 add no new hits at this difficulty level.
Scaling is effectively constant-time: a 100x increase in corpus size produced only a 1.1x increase in latency, and beam traversal takes ~10ms on the full 5.2M-article corpus.
Raw beam search R@5 = 0.309. With cross-encoder reranking: 0.366 (+5.7 points).
The insight: drift noise scores high against the mutated query but low against the original. So the cross-encoder naturally filters it. Trying to eliminate drift at the beam level gives diminishing returns. The multi-layer pipeline is the actual solution.
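That filtering effect can be made explicit. In this hypothetical sketch, `ce_score` stands in for a real cross-encoder; the key point is that candidates are scored against the original question, not the mutated query.

```python
def filter_drift(question, candidates, ce_score, top_k=5):
    """Rerank beam candidates against the ORIGINAL question.

    Drift noise scores well against the mutated query but poorly here,
    so it falls out of the top-k without any beam-level intervention.
    """
    ranked = sorted(candidates, key=lambda doc: ce_score(question, doc),
                    reverse=True)
    return ranked[:top_k]
```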
Question for the community:
We fixed drift with a static 50% anchor blend — but this feels like a brute-force solution. Has anyone worked on adaptive blending that adjusts the anchor weight based on query complexity or hop distance? Curious what approaches you've tried.
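To seed the discussion, one untested shape for such a scheme: schedule the anchor weight by hop distance, trusting the mutated query less as drift compounds. All constants here are arbitrary placeholders, not tuned values.

```python
def adaptive_alpha(hop: int, base: float = 0.5, decay: float = 0.1,
                   floor: float = 0.2) -> float:
    """Arbitrary schedule: give the mutated query less weight on later hops."""
    return max(floor, base - decay * hop)

def blend(mutated, original, hop):
    """Anchor blend with a hop-dependent weight instead of a static 50%."""
    a = adaptive_alpha(hop)
    return a * mutated + (1 - a) * original
```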