Sharvani Sekar
"Did it seriously just link that three-week-old meeting note to today's board delay?" I stared at the logs as Hindsight correctly identified a project bottleneck without performing a single vector embedding lookup.
That moment was the payoff of a bet I'd been running for about two weeks: I wanted to build a project management assistant that could do real temporal reasoning—connecting old decisions to current outcomes—without spinning up a Postgres instance with pgvector, without provisioning a Pinecone index, and without any of the operational headaches of a dedicated vector store. What I ended up with was a FastAPI + React app called AI ProPilot, powered by Hindsight for persistent agent memory, and a newfound respect for how much hassle that choice dodged.
What the System Does
AI ProPilot is a team project management tool where the AI pulls real weight, not just sits there looking pretty. You can create tasks, assign them to teammates, log architectural decisions, schedule meetings, and then dig into it all through a natural-language chat. Ask "why is the backend delayed?" and it explains—not by running a graph query, but by pulling up stored task events, checking them against the live project state, and piecing together an answer with an LLM.
The core pieces are simple enough:
- FastAPI backend on port 8000, split into eight route modules: tasks, teams, decisions, meetings, integrations, ai, auth, and projects.
- React + Vite frontend on port 5173, a single-page app with routing handled in App.jsx.
- Socket Nexus, a lightweight Socket.IO server on port 4000 for pushing real-time task updates to the dashboard.
- Groq with the llama-3.3-70b-versatile model for fast LLM inference.
- Hindsight handling persistent memory that sticks around between sessions.
That memory piece is the real star here.
The Memory Problem Nobody Tells You About
When I kicked off the AI features, every guide I saw pushed the usual script: embed your data, dump it in a vector DB, pull back by cosine similarity, stuff it into context. It's neat and makes sense, but it's a pile of infra to wrangle for a prototype.
My needs were different: the LLM had to remember actual events—like tasks finishing, decisions logged, deadlines blown—and reason across that timeline for questions. The breakthrough was realizing event history doesn't demand semantic vector pulls. It needs retrieval by relevance, like "what do you know about backend delays?"—and that's Hindsight's sweet spot.
The Hindsight docs call it a managed memory layer for agents: you retain facts, recall them via natural-language queries, and it handles the indexing magic under the hood. Just two methods in the API.
Here's the full memory.py—no ORMs, no schemas, no migrations:
```python
# services/memory.py
import asyncio

# hs is the Hindsight client, initialized elsewhere in this module;
# HINDSIGHT_BANK_ID comes from .env (more on that below).

async def store_memory(data: dict):
    text = str(data)
    if hs:
        await hs.retain(bank_id=HINDSIGHT_BANK_ID, content=text)

async def retrieve_memory(query: str):
    if hs:
        res_or_coro = hs.recall(bank_id=HINDSIGHT_BANK_ID, query=query)
        # SDK's recall() can be sync or async depending on context
        if hasattr(res_or_coro, "__await__") or asyncio.iscoroutine(res_or_coro):
            response = await res_or_coro
        else:
            response = res_or_coro
        if hasattr(response, "results"):
            return [r.text for r in response.results]
    return []
```
Yeah, that await-detection hack isn't pretty. More on it later.
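One way to tame that hack is to factor the sniffing into a small reusable helper. This is a generic sketch of the pattern, not part of the Hindsight SDK; the `maybe_await` name is mine:

```python
import asyncio
import inspect

async def maybe_await(value):
    """Await the value if it's awaitable, otherwise return it as-is.

    Handy for SDKs whose methods return either a plain result or a
    coroutine depending on the calling context.
    """
    if inspect.isawaitable(value):
        return await value
    return value
```

With that in place, the call site collapses to something like `response = await maybe_await(hs.recall(...))`, and the branching lives in exactly one spot.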
HINDSIGHT_BANK_ID is a simple string namespace—like "propilot_brain" here, pulled from .env. It's basically your memory collection; you could use separate ones for projects or users.
Writing to Memory: At the Seams of State Changes
The smart call was where to trigger store_memory—not in middleware or a cron job, but right inside the state-change functions. Every task shift, from create to done, gets serialized and shoved into Hindsight.
```python
# services/task_service.py
async def mark_task_completed(db: Session, task_id: int) -> TaskItem:
    # ... update DB status ...
    completed_on_time = now <= task_deadline if db_task.deadline else None
    memory_payload = {
        "event": "task_completed",
        "task_name": db_task.task_name,
        "assigned_to": db_task.assigned_to,
        "priority": db_task.priority,
        "difficulty": db_task.difficulty,
        "completed_on_time": completed_on_time,
        "time_taken_seconds": time_taken,
    }
    await store_memory(memory_payload)
```
Same deal for creations, tossing in AI rationale:
```python
memory_payload = {
    "event": "task_created",
    "task_name": db_task.task_name,
    "ai_rationale": db_task.ai_rationale,  # why the AI picked this assignee
    "deadline": str(db_task.deadline),
}
await store_memory(memory_payload)
```
That's how Hindsight connected that old "Freeze Friday Deployments" note to a recent task stall—both popped up together in an LLM context.
OpenClaw: The Retrieval Middleware
I built a little wrapper class called OpenClaw to bridge LLM calls and memory:
```python
# services/openclaw_service.py
class OpenClaw:
    @staticmethod
    async def query_memory(query: str) -> str:
        memories = await retrieve_memory(query)
        if not memories:
            return "No relevant past decisions or notes found for this query."
        context = "### RELEVANT PROJECT CONTEXT (from Hindsight):\n"
        for i, mem in enumerate(memories):
            text = mem if isinstance(mem, str) else getattr(mem, "text", str(mem))
            context += f"{i + 1}. {text}\n"
        return context
```
In chat_service.py, it fires before every LLM hit:
```python
async def process_chat_query(db: Session, query: str) -> dict:
    context = ""
    context += await OpenClaw.query_memory(query)  # Hindsight first
    if any(kw in query.lower() for kw in ["risk", "bottleneck", "analytics"]):
        insights = await get_insights(db)
        context += "\n### REAL-TIME PROJECT ANALYTICS:\n"
        context += f"- Most Delayed: {insights['most_delayed_member']}\n"
        context += f"- Risk Insights: {', '.join(insights['risk_insights'])}\n"
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"CONTEXT:\n{context}\n\nQUERY: {query}"},
        ],
        temperature=0.2,
    )
```
Hindsight feeds the history, the DB supplies the current state, and both get mashed into one context; the LLM doesn't care where any of it came from. The low temperature (0.2) keeps it from wandering off when it already has good data.
What Actually Broke
Async got messy. That await sniff in memory.py? It's because hs.recall() flips between coroutine and not, based on event loop presence (Uvicorn vs. tests). Dev was full of "coroutine never awaited" warnings from missed awaits upstream—one slip, and results vanish. Moral: audit async chains hard if your SDK mixes sync/async.
Keyword classification is brittle. insights_service.py buckets tasks as backend/frontend/devops by scanning task_name for "database", "api", "react". Fine for demos, flops on "Fix Tuesday's mess." Better: add domain as a DBTask field and pipe it straight to payloads.
Practical Behavior Now
Seed via /seed, and chat nails:
1. "Who is overloaded?" Grabs Hindsight completions, crunches delays per person, reports straight.
2. "Why did we choose FastAPI?" Digs up the log: "Team picked FastAPI over Django REST for async speed and auto OpenAPI docs."
3. "What are our current blockers?" Blends recall("task completion report") with fresh DB, spits risk summary.
LLM can even act—outputs [ACTION] JSON like {"type": "CREATE_TASK", ...}, which chat_service parses, hits DB, broadcasts via Socket.IO for instant dashboard refresh.
Lessons, Specifically
1. Memory-at-the-seam beats middleware. Storing inside mark_task_completed ensures events never miss; no "did the job run?" doubts from background syncs.
2. Two-method APIs rock. retain() and recall() slash ops costs vs. self-hosting vectors.
3. Payloads like structured logs. Dict shapes match what insights_service.py expects: json.loads(text), grab fields—no NLP mess.
4. Namespace banks upfront. "propilot_brain" works solo; prod needs propilot_{project_id} to dodge multi-user crosstalk.
5. Hybrid context wins. Best answers fuse Hindsight history with SQL now-state. One without the other falls short.
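One caveat on the payloads-as-structured-logs lesson: store_memory serializes with str(data), which produces a Python repr with single quotes that json.loads will reject. If the insights side really parses fields back out, storing with json.dumps makes the payload round-trip cleanly. A small sketch of the fix, with hypothetical helper names:

```python
import json

def serialize_payload(payload: dict) -> str:
    """Serialize a memory payload as JSON so it can be parsed back later.

    str(payload) yields "{'event': ...}" (single quotes), which
    json.loads rejects; json.dumps keeps the round trip lossless.
    """
    return json.dumps(payload, default=str)  # default=str covers datetimes

def parse_payload(text: str) -> dict:
    return json.loads(text)
```

Hindsight just sees text either way, so nothing else in the pipeline changes; only the downstream parsing gets sturdier.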
The system's rough—Outlook sync half-done, rules stubby, bot skeletal. But memory delivers: a PM tool where AI links your history to headaches is leagues beyond chat-on-tasks.
Hindsight made it doable without becoming a saga of its own.