I thought 'context' just meant using a massive window for everything—until I watched Hindsight synthesize fifty disjointed chat messages into a coherent meeting summary using nothing but relational data and Mongoose population.
What I was trying to build
Over the last six months, I’ve been working on an AI-powered project manager.
The idea is straightforward.
Instead of digging through Slack messages, Jira tickets, or Zoom transcripts, I wanted a system where someone could just ask:
“Why are we still blocked on the Postgres migration?”
and get a direct, useful answer.
The stack itself is fairly standard:
FastAPI on the backend
React on the frontend
SQLAlchemy for structured data
That part wasn’t the problem.
The real difficulty showed up elsewhere.
It wasn’t about managing tasks or designing schemas.
It was about capturing and understanding how the team actually works—the informal decisions, shifting responsibilities, and context that never makes it into a database.
Where things broke: RAG
I started with a standard RAG setup: vector search over chat logs, with the top results added to the prompt.
It worked for simple queries. It failed for anything dynamic.
Example: a blocker gets resolved in a later message, but if the system retrieves only the first message, the answer is wrong.
The issue wasn’t retrieval quality.
It was that the system didn’t understand change over time.
What I needed instead
I needed the system to learn from sequences, not snapshots.
That led me to Hindsight.
Instead of storing text chunks, it builds structured memory that evolves.
How I implemented it
I added two operations:
1. Retain to store events
2. Reflect to synthesize patterns
Every important event goes into memory.
```python
async def record_task_completion(db: Session, task_id: int):
    task = db.query(DBTask).get(task_id)
    memory_client.retain(
        user_id=task.assigned_to,
        text=f"Task '{task.task_name}' completed. AI Rationale: {task.ai_rationale}",
        metadata={"project_id": task.project_id, "type": "task_completion"}
    )
    memory_client.reflect(
        mission="Identify team velocity and skill patterns."
    )
```
What I got wrong first
I triggered reflection after every event.
That made the system unstable.
Reflection is not storage. It’s inference.
If you run it too early, it draws weak conclusions.
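One way to enforce that discipline is to fire reflection once per batch of retained events instead of on every event. This is my own wrapper sketch, not part of Hindsight; it assumes the `retain`/`reflect` calls shown above.

```python
# Sketch: reflect once per N retained events, not on every event.
# The client type and batch threshold are my own assumptions.
class BatchedReflector:
    def __init__(self, client, batch_size: int = 20):
        self.client = client
        self.batch_size = batch_size
        self.pending = 0

    def retain(self, **event) -> bool:
        """Store the event; reflect only once a full batch has accumulated.
        Returns True when a reflection was triggered."""
        self.client.retain(**event)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.client.reflect(mission="Identify team velocity and skill patterns.")
            self.pending = 0
            return True
        return False
```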
What worked better:
**Turning decisions into memory**
I used to treat decisions as database rows.
Now they exist in two places: as rows in SQL and as events in memory.
Example: “No Friday deployments”
Initially, it was seeded data.
Over time, the system started deriving it from past incidents.
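As a sketch of the feeding side, here is a hypothetical helper (`record_incident` is mine, not part of the real codebase) that uses the `retain()` signature shown earlier and puts the weekday into metadata, so a later `reflect()` pass has the raw material to derive a rule like "No Friday deployments" on its own.

```python
from datetime import datetime

def record_incident(memory_client, user_id: str, project_id: int,
                    summary: str, occurred_at: datetime) -> None:
    # Hypothetical helper: the weekday goes into metadata so reflection
    # can correlate incidents with deployment days.
    memory_client.retain(
        user_id=user_id,
        text=f"Incident: {summary}",
        metadata={
            "project_id": project_id,
            "type": "incident",
            "weekday": occurred_at.strftime("%A"),  # e.g. "Friday"
        },
    )
```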
**Retrieval changed completely**
Instead of pulling raw messages, I started querying for patterns:
```python
learned_patterns = memory_client.recall(
    query="What are our recurring project risks?",
    strategy="relational"
)
```
The output is not logs.
It’s synthesized observations.
Where this approach helped most
Conflicting information.
Example: RAG may surface outdated docs.
The memory system prioritizes recent, reinforced observations.
So it adapts automatically.
**What the system outputs now**
Instead of generic answers, I get synthesized observations about how the team actually works.
These are not stored facts.
They are learned conclusions.
What I learned
1. Reflection needs discipline
Run it in batches. Not continuously.
2. SQL and memory solve different problems
SQL answers “what happened”
Memory answers “what it means”
3. Metadata matters
User, project, and event type improve everything downstream
4. This is not search
If I need a past message, I use search
If I need understanding, I use memory
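That last split can be made mechanical with a tiny router. The keyword heuristic below is purely illustrative, not a real classifier: interpretive questions go to memory, literal lookups go to search.

```python
# Illustrative cues only: words that suggest the user wants interpretation.
UNDERSTANDING_CUES = ("why", "pattern", "risk", "trend", "usually")

def route(query: str) -> str:
    """Return 'memory' for interpretive questions, 'search' for lookups."""
    q = query.lower()
    if any(cue in q for cue in UNDERSTANDING_CUES):
        return "memory"   # e.g. memory_client.recall(...)
    return "search"       # e.g. plain vector / keyword search
```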
**Final thought**
I stopped thinking of this as an AI wrapper.
It’s a system that learns.
If an AI forgets what happened last week, the issue is not context size.
It’s the lack of a mechanism to reflect and update its understanding.
That’s what Hindsight changed for me.