Why Hindsight Replaced My Manual Reminders

Tags: #ai #agents #machinelearning #fastapi
Sakshi Shanbhag

Last night I daydreamed about an AI that could quietly nudge a student team back on track. By morning, it had already happened: Hindsight had analyzed our overdue tasks and sent personalized, context‑aware reminders to every teammate who’d stalled.


What We Built
We started as a weekend experiment: a project‑management tool with a real AI backbone — not just another GPT‑4 wrapper with some canned prompts. I wanted a system that could assign tasks intelligently, summarize meetings on its own, track how the team performed over time, and actually remember what happened in the last sprint. Not a Jolt‑or‑Jira clone. Something that gets smarter the longer you use it.

The result is a FastAPI backend with nine distinct routers — auth, projects, teams, tasks, decisions, ai, meetings, integrations, and reports — backed by a SQLite database managed through SQLAlchemy 2.0, and a React + Vite frontend running on :5173. The core AI calls go to either Groq (fast, cheaper inference) or OpenAI depending on the task.
But the piece that makes the whole thing coherent — the thread that connects last‑sprint decisions to today’s recommendations — is Hindsight.


The Moving Parts
Here’s the rough shape of the system:

```text
Frontend (React + Vite)
    │
    ▼
FastAPI (main.py) → 9 routers
    │
    ├── SQLite (SQLAlchemy ORM)      ← persistent state
    ├── Groq / OpenAI APIs           ← LLM inference
    └── Hindsight client             ← agent memory layer
```

The backend is split into routes (HTTP handlers) and services (business logic). The services layer — suggestion_service, insights_service, chat_service, task_service, team_service, auth_service — is where most of the interesting stuff lives. The routes are deliberately thin: they parse requests, call services, and return responses. Standard FastAPI pattern, nothing surprising.

The real story lives in db_models.py — eleven SQLAlchemy models: DBTeamMember, DBTask, DBDecision, DBMeeting, DBMeetingParticipant, DBMeetingTranscript, DBMeetingRecording, DBMeetingSummary, DBIntegration, DBUserIntegration, and DBAutomationRule. That last one — DBAutomationRule — is where the “magic” begins.


The Memory Problem
When I first built the task‑assignment logic, it was stateless. A manager types “Implement OAuth2 login,” and the system suggests Alice because her skills column says “Backend, Python, API.” That works — until Alice is slammed, or until she’s spent two sprints struggling with auth‑related tickets. The system has no memory of that.

My first instinct was to write a pile of SQL queries: track completion rates, flag delays, compute a score per member per skill category. I started down that road and ended up with:

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Integer, String

class DBTask(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True, index=True)
    task_name = Column(String, index=True)
    assigned_to = Column(String, index=True, nullable=True)
    status = Column(String, default="To Do")   # To Do, In Progress, Completed
    priority = Column(String, default="Medium")
    difficulty = Column(String, default="Medium")
    ai_rationale = Column(String, nullable=True)
    confidence_score = Column(Integer, default=0)
    deadline = Column(DateTime, nullable=True)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
```

The ai_rationale and confidence_score columns were the key addition. Every time the system assigns a task, it stores why — a plain‑text rationale plus a numeric confidence. That gave us an audit trail, but it still didn’t solve the memory problem. If the system suggests Alice for a backend task, saves confidence_score=85, and then she misses the deadline, that outcome is recorded in the tasks table — but the next assignment cycle never reads it or learns from it.
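Writing the rationale at assignment time looks roughly like this — a self-contained sketch against an in-memory SQLite database, with a trimmed-down DBTask mirroring the model above:

```python
# Minimal sketch: persist the "why" alongside the assignment itself.
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class DBTask(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True, index=True)
    task_name = Column(String, index=True)
    assigned_to = Column(String, index=True, nullable=True)
    ai_rationale = Column(String, nullable=True)
    confidence_score = Column(Integer, default=0)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(DBTask(
        task_name="Implement OAuth2 login",
        assigned_to="Alice",
        ai_rationale="Skill match: Backend, Python, API; strong on-time history.",
        confidence_score=85,
    ))
    session.commit()

# Later, the audit trail is one query away.
with Session(engine) as session:
    task = session.query(DBTask).filter_by(task_name="Implement OAuth2 login").one()
    stored = (task.assigned_to, task.ai_rationale, task.confidence_score)
```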

That’s where Hindsight stepped in.


What Hindsight Actually Does
Hindsight is a memory layer for AI agents. You push events to it — task completions, delays, meeting outcomes, team decisions — and it stores them in a vector index. When your agent needs to make a decision, it queries that index and pulls relevant past episodes as context. The LLM sees what happened last time, not just what the current state is.

The integration in this project is a single pip install hindsight-client line in requirements.txt and an API key in .env.
Every event that matters — task completion, a missed deadline, a key architectural decision — gets written to Hindsight. When the suggestion_service runs, it pulls retrieval context from Hindsight before hitting the LLM. The prompt might look like:

Past context from memory:
Alice completed “Build setup database API backend” on time (difficulty: Hard)
Alice delayed “Fix UI dashboard bug” by 1 day (out‑of‑skill assignment)
Bob struggled with “Migrate database schema” (backend task for frontend dev)
Current request: Assign “Implement caching layer for API endpoints”
Team members: Alice (Backend, Python), Bob (Frontend, React), Charlie (DevOps, QA)

That past context is what turns a skill‑keyword match into a genuinely useful recommendation. Without it, you’d assign Bob “Migrate database schema” because he’s on the team and available. With it, you know Bob already tried that once and it went badly.
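The write-then-retrieve pattern itself is simple. Here it is sketched with a toy in-memory store standing in for Hindsight's vector index — the method names (push_event, retrieve) are illustrative, not the hindsight-client API:

```python
# Toy stand-in for a vector memory layer: push episodes, retrieve the
# most relevant ones at decision time, and build them into the prompt.
class ToyMemory:
    def __init__(self):
        self.events = []

    def push_event(self, text: str) -> None:
        self.events.append(text)

    def retrieve(self, query: str, k: int = 3) -> list:
        # Crude keyword overlap in place of real vector similarity.
        terms = set(query.lower().split())
        ranked = sorted(self.events,
                        key=lambda e: -len(terms & set(e.lower().split())))
        return ranked[:k]

memory = ToyMemory()
memory.push_event("Alice completed 'Build setup database API backend' on time")
memory.push_event("Bob struggled with 'Migrate database schema'")

query = "Assign 'Implement caching layer for API endpoints'"
context = memory.retrieve(query)
prompt = ("Past context from memory:\n" + "\n".join(context)
          + "\nCurrent request: " + query)
```

The real system does exactly this shape of work, just with Hindsight's index instead of keyword overlap.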

The agent memory page on Vectorize describes this pattern more formally: agents need episodic memory to improve over time, rather than starting fresh on every decision.


Decisions as First‑Class Citizens
One thing I’m proud of: the system treats architectural decisions as data, not just meeting notes in a Google Doc somewhere.

```python
class DBDecision(Base):
    __tablename__ = "decisions"
    id = Column(Integer, primary_key=True, index=True)
    title = Column(String, index=True)
    content = Column(String)       # Full decision text
    decided_by = Column(String)    # User/team who made the decision
    category = Column(String, default="General")  # 'Architecture', 'Process', 'Hiring'
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
```

The seed data shows exactly what we needed this for:

```python
db.add(DBDecision(
    title="Freeze Friday Deployments",
    content="After 3 incidents, team agreed: no production deployments on Fridays. "
            "Releases to happen Tuesday–Thursday only.",
    decided_by="Charlie",
    category="Process"
))
```

That decision lives in the database and gets pushed to Hindsight. Now when someone asks AI ProPilot in the chat, “Why can’t we deploy today?”, the system retrieves that exact episode and answers with the real reasoning — not a hallucination, not a lookup in a static FAQ, but the actual decision made by Charlie after three real incidents. The chat_service is what drives this: it queries Hindsight, builds a grounded prompt, and sends it to the LLM.
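The grounding step in chat_service can be sketched like this — with a plain list standing in for the Hindsight query, since the actual client API isn't shown here:

```python
# Sketch of the chat_service flow: retrieve the relevant decision
# episode, then build a grounded prompt for the LLM.
decisions = [
    {"title": "Freeze Friday Deployments",
     "content": "After 3 incidents, team agreed: no production deployments on Fridays.",
     "decided_by": "Charlie"},
]

def build_grounded_prompt(question: str) -> str:
    # Stand-in retrieval: in the real service, this query goes to Hindsight.
    relevant = [d for d in decisions if "deploy" in question.lower()]
    context = "\n".join(
        f"- {d['title']} (decided by {d['decided_by']}): {d['content']}"
        for d in relevant
    )
    return f"Team decisions on record:\n{context}\n\nQuestion: {question}"

prompt = build_grounded_prompt("Why can't we deploy today?")
```

The LLM then answers from the retrieved episode rather than from its priors.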


Meeting Memory: The Transcript Pipeline
The meeting system started as an afterthought and became one of the most useful slices of the product. The schema covers the full lifecycle:

  • DBMeeting — the calendar event
  • DBMeetingParticipant — who attended
  • DBMeetingTranscript — speaker‑attributed transcript lines
  • DBMeetingRecording — URL + duration
  • DBMeetingSummary — AI‑generated overview, key points, action items, next steps

The DBMeetingSummary.action_items column is stored as a JSON‑stringified array of { "task": "...", "assignee": "..." } objects. After every meeting, the system parses the transcript, calls the LLM to extract action items, and writes them back. Those action items then feed into the task‑creation pipeline.
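Parsing that column back into task-creation inputs is one json.loads away (the field names task/assignee match the stored shape described above):

```python
# Turn the JSON-stringified action_items column into task-creation inputs.
import json

raw_action_items = '[{"task": "Implement caching layer", "assignee": "Alice"}]'

def action_items_to_tasks(raw: str) -> list:
    items = json.loads(raw)
    # Each item feeds the task-creation pipeline as (task_name, assigned_to).
    return [{"task_name": i["task"], "assigned_to": i["assignee"]} for i in items]

tasks = action_items_to_tasks(raw_action_items)
```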

The uncomfortable part: I store JSON directly in a String column rather than a proper join table. It’s faster to query, harder to index. For a team of three, it’s fine. For fifty people with hundreds of meetings, you’d want a real action_items table with foreign keys. That’s a known limitation.
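For reference, the "real action_items table" would look something like this — a sketch of the migration target, not the project's actual schema:

```python
# A proper join table for action items: filterable and indexable in SQL,
# unlike a JSON string column.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class DBActionItem(Base):
    __tablename__ = "action_items"
    id = Column(Integer, primary_key=True, index=True)
    meeting_id = Column(Integer, ForeignKey("meetings.id"), index=True)
    task = Column(String)
    assignee = Column(String, index=True)  # now a WHERE clause, not a Python loop
```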


The Automation Rule Layer
The DBAutomationRule model is the infrastructure that makes the overnight reminders from the opening hook actually work:

```python
class DBAutomationRule(Base):
    __tablename__ = "automation_rules"
    id = Column(Integer, primary_key=True, index=True)
    trigger_type = Column(String)  # 'TASK_COMPLETED', 'MEETING_SCHEDULED', 'DEADLINE_MISSED'
    action_type = Column(String)   # 'CHAT_NOTIFY', 'CREATE_TASK', 'SEND_REMINDER'
    config = Column(String)        # JSON: trigger/action parameters
    is_active = Column(Integer, default=1)
    last_triggered = Column(DateTime, nullable=True)
```

A rule like trigger: DEADLINE_MISSED → action: SEND_REMINDER is what drives the nightly reminders. When the scheduler sees an overdue task, it fires the rule, pulls the assignee’s history from Hindsight, and generates a context‑aware message — not “hey your task is late,” but:

“Hey, you’ve completed 8 of 9 backend tasks on time; this UI bug has been stalled for 2 days. Want to pair with Bob on it?”

The config column is JSON‑in‑a‑string again. Same tradeoff as the action items.
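The scheduler's decision to fire, including a naive last_triggered cooldown guard, looks roughly like this (a sketch with a plain class standing in for the ORM model):

```python
# Sketch of rule evaluation: match the trigger, respect a cooldown so
# one overdue task doesn't produce duplicate reminders.
from datetime import datetime, timedelta, timezone

class Rule:
    def __init__(self, trigger_type, action_type):
        self.trigger_type = trigger_type
        self.action_type = action_type
        self.is_active = True
        self.last_triggered = None

def should_fire(rule, event_type, cooldown=timedelta(hours=12)):
    if not rule.is_active or rule.trigger_type != event_type:
        return False
    now = datetime.now(timezone.utc)
    if rule.last_triggered and now - rule.last_triggered < cooldown:
        return False  # fired recently; skip to avoid a duplicate reminder
    rule.last_triggered = now
    return True

rule = Rule("DEADLINE_MISSED", "SEND_REMINDER")
first = should_fire(rule, "DEADLINE_MISSED")   # fires
second = should_fire(rule, "DEADLINE_MISSED")  # suppressed by the cooldown
```

As noted later, a cooldown is a stopgap; real idempotency needs per-event deduplication.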


What Actually Worked, What Didn’t

What worked

  • ai_rationale on every task assignment. Storing the reasoning at creation time is underrated. When you review assignments a week later, you understand exactly why a decision was made — and so does AI ProPilot when it queries the history.
  • Hindsight as the connective tissue. Writing events to a vector memory layer and retrieving them at inference time is simple to implement and has a disproportionate impact on response quality. The LLM gives better answers when it has real episodes to reason from.
  • FastAPI’s dependency injection for auth. Depends(auth_service.get_current_user) is clean and composable. Protected endpoints never forget to authenticate because the pattern makes it structural, not manual.
  • Treating decisions as first‑class data. The DBDecision table with category, decided_by, and content gives the chat system real grounding material.

What didn’t work as expected

  • JSON in String columns. Fast to ship, painful to query. If you need to filter action items by assignee, you can’t do it in SQL — you have to pull everything into Python and filter in memory.
  • Groq for long meeting transcripts. Groq is fast, but token‑window constraints forced us to chunk long transcripts before summarizing. OpenAI was more reliable for full‑session summaries.
  • SQLite under concurrent load. check_same_thread=False gets you far, but write contention becomes visible once multiple automation rules fire simultaneously. Postgres with connection pooling is the right move for anything beyond a small team.
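The filter-in-memory workaround from the first point above looks like this — workable, but it scans and parses every row on every query:

```python
# What "filter in memory" means in practice when action items live in a
# JSON string column: load all rows, parse each, filter in Python.
import json

# Rows as they'd come back from SELECT action_items FROM meeting_summaries.
rows = [
    '[{"task": "Implement caching layer", "assignee": "Alice"}]',
    '[{"task": "Fix UI dashboard bug", "assignee": "Bob"}]',
]

def items_for(assignee, raw_rows):
    found = []
    for raw in raw_rows:
        for item in json.loads(raw):
            if item["assignee"] == assignee:
                found.append(item["task"])
    return found

alice_items = items_for("Alice", rows)
```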

Rules I’d Apply Next Time

  • Store LLM rationale alongside every AI decision from day one. It costs nothing and makes the system auditable and debuggable.
  • Use a dedicated memory layer early — Hindsight or something similar. Don’t wait until you’ve hand‑rolled five SQL queries that approximate retrieval.
  • JSON‑in‑a‑string columns are a smell. It’s fine for rapid prototyping, but plan the migration to proper tables before you have real production data.
  • Automation rules need idempotency. last_triggered is a start, but rules that fire twice on the same event create duplicate reminders. Worth designing a proper event‑deduplication mechanism upfront.
  • Separation of concerns between routes and services pays off immediately. The routes in this project are genuinely thin — they just call service functions. When I needed to rewrite the suggestion logic, I touched one file.

What This Thing Has Become
AI ProPilot now handles task assignment, meeting summaries, team insights, and overnight reminders — and the context‑awareness of those outputs is entirely a function of the memory layer. Without Hindsight writing and reading episodic events, it’s just another CRUD app with an LLM bolted on. With it, the agent actually behaves like it knows your team, remembers your past sprints, and gently nudges you back on track — often before you notice something’s gone astray.