Prateek Hitli

How a 3-hour YouTube rabbit hole turned into LingoLearn: an AI-powered app that turns any YouTube video into quiz-driven learning in 130+ languages, built with Next.js, Groq, and the Lingo.dev SDK.
I was watching a 45-minute Golang concurrency tutorial. Fifteen minutes in, I noticed a thumbnail in the sidebar: "10 Things You're Doing Wrong in OpenClaw." I clicked it. Then a video about OpenClaw controlling Kubernetes networking caught my eye. Twenty minutes later I was watching YouTube Shorts.
I never finished the Golang tutorial that day; it carried over to the next.
YouTube is the world's largest classroom, but it's designed to distract you. Every sidebar thumbnail is a distraction. Every "recommended" video costs you 15–30 minutes of re-entry time just to get back on track. And if English isn't your first language? You're passively reading auto-captions and hoping something sticks.
That's when I thought: what if the video quizzed you as you watched? What if it paused at key moments, checked your understanding, and wouldn't let you drift off to memes? The inspiration came from Udemy and Coursera.
So I built LingoLearn for the Lingo.dev hackathon. And yes, it works in 130+ languages.
The constraints were clear from day one: this was a Lingo.dev hackathon, which meant the @lingo.dev/_sdk had to be central — not a footnote. I needed it doing real, meaningful work, not just wrapping a single string translation call to pad the integration.
The baseline I committed to before writing a line:
localStorage and ship faster; persistence can be upgraded in future versions.

The blank canvas: a Next.js repo, two API keys, and a YouTube URL.
Around 800 million non-native English speakers consume English-language YouTube content daily (a figure I got from X's AI, so treat it as a rough estimate). The platform's auto-translate captions are functional but passive: you read them, you move on, nothing sticks.
LingoLearn makes the world better by turning passive video consumption into active, provable learning. "I watched a React tutorial" becomes "I passed quizzes on a React tutorial in Hindi." That's the difference between watching and understanding.
Here's the user journey from URL to certificate:
And the four-stage processing pipeline under the hood: ingestion, quiz generation, translation, and playback:
The entire thing runs on Next.js App Router. API routes and frontend in one repo. No separate backend. No database (for now). Just localStorage and two API keys. A user pastes a YouTube URL, we extract the transcript via YouTube's InnerTube API, chunk it and send it to Groq for quiz generation, translate everything through the Lingo.dev SDK, and serve it back as an interactive learning session with a video player that pauses at breakpoints for quizzes.
The first 20 minutes of quizzes are generated upfront. Everything else is lazily prefetched in the background as you watch so you never wait.
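That scheduling split can be sketched as: block on quiz generation for the first window, then fire off the rest in the background without awaiting them. This is a minimal sketch under my own assumptions, not the repo's actual code; `generateQuizzesFor` stands in for the real Groq-backed generator, and the types are illustrative.

```typescript
type TranscriptChunk = { startSec: number; text: string };

// Hypothetical stand-in for the real Groq-backed quiz generator.
type QuizGenerator = (chunk: TranscriptChunk) => Promise<string[]>;

const WINDOW_SEC = 20 * 60; // generate the first 20 minutes upfront

async function scheduleQuizGeneration(
  chunks: TranscriptChunk[],
  generateQuizzesFor: QuizGenerator,
  onReady: (startSec: number, quizzes: string[]) => void
): Promise<void> {
  const eager = chunks.filter((c) => c.startSec < WINDOW_SEC);
  const lazy = chunks.filter((c) => c.startSec >= WINDOW_SEC);

  // Block only on the first window so playback can start immediately.
  for (const chunk of eager) {
    onReady(chunk.startSec, await generateQuizzesFor(chunk));
  }

  // Fire-and-forget the remaining windows; a failed background batch
  // should never interrupt the session, so errors are swallowed here.
  void (async () => {
    for (const chunk of lazy) {
      try {
        onReady(chunk.startSec, await generateQuizzesFor(chunk));
      } catch {
        /* background prefetch failure is non-fatal */
      }
    }
  })();
}
```

The key property is that the returned promise resolves as soon as the eager window is done, while the lazy loop keeps filling the quiz cache behind the playing video.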
I built this during a 7-day hackathon.
Two files contain the real intellectual work.
src/lib/ytdlp.ts — The magic is the raw node:https call to YouTube's iOS player endpoint. This bypasses Next.js's built-in fetch interception entirely, which is critical because Next.js was silently caching YouTube API responses and returning stale transcripts across different video URLs (more on that nightmare later).
src/lib/lingo.ts — The flattening and reconstruction logic for translating nested quiz objects. This is where the Lingo.dev integration goes from "we called the SDK" to "we did something genuinely clever." Nested quiz objects can't be passed directly to localizeObject. So the module walks the breakpoint array, flattens every translatable string into a flat array, translates the whole thing in batched API calls, then rebuilds the nested structure with a pointer walk. One translation pipeline for an entire lesson's worth of quiz content.
This is from src/lib/ytdlp.ts — the InnerTube transcript extraction. This took longer than any other single piece of the codebase:
async function fetchPlayerData(videoId: string): Promise<Record<string, unknown>> {
  // Impersonate the iOS YouTube client; YT_BASE and rawPost are
  // defined elsewhere in this module.
  const body = JSON.stringify({
    context: {
      client: {
        clientName: "IOS",
        clientVersion: "20.03.2",
        deviceModel: "iPhone16,2",
        hl: "en",
        gl: "US",
      },
    },
    videoId,
  });
  const text = await rawPost(
    `${YT_BASE}/youtubei/v1/player?prettyPrint=false`,
    body,
    {
      "Content-Type": "application/json",
      "User-Agent": "com.google.ios.youtube/20.03.2 (iPhone16,2; U; CPU iOS 18_2_1 like Mac OS X)",
      "X-YouTube-Client-Name": "5",
      "X-YouTube-Client-Version": "20.03.2",
    }
  );
  return JSON.parse(text) as Record<string, unknown>;
}
yt-dlp CLI binary — The gold standard for YouTube extraction. Rejected: it requires a Python binary on the server and breaks immediately on any serverless deployment (Vercel, Netlify). I needed the demo to run everywhere without exotic dependencies.
OpenAI for quiz generation — The obvious choice. Rejected: Groq's llama-3.3-70b-versatile with response_format: { type: "json_object" } is materially faster and cheaper. A 2-second Groq response versus an 8-second OpenAI response is a visible difference when a hackathon judge is watching your demo.
Prisma + PostgreSQL — Considered for about 20 minutes on day one. Rejected: adding a database means adding auth, migrations, connection pooling, and a page of setup instructions. localStorage gets 90% of the value at 5% of the time cost. For a hackathon, that's the right trade-off.
Passing nested objects directly to localizeObject — My first instinct. Rejected by reality: the SDK expects flat key-value structures. I had to build the flatten → batch → reconstruct pipeline instead (see below).
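One note on the Groq choice above: `response_format: { type: "json_object" }` guarantees syntactically valid JSON, but not that the object matches your quiz schema, so the raw string still needs a guard before it reaches the player. A minimal sketch of such a guard, with an illustrative schema (field names like `correctIndex` are my assumptions, not the repo's):

```typescript
interface QuizQuestion {
  question: string;
  options: string[];
  correctIndex: number; // index into options; assumed field name
  explanation?: string;
}

// Parse and sanity-check the model's JSON-mode output before trusting it.
function parseQuizResponse(raw: string): QuizQuestion[] {
  const parsed = JSON.parse(raw) as { questions?: unknown };
  if (!Array.isArray(parsed.questions)) {
    throw new Error("model response missing `questions` array");
  }
  return parsed.questions.map((q) => {
    const cand = q as QuizQuestion;
    if (
      typeof cand.question !== "string" ||
      !Array.isArray(cand.options) ||
      typeof cand.correctIndex !== "number" ||
      cand.correctIndex < 0 ||
      cand.correctIndex >= cand.options.length
    ) {
      throw new Error("malformed quiz question in model output");
    }
    return cand;
  });
}
```

Failing loudly here is deliberate: a retry against the model is cheap, while rendering a quiz whose answer index points past the options array is a broken session.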
Next.js silently caching YouTube API responses.
This one cost me hours.
What was happening: paste Video URL A, get the transcript. Go back to homepage, paste Video URL B. The transcript that came back was still Video URL A's content. Different URL, same response.
Every search result said to add cache: 'no-store'. I added it. The bug persisted. I added revalidate: 0. Still broken. I was convinced my own code was wrong for an embarrassingly long time before realizing the framework was working against me.
The actual problem: Next.js's instrumentation layer patches the global fetch inside API routes and applies its own caching/deduplication layer on top, ignoring my cache directives in edge cases related to how the URL was constructed dynamically.
The fix: Abandon fetch entirely for this call and drop down to raw node:https:
async function rawPost(url: string, body: string, headers: Record<string, string>): Promise<string> {
  // Go straight to node:https: Next.js only instruments the global fetch.
  const { request } = await import("node:https");
  const parsed = new URL(url);
  return new Promise((resolve, reject) => {
    const bodyBuf = Buffer.from(body, "utf-8");
    const req = request(
      {
        hostname: parsed.hostname,
        path: parsed.pathname + parsed.search,
        method: "POST",
        headers: { ...headers, "Content-Length": bodyBuf.length },
      },
      (res) => {
        const chunks: Buffer[] = [];
        res.on("data", (c: Buffer) => chunks.push(c));
        res.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
      }
    );
    req.on("error", reject);
    req.write(bodyBuf);
    req.end();
  });
}
The caching issue disappeared completely because Next.js only instruments the global fetch — not the Node.js https module. Sometimes the fix is to go under the framework, not around it.
The localizeObject integration. My original design was elegant: generate quiz breakpoints as a nested array of objects, pass the whole thing to localizeObject, get back a translated version. Clean. Simple. Wrong.
Reality: localizeObject expects a flat key-value structure. Nested arrays of objects with sub-arrays inside them? Nope.
The workaround I built isn't pretty, but it works perfectly:
export async function translateBreakpoints(
  breakpoints: Breakpoint[],
  sourceLocale: string,
  targetLocale: string
): Promise<Breakpoint[]> {
  const engine = getEngine();

  // Step 1: Walk the nested structure, flatten into a string array
  const flatStrings: string[] = [];
  breakpoints.forEach((bp) => {
    flatStrings.push(bp.topic);
    bp.primaryQuestions.forEach((q) => {
      flatStrings.push(q.question);
      flatStrings.push(q.explanation || "");
      q.options.forEach((opt) => flatStrings.push(opt));
    });
    bp.retryQuestions.forEach((q) => {
      flatStrings.push(q.question);
      flatStrings.push(q.explanation || "");
      q.options.forEach((opt) => flatStrings.push(opt));
    });
  });

  // Step 2: Chunk and translate in parallel batches
  const chunkSize = 50;
  const PARALLEL_BATCH = 3;
  const translatedStrings: string[] = [];
  const chunks: string[][] = [];
  for (let i = 0; i < flatStrings.length; i += chunkSize) {
    chunks.push(flatStrings.slice(i, i + chunkSize));
  }
  for (let i = 0; i < chunks.length; i += PARALLEL_BATCH) {
    const batch = chunks.slice(i, i + PARALLEL_BATCH);
    const results = await Promise.all(
      batch.map((chunk) =>
        engine.localizeStringArray(chunk, { sourceLocale, targetLocale })
      )
    );
    results.forEach((r) => translatedStrings.push(...r));
  }

  // Step 3: Reconstruct the nested structure using a pointer
  let ptr = 0;
  return breakpoints.map((bp) => {
    const topic = translatedStrings[ptr++];
    const primaryQuestions = bp.primaryQuestions.map((q) => {
      const question = translatedStrings[ptr++];
      const explanation = translatedStrings[ptr++];
      const options = q.options.map(() => translatedStrings[ptr++]);
      return { ...q, question, explanation, options };
    });
    const retryQuestions = bp.retryQuestions.map((q) => {
      const question = translatedStrings[ptr++];
      const explanation = translatedStrings[ptr++];
      const options = q.options.map(() => translatedStrings[ptr++]);
      return { ...q, question, explanation, options };
    });
    return { ...bp, topic, primaryQuestions, retryQuestions };
  });
}
Flatten. Batch translate with localizeStringArray. Reconstruct with a pointer. One translation pipeline for an entire lesson's worth of content. The code looks ugly in a PR, but it's completely reliable in production. Chunking in batches of 50 with 3 parallel requests avoids payload-too-large errors without sacrificing speed.
Being honest:
- localStorage (for now) means your learning history is trapped in one browser. You can't share certificates via URL, and clearing your browser data wipes everything. I'd add Supabase in a heartbeat with more time.
- Token counting uses words * 1.3 as an approximation. It works well enough, but a proper tokenizer would be more accurate for edge cases.

Pixel-Art Gamification
I wanted LingoLearn to feel like a game, not another soulless corporate ed-tech tool. The combination of retro pixel-art companions (VT323 font, chunky borders) with modern glassmorphism panels creates something that bridges "playing a game" and "learning a language."
There are 15 pixel companions (sprites from Soul Knight): wizards, knights, rogues, cats. Each has a state machine: idle while you're watching, celebrating when you nail a quiz, encouraging when you miss one. They follow your cursor and talk to you through speech bubbles. It's silly. It works.
When you complete all quizzes, LingoLearn generates a downloadable PDF certificate entirely in the browser. html2canvas renders the certificate DOM element (complete with your chosen pixel companion, video title, and completion date) into a canvas, then jsPDF converts it to a downloadable PDF. No server, no sign-up, no data leaving your browser.
Not all videos are created equal. A 5-minute explainer needs 2 quizzes. A 2-hour lecture needs 10:
| Video Duration | Breakpoints | Questions per Breakpoint |
|---|---|---|
| < 10 min | 2 | 2 |
| 10–30 min | 3–4 | 2 |
| 30–60 min | 4–6 | 3 |
| 60–120 min | 6–8 | 3 |
| > 120 min | 8–10 (capped) | 3 |
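The table above boils down to a small pure function. Here's one way to encode it as a sketch; the names and the within-band scaling are my own interpolation, not the repo's exact logic:

```typescript
interface QuizPlan {
  breakpoints: number;
  questionsPerBreakpoint: number;
}

// Sketch of the duration → quiz-density table. Within each band the
// breakpoint count steps linearly between the band's min and max.
function planQuizzes(durationMin: number): QuizPlan {
  if (durationMin < 10) {
    return { breakpoints: 2, questionsPerBreakpoint: 2 };
  }
  if (durationMin < 30) {
    // 10–30 min → 3–4 breakpoints
    return { breakpoints: durationMin < 20 ? 3 : 4, questionsPerBreakpoint: 2 };
  }
  if (durationMin < 60) {
    // 30–60 min → 4–6 breakpoints
    return { breakpoints: 4 + Math.floor((durationMin - 30) / 15), questionsPerBreakpoint: 3 };
  }
  if (durationMin < 120) {
    // 60–120 min → 6–8 breakpoints
    return { breakpoints: 6 + Math.floor((durationMin - 60) / 30), questionsPerBreakpoint: 3 };
  }
  // > 120 min → 8–10 breakpoints, capped at 10
  return {
    breakpoints: Math.min(10, 8 + Math.floor((durationMin - 120) / 60)),
    questionsPerBreakpoint: 3,
  };
}
```

Keeping this as a pure function of duration makes the quiz density trivial to tune and to test, independent of the transcript pipeline.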
git clone https://github.com/Prateek1771/LingoLearn.git
cd LingoLearn
npm install
# Create .env.local
echo "GROQ_API_KEY=your_key_here" >> .env.local
echo "LINGODOTDEV_API_KEY=your_key_here" >> .env.local
npm run dev
Open http://localhost:3000. Paste any YouTube URL with captions. Pick a language. The AI does the rest.
You'll need a Groq API key (free tier works) and a Lingo.dev API key. That's it. No database setup, no Docker, no binary dependencies.
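"No database setup" concretely means sessions live as JSON blobs under namespaced localStorage keys. A minimal sketch of what such helpers could look like, written against a Storage-like interface so it can be exercised outside the browser; the key prefix and record shape are assumptions, not the repo's actual schema:

```typescript
// Structural subset of the browser's Storage interface, so these
// helpers also accept an in-memory stub in tests.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface SessionRecord {
  videoId: string;
  language: string;
  completedAt?: string; // ISO date once every quiz is passed
}

const KEY_PREFIX = "lingolearn:session:"; // hypothetical namespace

function saveSession(store: KVStore, record: SessionRecord): void {
  store.setItem(KEY_PREFIX + record.videoId, JSON.stringify(record));
}

function loadSession(store: KVStore, videoId: string): SessionRecord | null {
  const raw = store.getItem(KEY_PREFIX + videoId);
  return raw ? (JSON.parse(raw) as SessionRecord) : null;
}
```

In the browser you'd pass `window.localStorage` straight in; swapping the store for a Supabase-backed implementation later means changing only these two functions.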
Real backend persistence with shareable certificate URLs.
Right now, certificates are generated client-side and downloaded as PDFs; they're not hosted anywhere. The highest-leverage addition: a lightweight persistence layer (Supabase is the obvious choice) that stores completed session summaries and generates unique certificate URLs.
Other ideas for contributors:
Huge thanks to Lingo.dev for organizing this hackathon and building the SDK that makes LingoLearn's 130+ language support possible. The Lingo.dev SDK isn't a footnote in this project — it powers the entire translation pipeline: quizzes, transcripts, companion dialogue, certificate labels, and UI strings.
Thanks to Soul Knight, whose character GIFs power the companions.
Thanks to Groq for blazing-fast inference that makes real-time quiz generation feel instant. When your demo depends on sub-3-second AI responses, Groq delivers.