Map Any Vertical's Competitive Landscape Using the YC Database (with code)

By NexGenData
Tags: #competitiveintelligence #marketmapping #ycombinator #strategy

Founders, BD leads, and M&A analysts use YC's company directory as the most accurate competitive map of any tech vertical. Here's how to filter, score, and rank competitors in 80 lines of Python.

Most competitive-intelligence tooling is built for the wrong job. Crunchbase Pro, PitchBook, and CB Insights are optimized for "who raised money last week," not "who is actually building in my vertical." If you're a founder preparing for an investor meeting, a BD lead mapping integration partners, or an M&A analyst hunting roll-up candidates, what you actually need is a current, accurate, deduplicated list of every active company building something adjacent to you — and you need it indexed by industry, sub-industry, tags, team size, and stage.

Y Combinator's company directory turns out to be the single best public dataset for this. It's not the most comprehensive list of tech companies (Crunchbase wins on raw size), but it's the most usefully tagged and the lowest-noise. Every YC company is human-curated, fits a thesis ("we're building X for Y"), is tagged consistently across batches, and is updated quarterly. Compared with Crunchbase's millions of half-stale records, YC's ~5,000 active alumni are a sharper signal at exactly the layer of the market where competitive landscapes get formed.

This post is the playbook and the working code for using YC as a competitive-intelligence database — not a sourcing one. The NexGenData YC Companies Directory actor wraps the data ingestion if you'd rather skip straight to analysis.

Who Actually Uses This (And Why)

The buyer pool for "YC as competitive map" is different from the VC-sourcing buyer pool. Four user profiles dominate:

Founders preparing fundraises. Every investor in 2026 will ask the same question on call one: "Who else is doing this, and why are you better?" The naive answer ("nobody really, we're unique") loses the deal. The good answer names 8-12 competitors, segments them into 2-3 strategic buckets, and explains how the founder's wedge cuts through. YC's directory is where founders build that map. A typical founder pulls every YC company tagged with their 2-3 closest tags, filters to batches W22-onwards (when their thesis became fundable), and ends up with a list of 30-60 companies to position against.

BD and partnerships leads. A partnerships lead at a Series B B2B-SaaS company spends ~30% of their time finding integration targets. The standard process (read TechCrunch, ask the sales team what customers are requesting, check Crunchbase) produces stale lists. Pulling current-batch YC companies tagged with adjacent categories routinely surfaces 20-40 integration candidates the BD lead's internal list missed. For example, a payroll API company would pull every HRtech, fintech-infra, and embedded-finance YC company in batches S24-S26.

M&A analysts and corp-dev teams. Vertical roll-up theses depend on cleanly enumerating the companies in the vertical. For "vertical SaaS in dental practice management" or "AI-powered legal-tech for solo practitioners," YC alumni often represent 30-50% of the addressable acquisition targets — and YC's tagging makes them findable in a way Crunchbase doesn't.

Industry journalists, strategists, and analysts. A piece titled "10 startups defining [vertical] in 2026" is the standard format for industry coverage. The journalist's research process is exactly the use case YC's directory solves: enumerate companies in vertical, filter by stage and team size, and write up the most interesting 8-12.

None of these users are doing pre-seed sourcing. They don't care about being first to the deal. They care about completeness and accuracy of a vertical map.

The YC Database As Competitive Map

The YC company directory has roughly 5,000 active companies across all batches back to 2005. Each company exposes:

  • name, slug, website, description, long_description
  • batch (e.g. "W26", "S25", "W19")
  • industry (one of ~12 top-level categories)
  • subindustry (one of ~80 sub-categories)
  • tags (3-8 specific descriptors per company — the gold layer)
  • team_size, stage, regions, isHiring
  • top_company, nonprofit flags

The tags field is the highest-signal column for competitive mapping. YC's internal taxonomy uses tags like api, developer-tools, vertical-saas, embedded-finance, legal-tech, mlops, gen-ai, agents, voice-ai, synthetic-data. Tags are applied consistently across batches by YC's content team, which means a tag query like legal-tech returns every YC company at any stage that's building legal tooling — exactly the kind of slice a competitive analyst needs.
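Picking the right tags for a vertical is easier once you see which tags travel together. The sketch below counts the tags that co-occur with a seed tag across records you have already fetched. It assumes each record is a dict with a `tags` list, as in the field list above; the helper name `co_occurring_tags` and the sample records are hypothetical.

```python
from collections import Counter


def co_occurring_tags(companies: list[dict], seed_tag: str,
                      top_n: int = 10) -> list[tuple[str, int]]:
    """Count tags that appear alongside seed_tag, most frequent first."""
    counts: Counter = Counter()
    for company in companies:
        tags = company.get("tags") or []  # tolerate a missing or None field
        if seed_tag in tags:
            counts.update(t for t in tags if t != seed_tag)
    return counts.most_common(top_n)


# Hypothetical sample records, shaped like the fields listed above.
sample = [
    {"name": "A", "tags": ["legal-tech", "gen-ai", "vertical-saas"]},
    {"name": "B", "tags": ["legal-tech", "gen-ai"]},
    {"name": "C", "tags": ["mlops", "developer-tools"]},
]
print(co_occurring_tags(sample, "legal-tech"))
# [('gen-ai', 2), ('vertical-saas', 1)]
```

Running this against a real single-tag pull gives a quick shortlist of candidate entries for `required_tags`.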

Fetching the Database

YC's company list page is fully client-rendered against an Algolia search index. The page bundle exposes a hardcoded application ID and a public read-only API key — both visible in browser dev tools. Once you have those, you can query the index directly:

import httpx
import asyncio
from typing import Optional

YC_ALGOLIA_URL = "https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query"
YC_ALGOLIA_HEADERS = {
    "X-Algolia-Application-Id": "45BWZJ1SGC",
    "X-Algolia-API-Key": "Y2VkOWQyMTJlYjZkZjE3MDRkY2YyNjBmYmIzMjVhMzA1ZmRlYTQ4OTUyZjEyZjRiNzc0OWQ4MjRmMzVlYmUxN3RhZ0ZpbHRlcnM9JTViJTIyJTVEJmZpbHRlcnM9aXNIaXJpbmclM0F0cnVl",
    "Content-Type": "application/json",
}

async def fetch_yc_page(page: int, tag_filter: Optional[str] = None) -> dict:
    """Fetch one page of YC companies, optionally filtered by tag."""
    payload = {
        "query": "",
        "hitsPerPage": 1000,
        "page": page,
    }
    if tag_filter:
        payload["facetFilters"] = [[f"tags:{tag_filter}"]]

    async with httpx.AsyncClient(headers=YC_ALGOLIA_HEADERS, timeout=30) as client:
        r = await client.post(YC_ALGOLIA_URL, json=payload)
        r.raise_for_status()
        return r.json()

async def fetch_all_yc_companies(tag_filter: Optional[str] = None) -> list[dict]:
    """Paginate through every YC company matching the filter."""
    first_page = await fetch_yc_page(0, tag_filter)
    total_pages = first_page.get("nbPages", 1)
    all_hits = list(first_page.get("hits", []))

    for page in range(1, total_pages):
        page_data = await fetch_yc_page(page, tag_filter)
        all_hits.extend(page_data.get("hits", []))

    return all_hits

A full database pull (no filter) yields ~5,000 records in 8-12 seconds. A single-tag pull (e.g. legal-tech) yields 40-200 records in 2-3 seconds. Both are well within "interactive analysis" latency.
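Since the analysis steps are usually re-run many times against the same pull, it can help to cache the fetched records on disk. The following is a minimal sketch, not part of the original pipeline: `load_or_fetch` is a hypothetical helper that takes any zero-argument callable returning a JSON-serializable list, and the one-day default freshness window is an arbitrary assumption.

```python
import json
import time
from pathlib import Path
from typing import Callable


def load_or_fetch(cache_path: Path, fetch: Callable[[], list],
                  max_age_seconds: float = 86_400) -> list:
    """Return cached records if the file is fresh enough; otherwise
    call fetch(), write the result to cache_path, and return it."""
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age_seconds:
            return json.loads(cache_path.read_text(encoding="utf-8"))
    records = fetch()
    cache_path.write_text(json.dumps(records), encoding="utf-8")
    return records


# Usage with the fetcher defined above (commented out: it needs network access):
# companies = load_or_fetch(
#     Path("yc_legal_tech.json"),
#     lambda: asyncio.run(fetch_all_yc_companies("legal-tech")),
# )
```

Passing the fetcher as a callable keeps the cache independent of how the data is retrieved, so the same helper works for a full pull or a single-tag pull.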

Mapping A Vertical: Filtering By Industry + Tags + Team Size

The right filter combination depends on your specific question. For "find every current competitor in voice-AI for customer support," the filter is:

def is_competitor(company: dict, vertical: dict) -> bool:
    # Tag-based vertical match (OR: any one required tag is enough)
    company_tags = set(company.get("tags") or [])
    if not (company_tags & set(vertical["required_tags"])):
        return False

    # Optional: industry/subindustry narrowing
    if vertical.get("required_industries"):
        if company.get("industry") not in vertical["required_industries"]:
            return False

    # Optional: team-size filtering (bounds default to "no limit");
    # a missing team_size counts as 0
    team = company.get("team_size") or 0
    min_team = vertical.get("min_team", 0)
    max_team = vertical.get("max_team", float("inf"))
    if not (min_team <= team <= max_team):
        return False

    # Exclude dead/inactive companies
    if company.get("stage") == "Inactive":
        return False

    return True

VOICE_AI_CX = {
    "required_tags": ["voice-ai", "customer-support", "conversational-ai"],
    "required_industries": ["B2B"],
    "min_team": 2,
    "max_team": 200,
}

The required_tags are an OR: a company matches if it carries any of the listed tags. Voice-AI for customer support, in 2026, surfaces ~35 active YC companies with this filter. That's the universe of YC-funded competitors. To find non-YC competitors, you'd cross-reference with Crunchbase or Product Hunt, but the YC slice alone gives you a dense, well-tagged starting set of about 35 names.
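The same OR logic can also be expressed in the query itself. In Algolia's `facetFilters` syntax, entries inside a nested list are OR-ed together and the top-level entries are AND-ed. The sketch below builds such a value. `build_facet_filters` is a hypothetical helper, and it assumes the index allows faceting on `tags` and `industry`, which the field list above suggests but does not guarantee.

```python
from typing import Optional


def build_facet_filters(any_tags: list[str],
                        industries: Optional[list[str]] = None) -> list:
    """Build a facetFilters value meaning:
    (tag1 OR tag2 OR ...) AND (industry1 OR industry2 OR ...)."""
    filters: list = []
    if any_tags:
        filters.append([f"tags:{t}" for t in any_tags])
    if industries:
        filters.append([f"industry:{i}" for i in industries])
    return filters


payload = {
    "query": "",
    "hitsPerPage": 1000,
    "page": 0,
    "facetFilters": build_facet_filters(
        ["voice-ai", "customer-support", "conversational-ai"], ["B2B"]
    ),
}
print(payload["facetFilters"])
# [['tags:voice-ai', 'tags:customer-support', 'tags:conversational-ai'], ['industry:B2B']]
```

Pushing the OR into the request returns one de-duplicated result set per vertical instead of one pull per tag, at the cost of depending on how the index is configured.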

Scoring By Competitive Proximity

Filtering produces a candidate list. Ranking it by competitive proximity — how directly each candidate competes with your specific positioning — is what turns the list into a useful map. A simple proximity score:

def competitive_proximity_score(company: dict, focal: dict) -> float:
    score = 0.0

    # Tag overlap (highest signal)
    company_tags = set(company.get("tags") or [])  # tags may be missing or None
    focal_tags = set(focal["tags"])
    tag_overlap = company_tags & focal_tags
    score += len(tag_overlap) * 10  # 10 points per shared tag

    # Sub-industry exact match (next strongest signal)
    if company.get("subindustry") == focal.get("subindustry"):
        score += 15

    # Team-size adjacency (companies at your stage are most direct threats)
    company_team = company.get("team_size", 0) or 0
    focal_team = focal.get("team_size", 10)
    team_ratio = min(company_team, focal_team) / max(company_team, focal_team, 1)
    score += team_ratio * 10  # 0-10 points, peaks when team sizes match

    # Description keyword overlap (catches positioning similarity);
    # either description field may be missing or None
    desc = ((company.get("description") or "") + " " +
            (company.get("long_description") or "")).lower()
    keyword_hits = sum(1 for kw in focal["positioning_keywords"]
                       if kw.lower() in desc)
    score += min(20, keyword_hits * 4)  # cap at 20

    # Recency multiplier (newer batches = more current threats)
    batch = company.get("batch", "")
    if batch.startswith(("S25", "W26", "S26")):
        score *= 1.3
    elif batch.startswith(("W24", "S24", "W25")):
        score *= 1.1

    return round(score, 1)

The output is a single number per competitor. Sorting your filtered candidate list by this score produces a competitive map ranked from "most directly threatens us" to "adjacent but not direct." For a fundraise deck, you'd typically take the top 8-12 names and segment them into 2-3 strategic buckets ("legacy incumbents," "VC-funded direct competitors," "AI-native startups, like us but earlier").
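The bucketing step can be sketched in code as well. The version below segments a ranked list by batch year and team size. It is only an illustration: the bucket names, the 50-person and 2024 thresholds, and the helpers `batch_year`, `strategic_bucket`, and `segment` are all hypothetical, and it assumes batch strings look like "W26" or "S25".

```python
import re
from collections import defaultdict
from typing import Optional


def batch_year(batch: Optional[str]) -> Optional[int]:
    """Parse 'W26' or 'S25' into a four-digit year; None if unparseable."""
    m = re.fullmatch(r"[A-Za-z]+(\d{2})", batch or "")
    return 2000 + int(m.group(1)) if m else None


def strategic_bucket(company: dict, recent_from: int = 2024,
                     large_team: int = 50) -> str:
    """Assign one bucket per company (thresholds are illustrative)."""
    team = company.get("team_size") or 0
    year = batch_year(company.get("batch"))
    if team >= large_team:
        return "scaled player"
    if year is not None and year >= recent_from:
        return "recent entrant"
    return "established peer"


def segment(ranked: list[dict]) -> dict[str, list[str]]:
    """Group company names by bucket, preserving the ranked order."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for c in ranked:
        buckets[strategic_bucket(c)].append(c["name"])
    return dict(buckets)


# Hypothetical ranked records.
demo = [
    {"name": "Alpha", "batch": "W26", "team_size": 5},
    {"name": "Beta", "batch": "S21", "team_size": 120},
    {"name": "Gamma", "batch": "W22", "team_size": 14},
]
print(segment(demo))
# {'recent entrant': ['Alpha'], 'scaled player': ['Beta'], 'established peer': ['Gamma']}
```

Because the input is already sorted by proximity score, each bucket's list stays ordered from most to least direct competitor.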

Full Pipeline: From Tag To Ranked Competitive Map

End-to-end, the workflow is:

async def map_vertical(focal: dict, vertical: dict) -> list[dict]:
    # 1. Pull every YC company carrying any of the vertical's tags,
    #    de-duplicating companies that match more than one tag
    candidates: dict[str, dict] = {}
    for tag in vertical["required_tags"]:
        for c in await fetch_all_yc_companies(tag_filter=tag):
            candidates.setdefault(c.get("slug") or c["name"], c)

    # 2. Apply filter
    competitors = [c for c in candidates.values() if is_competitor(c, vertical)]

    # 3. Score by proximity
    for c in competitors:
        c["proximity_score"] = competitive_proximity_score(c, focal)

    # 4. Sort, most direct competitor first
    return sorted(competitors, key=lambda c: c["proximity_score"], reverse=True)


# Example: founder of a voice-AI customer-support startup
FOCAL_COMPANY = {
    "tags": ["voice-ai", "customer-support", "conversational-ai", "agents"],
    "subindustry": "Customer Service",
    "team_size": 6,
    "positioning_keywords": ["agent", "voice", "conversation", "automate",
                              "support", "interrupt", "real-time"],
}

ranked = asyncio.run(map_vertical(FOCAL_COMPANY, VOICE_AI_CX))

print(f"{'Company':<28} {'Batch':<6} {'Team':<5} {'Score':<7}")
for c in ranked[:15]:
    team = c.get("team_size") or "?"  # team_size can be missing or None
    print(f"{c['name']:<28} {c.get('batch') or '':<6} "
          f"{str(team):<5} {c['proximity_score']:<7}")

For a typical vertical, you'll get a ranked list of 30-60 competitors in under 15 seconds of runtime. The top 10 are the names you need to address in your fundraise deck. The full list of 30-60 is your market map. Its lower-ranked entries are the adjacent space, useful for spotting partnership opportunities or future expansion vectors.

Cross-Referencing: Where YC Data Alone Falls Short

YC's directory is dense and well-tagged, but it has three notable blind spots and lacks one useful signal:

Non-YC competitors. A vertical may have 100 total companies and only 35 in YC. For a fundraise deck, the other 65 matter too — especially the venture-backed series-B-and-later companies that don't appear in YC's directory because they predate YC's vertical expansion or were founded outside the program. Cross-reference YC's slice with a fresh-funding feed like the NexGenData Startup Funding Tracker actor (which pulls SEC Form D filings and TechCrunch funding coverage) to surface the non-YC venture-backed competitors.

Stealth competitors. Companies that haven't publicly disclosed what they're building won't appear in YC's directory under your tags. They might appear in Show HN posts before they ever publish a website. For founders worried about "is anyone stealth-building this?" the NexGenData Show HN Tracker actor surfaces every recent Show HN with positioning text you can keyword-match against your space.

Consumer-product competitors. YC tags consumer products differently from B2B SaaS. If your vertical is consumer-facing, supplement YC's data with NexGenData Product Hunt Launches, which captures consumer launches across all categories and stages.

Investor concentration signal. If you want to know not just "who are my competitors" but "who's investing in this vertical at scale," institutional 13F filings via the NexGenData SEC Form 13F Holdings Tracker actor reveal which crossover funds have built positions in adjacent public-company equivalents — a leading indicator for where they'll deploy private capital next.
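Cross-referencing mostly comes down to merging lists without double-counting. One possible approach, sketched below, is to key every record on its normalized website domain and keep the YC record when both sources list the same company. The helpers `domain_key` and `merge_sources` are hypothetical, and the sketch assumes external records carry `name` and `website` fields, which you may need to map from each source's own schema.

```python
from urllib.parse import urlparse


def domain_key(url: str) -> str:
    """Normalize a website URL to a bare lowercase host,
    e.g. 'https://www.Acme.ai/x' -> 'acme.ai'."""
    if not url:
        return ""
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def merge_sources(yc: list[dict], external: list[dict]) -> list[dict]:
    """Union of both lists; the YC record wins when domains collide."""
    merged: dict[str, dict] = {}
    for source, records in (("yc", yc), ("external", external)):
        for rec in records:
            # Fall back to the lowercased name when no website is known.
            key = domain_key(rec.get("website") or "") or (rec.get("name") or "").lower()
            if key not in merged:
                merged[key] = {**rec, "source": source}
    return list(merged.values())


# Hypothetical records from the two sources.
yc_list = [{"name": "Acme", "website": "https://acme.ai"}]
ext_list = [
    {"name": "Acme Inc", "website": "http://www.acme.ai/"},
    {"name": "Beta", "website": "beta.io"},
]
print([(r["name"], r["source"]) for r in merge_sources(yc_list, ext_list)])
# [('Acme', 'yc'), ('Beta', 'external')]
```

Domain matching misses companies that have rebranded or that share a parent domain, so treat the merged list as a first pass to review by hand.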

How This Compares To Paid Competitive-Intelligence Tools

The default tools for competitive mapping — Crunchbase Pro, PitchBook, CB Insights — were built for a different workflow. Here's how they compare for the specific use case of "build a current, ranked map of competitors in vertical X":

| Tool | Cost | Tag accuracy | Update freshness | Programmable filter | Best for |
| --- | --- | --- | --- | --- | --- |
| Crunchbase Pro | $99-$999/mo | Low (auto-tagged, noisy) | 11-day median lag for new entries | CSV export, limited API | Tracking funding events post-hoc |
| PitchBook | $20K+/yr seat | Medium (editorial review) | 2-4 week lag for private cos | Excel/API for enterprise | Enterprise M&A / IPO research |
| CB Insights | $50K+/yr seat | High (curated reports) | Quarterly market maps | Limited; mostly read-only | Trend reports & pre-built market maps |
| YC Directory + Python | Free / $0.01 per result via Apify | Very high (YC-curated tags) | Live (Algolia index) | Full Python filter/score logic | Founder competitive decks, BD partnership scans, vertical roll-up theses |

For a founder building a single competitive deck, paying $999/month for Crunchbase Pro to maybe surface 5 more competitors than YC's free directory is poor ROI. For an enterprise M&A team running 12 vertical mandates in parallel, PitchBook earns its seat — but they should still cross-reference YC for the early-stage layer that PitchBook covers thinly.

The YC + Python combination occupies a niche the paid tools don't serve well: programmatic, current, tag-accurate competitive mapping for a single vertical, at zero or near-zero marginal cost.

The Practical Output

A typical run of this pipeline for "voice-AI customer support" surfaces:

  • ~35 YC-funded competitors after filtering
  • Ranked 1-35 by proximity score
  • Segmentable into 3-4 strategic buckets by batch + team size
  • Exportable to CSV for the appendix of a fundraise deck or a BD partnership tracker
  • Refreshable in under 15 seconds as YC adds new batches
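The CSV export in the list above takes only a few lines with the standard library. This is a minimal sketch: the column selection in `FIELDS` and the helper name `to_csv` are assumptions, and it expects records shaped like the ranked output of the pipeline, each with a `proximity_score` key.

```python
import csv
import io

# Hypothetical column selection; adjust to the fields your deck needs.
FIELDS = ["name", "batch", "team_size", "website", "proximity_score"]


def to_csv(ranked: list[dict], fields: list[str] = FIELDS) -> str:
    """Serialize ranked companies to CSV text, one row per company.
    Missing fields become empty cells; extra fields are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in ranked:
        writer.writerow({f: row.get(f, "") for f in fields})
    return buf.getvalue()


# Hypothetical ranked record.
rows = [{"name": "Acme", "batch": "W26", "team_size": 6,
         "website": "https://acme.example", "proximity_score": 71.5,
         "tags": ["voice-ai"]}]
print(to_csv(rows), end="")
# name,batch,team_size,website,proximity_score
# Acme,W26,6,https://acme.example,71.5
```

To write a file instead of a string, pass the text to `Path("competitors.csv").write_text(...)` or hand `csv.DictWriter` a file opened with `newline=""`.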

That output, for free or near-free, replaces what most early-stage founders pay for in consulting hours from someone running CB Insights or PitchBook on their behalf. For a BD lead at a Series B SaaS company, it replaces an afternoon of manual research per partnership target. For an M&A analyst, it's the first-pass filter that gets handed to senior analysts for deeper diligence.

When YC Isn't Enough

A final caveat: YC's directory is biased toward US-headquartered, English-language, software-focused companies. If your vertical is bio, hardware, deep-tech, or international consumer, YC's coverage thins out. For those verticals, you'll need to combine YC's slice with industry-specific datasets (clinicaltrials.gov for bio, Crunchbase or PitchBook for international, Hacker News Show HN for deep-tech). YC is the highest-density single source for SaaS, fintech-infra, dev-tools, and AI verticals — and a useful complementary source for everything else.


Tooling Stack

NexGenData publishes 195+ actors for competitive-intelligence and market-mapping workflows. All pay-per-result, no monthly minimum: