Sitemap Parser That Auto-Discovers from robots.txt

#seo #xml #javascript #webdev
Алексей Спинов

Most websites have sitemaps, but finding them can be tricky. Here's a parser that discovers them automatically, starting from robots.txt.

Discovery Logic

  1. Check robots.txt for Sitemap: directives
  2. Try common paths: /sitemap.xml, /sitemap_index.xml
  3. Parse the XML with cheerio in xmlMode
  4. Handle sitemap indexes recursively
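
The first two discovery steps can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: `parseRobotsSitemaps`, `discoverSitemaps`, and the injected `fetchText` helper (assumed to resolve to the response body, or `null` on failure) are hypothetical names, and probing the common paths only when robots.txt lists nothing is an assumption.

```javascript
// Fallback locations from step 2 of the discovery logic.
const COMMON_PATHS = ['/sitemap.xml', '/sitemap_index.xml'];

// Extract the URL from every "Sitemap:" directive in a robots.txt body.
// The field name is matched case-insensitively; comment lines are skipped
// because they start with "#" and never match the pattern.
function parseRobotsSitemaps(robotsTxt) {
  const urls = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const match = /^sitemap\s*:\s*(\S+)/i.exec(rawLine.trim());
    if (match) urls.push(match[1]);
  }
  return urls;
}

// Return candidate sitemap URLs for a site origin such as "https://example.com".
// fetchText is a hypothetical helper: (url) => Promise<string | null>.
async function discoverSitemaps(origin, fetchText) {
  const robots = await fetchText(new URL('/robots.txt', origin).href);
  const fromRobots = robots ? parseRobotsSitemaps(robots) : [];
  if (fromRobots.length > 0) return fromRobots;

  // Assumption: only probe the common paths when robots.txt lists no sitemap.
  const found = [];
  for (const path of COMMON_PATHS) {
    const url = new URL(path, origin).href;
    if ((await fetchText(url)) !== null) found.push(url);
  }
  return found;
}
```

In Node 18+, `fetchText` could be a thin wrapper around the built-in `fetch` that returns `res.ok ? await res.text() : null`.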

Recursive Parsing

Sitemap indexes contain links to child sitemaps:

<sitemapindex>
  <sitemap><loc>https://site.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://site.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>

Parse each child sitemap the same way and aggregate all the URLs they list.
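
A rough sketch of that recursion is below. To keep it dependency-free, it pulls out `<loc>` values with a regular expression instead of cheerio's xmlMode, so it does not handle XML entities, CDATA, or namespace prefixes. The names `extractLocs`, `collectUrls`, and `fetchText` (assumed to resolve to the response body, or `null` on failure), as well as the `seen` set and `maxDepth` guard, are illustrative assumptions rather than the parser's real internals.

```javascript
// Return the <loc> text of every <tagName>...</tagName> entry in the XML.
// The \b keeps "sitemap" from matching "<sitemapindex>" and "url" from "<urlset>".
function extractLocs(xml, tagName) {
  const locs = [];
  const blockRe = new RegExp(`<${tagName}\\b[^>]*>([\\s\\S]*?)</${tagName}>`, 'gi');
  for (const block of xml.matchAll(blockRe)) {
    const loc = /<loc>\s*([\s\S]*?)\s*<\/loc>/i.exec(block[1]);
    if (loc) locs.push(loc[1]);
  }
  return locs;
}

// Walk a sitemap or sitemap index and return all page URLs found beneath it.
// fetchText is a hypothetical helper: (url) => Promise<string | null>.
async function collectUrls(sitemapUrl, fetchText, seen = new Set(), maxDepth = 5) {
  // Guard against indexes that reference each other or nest too deeply.
  if (seen.has(sitemapUrl) || maxDepth < 0) return [];
  seen.add(sitemapUrl);

  const xml = await fetchText(sitemapUrl);
  if (xml === null) return [];

  if (/<sitemapindex\b/i.test(xml)) {
    // Index file: recurse into each child sitemap and merge the results.
    const urls = [];
    for (const child of extractLocs(xml, 'sitemap')) {
      for (const url of await collectUrls(child, fetchText, seen, maxDepth - 1)) {
        urls.push(url);
      }
    }
    return urls;
  }

  // Plain sitemap: every <url> entry contributes one page URL.
  return extractLocs(xml, 'url');
}
```

Children are fetched one at a time here for simplicity; a real crawler would likely fetch them concurrently with some limit.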

Output

{
  "url": "https://stripe.com/sitemap.xml",
  "lastmod": "2026-03-15",
  "changefreq": "weekly",
  "priority": 0.8
}
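
One way a single `<url>` entry might be mapped to a record of this shape is sketched below. The helper names `tagText` and `toRecord` are hypothetical, the regex extraction stands in for cheerio, and using `null` for missing optional fields is an assumption about the output format.

```javascript
// Return the trimmed text of the first <tag>...</tag> in a block, or null if absent.
function tagText(block, tag) {
  const m = new RegExp(`<${tag}>\\s*([\\s\\S]*?)\\s*</${tag}>`, 'i').exec(block);
  return m ? m[1] : null;
}

// Convert the XML of one <url> entry into an output record.
// Assumption: optional fields that are missing become null.
function toRecord(urlBlock) {
  const priority = tagText(urlBlock, 'priority');
  return {
    url: tagText(urlBlock, 'loc'),
    lastmod: tagText(urlBlock, 'lastmod'),
    changefreq: tagText(urlBlock, 'changefreq'),
    // priority is emitted as a number, matching the sample output
    priority: priority === null ? null : Number(priority),
  };
}
```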

Stripe.com has 4,817 URLs across 6 child sitemaps.

I built a Sitemap Parser on Apify. To find it, search for knotless_cadence sitemap.