Web Scraping Without the Headaches: One API Call to Clean Data

#python #webdev #scraping #tutorial
By Ozor


Web scraping usually means installing Puppeteer, fighting with anti-bot systems, rotating proxies, and parsing messy HTML. What if you could skip all of that?

curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/scrape" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com", "format": "markdown"}'

One call. Clean markdown back. No Selenium, no Puppeteer, no proxies.

Get a Free API Key

curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create

200 free calls, no signup required.

Example 1: Scrape a Blog Post

import requests

API = "https://agent-gateway-kappa.vercel.app"
KEY = "gw_your_key_here"
HEADERS = {
    "Authorization": f"Bearer {KEY}",
    "Content-Type": "application/json"
}

def scrape(url, fmt="markdown"):
    r = requests.post(f"{API}/v1/agent-scraper/scrape",
        headers=HEADERS,
        json={"url": url, "format": fmt})
    r.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return r.json()

# Scrape a blog post as clean markdown
result = scrape("https://example.com/blog/interesting-post")
print(result["content"][:500])

The response gives you extracted text content — no nav bars, no footers, no cookie banners. Just the article.
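In real scripts it pays to validate the response before indexing into it. Here's a minimal sketch; only `title` and `content` appear in the examples in this post, and the `error` key is an assumption — check it against real responses:

```python
def extract_article(resp):
    """Pull the useful fields out of a scrape response dict.

    Only "title" and "content" appear in this post's examples; the
    "error" key is an assumed failure shape -- verify before relying on it.
    """
    if "error" in resp:
        raise RuntimeError(f"Scrape failed: {resp['error']}")
    return {
        "title": resp.get("title", "(untitled)"),
        "content": resp.get("content", ""),
    }
```

Defaulting missing fields keeps downstream code (slicing, searching) from blowing up on an unexpected payload.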

Example 2: Compare Competitor Pricing

competitors = [
    "https://competitor1.com/pricing",
    "https://competitor2.com/pricing",
    "https://competitor3.com/pricing",
]

for url in competitors:
    data = scrape(url)
    print(f"\n--- {url} ---")
    # Look for price-related content
    for line in data.get("content", "").split("\n"):
        if "$" in line or "price" in line.lower() or "month" in line.lower():
            print(line.strip())
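The substring check above catches a lot of noise ("pricey", "priceless", unrelated mentions of "month"). A regex that pulls actual dollar amounts is more precise — a sketch, tuned for US-style prices only:

```python
import re

# Matches "$29", "$9.99", "$1,299.00" -- extend for other currencies
PRICE_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

def extract_prices(text):
    """Return every dollar amount found in scraped page text."""
    return PRICE_RE.findall(text)
```

Run it on `data.get("content", "")` instead of the line-by-line keyword scan when you only care about the numbers.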

Example 3: Build a News Aggregator

import json

SOURCES = {
    "Hacker News": "https://news.ycombinator.com",
    "Lobsters": "https://lobste.rs",
    "Dev.to": "https://dev.to",
}

all_news = {}
for name, url in SOURCES.items():
    result = scrape(url)
    all_news[name] = {
        "title": result.get("title", ""),
        "content_preview": result.get("content", "")[:300],
        "url": url
    }
    print(f"Scraped {name}: {len(result.get('content', ''))} chars")

# Save aggregated news
with open("news_digest.json", "w") as f:
    json.dump(all_news, f, indent=2)
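From here it's a short step to something shareable. A sketch that renders the `all_news` dict above as a markdown digest (formatting choices are mine, not part of the API):

```python
def render_digest(all_news):
    """Turn the aggregated news dict into a readable markdown digest."""
    lines = ["# Daily News Digest", ""]
    for name, item in all_news.items():
        lines.append(f"## {name}")
        lines.append(item["url"])
        lines.append("")
        lines.append(item["content_preview"] + "...")
        lines.append("")
    return "\n".join(lines)
```

Write the result to `digest.md` instead of (or alongside) the JSON file, and it drops straight into email or Slack.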

Example 4: Screenshot + Scrape Combo

The same API key gives you screenshots too:

def screenshot(url, viewport="desktop"):
    r = requests.post(f"{API}/v1/agent-screenshot/screenshot",
        headers=HEADERS,
        json={"url": url, "viewport": viewport})
    return r.json()

# Get both text content AND a visual screenshot
url = "https://example.com"
text_data = scrape(url)
visual = screenshot(url)

print(f"Text: {len(text_data.get('content', ''))} chars")
print(f"Screenshot: {visual.get('url', 'check response')}")

Example 5: Feed Scraped Data to an LLM

Scraping is great, but combining it with an LLM makes it powerful:

def ask_llm(prompt):
    """Use the built-in LLM proxy."""
    r = requests.post(f"{API}/v1/agent-llm/chat",
        headers=HEADERS,
        json={"messages": [{"role": "user", "content": prompt}]})
    return r.json()

# Scrape a page and summarize it
page = scrape("https://blog.example.com/long-technical-post")
content = page.get("content", "")[:3000]

summary = ask_llm(f"Summarize this article in 3 bullet points:\n\n{content}")
print(summary)
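Depending on how the proxy formats replies, you may need to dig the text out of the response rather than print the whole dict. This helper assumes an OpenAI-style `choices[0].message.content` shape — that shape is an assumption, so print the raw JSON once to confirm it:

```python
def reply_text(resp):
    """Extract the assistant's text from a chat response.

    Assumes an OpenAI-style shape (choices[0].message.content);
    falls back to the raw payload if the structure differs.
    """
    try:
        return resp["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return str(resp)
```

Then `print(reply_text(summary))` gives you just the bullet points.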

Why Not Just Use Puppeteer/Playwright?

You absolutely can. But here's what you skip with an API:

| Problem | DIY | API |
| --- | --- | --- |
| Browser install | Install Chrome/Chromium | None |
| Anti-bot detection | Rotate proxies, user agents | Handled |
| JavaScript rendering | Full browser needed | Handled |
| Memory usage | 200 MB+ per browser instance | 0 |
| Maintenance | Update selectors when sites change | Not your problem |
| Scaling | Manage browser pool | Just make more calls |

For one-off scripts and small projects, the API saves hours of setup.

Rate Limits and Pricing

  • Free: 200 credits (1 scrape = 1 credit)
  • Paid: $1 = 1,000 credits via USDC or Monero
  • Rate limit: 300 req/min
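With a 300 req/min ceiling, batch jobs should throttle themselves client-side rather than hammer the gateway and eat 429s. A minimal sliding-window limiter sketch (the limit values come from the list above; the class itself is generic, and the injectable clock is just for testability):

```python
import time

class RateLimiter:
    """Client-side throttle: at most max_calls per window seconds."""

    def __init__(self, max_calls=300, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls = []  # timestamps of recent calls

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if now)."""
        now = self.clock()
        # Drop timestamps that have fallen out of the sliding window
        self.calls = [t for t in self.calls if now - t < self.window]
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Call this each time a request is actually sent."""
        self.calls.append(self.clock())
```

Usage: before each request, `time.sleep(limiter.wait_time())`, then `limiter.record()` and fire the call.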

Check your remaining credits:

curl "https://agent-gateway-kappa.vercel.app/api/keys/balance" \
  -H "Authorization: Bearer YOUR_KEY"

The Full API Catalog

The scraping API is one of 39 services on the same gateway. The same key also gives you:

  • DNS lookups, GeoIP, URL shortening
  • PDF generation, code execution, file storage
  • Crypto prices, wallet operations, on-chain analytics
  • Webhook testing, email sending, task scheduling

Browse everything: api-catalog-three.vercel.app


Next time you need data from a webpage, try the one-liner before reaching for Puppeteer. You might not need it.