Web Scraping Without the Headaches: One API Call to Clean Data

#python #webdev #scraping #tutorial
By Ozor


Web scraping usually means installing Puppeteer, fighting with anti-bot systems, rotating proxies, and parsing messy HTML. What if you could skip all of that?

curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/scrape" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com", "format": "markdown"}'

One call. Clean markdown back. No Selenium, no Puppeteer, no proxies.

Get a Free API Key

curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create

200 free calls, no signup required.

Example 1: Scrape a Blog Post

import requests

API = "https://agent-gateway-kappa.vercel.app"
KEY = "gw_your_key_here"
HEADERS = {
    "Authorization": f"Bearer {KEY}",
    "Content-Type": "application/json"
}

def scrape(url, fmt="markdown"):
    r = requests.post(f"{API}/v1/agent-scraper/scrape",
        headers=HEADERS,
        json={"url": url, "format": fmt})
    r.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return r.json()

# Scrape a blog post as clean markdown
result = scrape("https://example.com/blog/interesting-post")
print(result["content"][:500])

The response gives you extracted text content — no nav bars, no footers, no cookie banners. Just the article.
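In real scripts it pays to validate the response before indexing into it. Here's a minimal sketch; only `title` and `content` appear in the examples in this post, and the `error` key is an assumption — check it against real responses:

```python
def extract_article(resp):
    """Pull the useful fields out of a scrape response dict.

    Only "title" and "content" appear in this post's examples; the
    "error" key is an assumed failure shape -- verify before relying on it.
    """
    if "error" in resp:
        raise RuntimeError(f"Scrape failed: {resp['error']}")
    return {
        "title": resp.get("title", "(untitled)"),
        "content": resp.get("content", ""),
    }
```

Defaulting missing fields keeps downstream code (slicing, searching) from blowing up on an unexpected payload.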

Example 2: Compare Competitor Pricing

competitors = [
    "https://competitor1.com/pricing",
    "https://competitor2.com/pricing",
    "https://competitor3.com/pricing",
]

for url in competitors:
    data = scrape(url)
    print(f"\n--- {url} ---")
    # Look for price-related content
    for line in data.get("content", "").split("\n"):
        if "$" in line or "price" in line.lower() or "month" in line.lower():
            print(line.strip())
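The substring check above catches a lot of noise ("pricey", "priceless", unrelated mentions of "month"). A regex that pulls actual dollar amounts is more precise — a sketch, tuned for US-style prices only:

```python
import re

# Matches "$29", "$9.99", "$1,299.00" -- extend for other currencies
PRICE_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

def extract_prices(text):
    """Return every dollar amount found in scraped page text."""
    return PRICE_RE.findall(text)
```

Run it on `data.get("content", "")` instead of the line-by-line keyword scan when you only care about the numbers.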

Example 3: Build a News Aggregator

import json

SOURCES = {
    "Hacker News": "https://news.ycombinator.com",
    "Lobsters": "https://lobste.rs",
    "Dev.to": "https://dev.to",
}

all_news = {}
for name, url in SOURCES.items():
    result = scrape(url)
    all_news[name] = {
        "title": result.get("title", ""),
        "content_preview": result.get("content", "")[:300],
        "url": url
    }
    print(f"Scraped {name}: {len(result.get('content', ''))} chars")

# Save aggregated news
with open("news_digest.json", "w") as f:
    json.dump(all_news, f, indent=2)
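From here it's a short step to something shareable. A sketch that renders the `all_news` dict above as a markdown digest (formatting choices are mine, not part of the API):

```python
def render_digest(all_news):
    """Turn the aggregated news dict into a readable markdown digest."""
    lines = ["# Daily News Digest", ""]
    for name, item in all_news.items():
        lines.append(f"## {name}")
        lines.append(item["url"])
        lines.append("")
        lines.append(item["content_preview"] + "...")
        lines.append("")
    return "\n".join(lines)
```

Write the result to `digest.md` instead of (or alongside) the JSON file, and it drops straight into email or Slack.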

Example 4: Screenshot + Scrape Combo

The same API key gives you screenshots too:

def screenshot(url, viewport="desktop"):
    r = requests.post(f"{API}/v1/agent-screenshot/screenshot",
        headers=HEADERS,
        json={"url": url, "viewport": viewport})
    return r.json()

# Get both text content AND a visual screenshot
url = "https://example.com"
text_data = scrape(url)
visual = screenshot(url)

print(f"Text: {len(text_data.get('content', ''))} chars")
print(f"Screenshot: {visual.get('url', 'check response')}")

Example 5: Feed Scraped Data to an LLM

Scraping is great, but combining it with an LLM makes it powerful:

def ask_llm(prompt):
    """Use the built-in LLM proxy."""
    r = requests.post(f"{API}/v1/agent-llm/chat",
        headers=HEADERS,
        json={"messages": [{"role": "user", "content": prompt}]})
    return r.json()

# Scrape a page and summarize it
page = scrape("https://blog.example.com/long-technical-post")
content = page.get("content", "")[:3000]

summary = ask_llm(f"Summarize this article in 3 bullet points:\n\n{content}")
print(summary)
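Depending on how the proxy formats replies, you may need to dig the text out of the response rather than print the whole dict. This helper assumes an OpenAI-style `choices[0].message.content` shape — that shape is an assumption, so print the raw JSON once to confirm it:

```python
def reply_text(resp):
    """Extract the assistant's text from a chat response.

    Assumes an OpenAI-style shape (choices[0].message.content);
    falls back to the raw payload if the structure differs.
    """
    try:
        return resp["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return str(resp)
```

Then `print(reply_text(summary))` gives you just the bullet points.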

Why Not Just Use Puppeteer/Playwright?

You absolutely can. But here's what you skip with an API:

| Problem | DIY | API |
| --- | --- | --- |
| Browser install | Install Chrome/Chromium | None |
| Anti-bot detection | Rotate proxies, user agents | Handled |
| JavaScript rendering | Full browser needed | Handled |
| Memory usage | 200 MB+ per browser instance | 0 |
| Maintenance | Update selectors when sites change | Not your problem |
| Scaling | Manage browser pool | Just make more calls |

For one-off scripts and small projects, the API saves hours of setup.

Rate Limits and Pricing

  • Free: 200 credits (1 scrape = 1 credit)
  • Paid: $1 = 1,000 credits via USDC or Monero
  • Rate limit: 300 req/min
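With a 300 req/min ceiling, batch jobs should throttle themselves client-side rather than hammer the gateway and eat 429s. A minimal sliding-window limiter sketch (the limit values come from the list above; the class itself is generic, and the injectable clock is just for testability):

```python
import time

class RateLimiter:
    """Client-side throttle: at most max_calls per window seconds."""

    def __init__(self, max_calls=300, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls = []  # timestamps of recent calls

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if now)."""
        now = self.clock()
        # Drop timestamps that have fallen out of the sliding window
        self.calls = [t for t in self.calls if now - t < self.window]
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Call this each time a request is actually sent."""
        self.calls.append(self.clock())
```

Usage: before each request, `time.sleep(limiter.wait_time())`, then `limiter.record()` and fire the call.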

Check your remaining credits:

curl "https://agent-gateway-kappa.vercel.app/api/keys/balance" \
  -H "Authorization: Bearer YOUR_KEY"

The Full API Catalog

The scraping API is one of 39 services on the same gateway. The same key also gives you:

  • DNS lookups, GeoIP, URL shortening
  • PDF generation, code execution, file storage
  • Crypto prices, wallet operations, on-chain analytics
  • Webhook testing, email sending, task scheduling

Browse everything: api-catalog-three.vercel.app


Next time you need data from a webpage, try the one-liner before reaching for Puppeteer. You might not need it.