Give Your AI Agent Superpowers: Screenshots, Scraping, Code Execution, and 36 More APIs

#ai #python #machinelearning #tutorial
By Ozor

Most AI agents are chatbots with extra steps. They can generate text, but they can't do anything. They can't take screenshots, look up DNS records, check crypto prices, run code, or generate PDFs.

Here's how to fix that with one API key and a few Python functions.

The Problem

Your LLM can reason, but it has no tools. It can say "I'll check the website" but it can't actually load a webpage. It can say "let me calculate" but it can't run code. It's stuck generating text about actions instead of taking them.

The Solution: A Tool API

One API key gives your agent 39 real-world capabilities:

curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create
{"key": "gw_abc123...", "credits": 200}

No signup. 200 free calls. Here's what your agent can now do.
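Before wiring up individual tools, it helps to wrap key creation and header construction in two small helpers. A minimal sketch (the endpoint and response shape are taken from the curl example above; persist the key yourself, e.g. in an environment variable, since losing it means losing the credits):

```python
import requests

GATEWAY = "https://agent-gateway-kappa.vercel.app"

def create_key(timeout=10):
    """Request a fresh API key (comes with 200 free credits)."""
    r = requests.post(f"{GATEWAY}/api/keys/create", timeout=timeout)
    r.raise_for_status()
    return r.json()["key"]

def auth_headers(key):
    """Build the auth headers that every tool call below reuses."""
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```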

Tool 1: See the Web (Screenshots)

import requests

API = "https://agent-gateway-kappa.vercel.app"
KEY = "gw_your_key_here"
H = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

def take_screenshot(url, viewport="desktop"):
    """Give your agent eyes — screenshot any webpage."""
    r = requests.post(f"{API}/v1/agent-screenshot/screenshot",
        headers=H, json={"url": url, "viewport": viewport})
    return r.json()

Your agent can now visually verify deployments, check competitor sites, or capture evidence.
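Note that `take_screenshot` returns `r.json()` without checking the status code. For unattended agents, a generic retry wrapper is worth a few extra lines; here's a sketch assuming only standard HTTP semantics (the gateway's actual error codes aren't documented here):

```python
import time
import requests

def backoff_delay(attempt, base=1.0):
    """Exponential backoff schedule: 1s, 2s, 4s, ..."""
    return base * 2 ** attempt

def call_tool(method, url, *, headers=None, json=None, retries=3):
    """Call a gateway endpoint, retrying transient failures (429/5xx)."""
    for attempt in range(retries):
        try:
            r = requests.request(method, url, headers=headers, json=json, timeout=30)
            if r.status_code in (429, 500, 502, 503) and attempt < retries - 1:
                time.sleep(backoff_delay(attempt))
                continue
            r.raise_for_status()  # surface non-retryable errors (e.g. 401, 402)
            return r.json()
        except requests.ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_delay(attempt))
```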

Tool 2: Read the Web (Scraping)

def scrape_page(url, fmt="markdown"):
    """Extract clean content from any URL."""
    r = requests.post(f"{API}/v1/agent-scraper/scrape",
        headers=H, json={"url": url, "format": fmt})
    return r.json()

Returns clean markdown — no nav bars, no ads. Feed this directly into your LLM's context.
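Scraped pages can still easily exceed an LLM's context window. A rough trimming helper, using the common ~4-characters-per-token heuristic (swap in a real tokenizer such as tiktoken for exact counts):

```python
def fit_to_context(markdown_text, max_tokens=4000, chars_per_token=4):
    """Trim scraped markdown to roughly fit a token budget."""
    budget = max_tokens * chars_per_token
    if len(markdown_text) <= budget:
        return markdown_text
    # Cut at the last paragraph break before the budget to avoid mid-sentence cuts.
    cut = markdown_text.rfind("\n\n", 0, budget)
    if cut == -1:
        cut = budget
    return markdown_text[:cut]
```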

Tool 3: Run Code (Sandboxed)

def execute_code(code, language="python"):
    """Run code in a secure sandbox."""
    r = requests.post(f"{API}/v1/agent-coderunner/execute",
        headers=H, json={"language": language, "code": code})
    return r.json()

Your agent can now write AND run code. Data analysis, calculations, file processing — all sandboxed.
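One pitfall: if the agent interpolates scraped text straight into the code string, a stray quote in the data produces a syntax error in the sandbox. Embedding the data with `repr()` avoids that; a small hypothetical helper:

```python
def build_analysis_code(text):
    """Embed untrusted text into a Python snippet safely via repr()."""
    # repr() escapes quotes and newlines, so the generated source always parses
    return f"text = {text!r}\nwords = text.split()\nprint(len(words))\n"
```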

Tool 4: Look Up IPs and Domains

def geolocate(ip):
    """Get location, ISP, timezone for any IP."""
    r = requests.get(f"{API}/v1/agent-geo/geo/{ip}", headers=H)
    return r.json()

def dns_lookup(domain):
    """Resolve DNS records for any domain."""
    r = requests.get(f"{API}/v1/agent-dns/resolve/{domain}", headers=H)
    return r.json()

Tool 5: Crypto & DeFi Data

def get_crypto_price(token):
    """Get live price for any token."""
    r = requests.get(f"{API}/v1/crypto-feeds/prices?ids={token}&vs=usd",
        headers=H)
    return r.json()

Putting It Together: A Research Agent

Here's a complete agent that uses multiple tools to research a topic:

import requests

class ResearchAgent:
    def __init__(self, api_key):
        self.api = "https://agent-gateway-kappa.vercel.app"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def scrape(self, url):
        r = requests.post(f"{self.api}/v1/agent-scraper/scrape",
            headers=self.headers,
            json={"url": url, "format": "markdown"})
        return r.json().get("content", "")

    def screenshot(self, url):
        r = requests.post(f"{self.api}/v1/agent-screenshot/screenshot",
            headers=self.headers,
            json={"url": url, "viewport": "desktop"})
        return r.json()

    def run_code(self, code):
        r = requests.post(f"{self.api}/v1/agent-coderunner/execute",
            headers=self.headers,
            json={"language": "python", "code": code})
        return r.json()

    def generate_pdf(self, html):
        r = requests.post(f"{self.api}/v1/agent-pdfgen/generate",
            headers=self.headers,
            json={"html": html})
        return r.json()

    def research(self, url):
        """Full research pipeline: scrape, analyze, report."""
        # Step 1: Get the content
        content = self.scrape(url)
        print(f"Scraped {len(content)} chars from {url}")

        # Step 2: Analyze with code (note: raw interpolation breaks if the
        # scraped content contains triple quotes; escape it in production)
        analysis = self.run_code(f"""
text = '''{content[:2000]}'''
words = text.split()
sentences = text.split('.')
print(f'Words: {{len(words)}}')
print(f'Sentences: {{len(sentences)}}')
print(f'Avg words/sentence: {{len(words)//max(len(sentences),1)}}')

# Extract key topics (simple keyword frequency)
from collections import Counter
common = Counter(w.lower() for w in words if len(w) > 5).most_common(10)
print(f'\\nTop keywords:')
for word, count in common:
    print(f'  {{word}}: {{count}}')
""")
        print(f"Analysis: {analysis}")

        # Step 3: Take a screenshot for the report
        screenshot = self.screenshot(url)
        print("Screenshot captured")

        # Step 4: Generate a PDF report
        report_html = f"""
        <h1>Research Report: {url}</h1>
        <h2>Content Summary</h2>
        <pre>{content[:1000]}</pre>
        <h2>Analysis</h2>
        <pre>{analysis}</pre>
        """
        pdf = self.generate_pdf(report_html)
        print("PDF report generated")

        return {"content": content, "analysis": analysis,
                "screenshot": screenshot, "pdf": pdf}

# Usage
agent = ResearchAgent("gw_your_key_here")
report = agent.research("https://news.ycombinator.com")

For LLM Function Calling

If you're using OpenAI-style function calling, define tools like this:

tools = [
    {
        "type": "function",
        "function": {
            "name": "screenshot",
            "description": "Take a screenshot of a webpage",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to screenshot"},
                    "viewport": {"type": "string", "enum": ["desktop", "tablet", "mobile"]}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "scrape",
            "description": "Extract text content from a webpage",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to scrape"},
                    "format": {"type": "string", "enum": ["markdown", "html", "json"]}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "execute_code",
            "description": "Run Python or JavaScript code in a sandbox",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string"},
                    "language": {"type": "string", "enum": ["python", "javascript"]}
                },
                "required": ["code"]
            }
        }
    }
]

Map function names to API calls, and your LLM can autonomously decide which tools to use.
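A minimal dispatcher for that mapping might look like this (a sketch assuming each tool is a plain function taking keyword arguments matching the schema; the `tool_call` dict mirrors the OpenAI tool-call response shape):

```python
import json

def dispatch_tool_call(registry, tool_call):
    """Execute one OpenAI-style tool call against a name -> function registry."""
    name = tool_call["function"]["name"]
    fn = registry.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    # Arguments arrive as a JSON string; parse and splat them as kwargs.
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)
```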

All 39 Services

The same API key works across all of these:

| Category | Services |
| --- | --- |
| Web | Screenshot, Scraper, DNS Lookup, GeoIP, URL Shortener |
| Crypto | Live Prices, Wallet (9 chains), On-chain Analytics, DeFi Data |
| Dev Tools | Code Runner, PDF Generator, File Storage, Webhook Tester |
| Infrastructure | Task Scheduler, Event Bus, Secret Manager, Email, Identity |
| AI | LLM Proxy, Text Transform, Image Processing |

Full catalog with interactive demos: api-catalog-three.vercel.app

Pricing

  • Free: 200 credits, no signup
  • Paid: $1 = 1,000 credits (USDC on Base or Monero)
  • AI-native: Supports x402 protocol for automatic per-request payments
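Since 200 free credits buy 200 free calls, the free tier implies one credit per call; combined with $1 = 1,000 credits, budgeting is simple arithmetic. A quick helper (treat `credits_per_call=1` as an assumption — per-service costs may differ):

```python
def estimate_cost_usd(calls, credits_per_call=1, usd_per_credit=1 / 1000):
    """Rough spend estimate: calls x credits per call x price per credit."""
    return calls * credits_per_call * usd_per_credit
```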

Stop building agents that can only talk. Give them tools.