Give Your AI Agent Superpowers: Screenshots, Scraping, Code Execution, and 36 More APIs

#ai #python #machinelearning #tutorial
By Ozor

Most AI agents are chatbots with extra steps. They can generate text, but they can't do anything. They can't take screenshots, look up DNS records, check crypto prices, run code, or generate PDFs.

Here's how to fix that with one API key and a few Python functions.

The Problem

Your LLM can reason, but it has no tools. It can say "I'll check the website" but it can't actually load a webpage. It can say "let me calculate" but it can't run code. It's stuck generating text about actions instead of taking them.

The Solution: A Tool API

One API key gives your agent 39 real-world capabilities:

curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create
{"key": "gw_abc123...", "credits": 200}

No signup. 200 free calls. Here's what your agent can now do.
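Before wiring up individual tools, it helps to wrap key creation and header construction in two small helpers. A minimal sketch (the endpoint and response shape are taken from the curl example above; persist the key yourself, e.g. in an environment variable, since losing it means losing the credits):

```python
import requests

GATEWAY = "https://agent-gateway-kappa.vercel.app"

def create_key(timeout=10):
    """Request a fresh API key (comes with 200 free credits)."""
    r = requests.post(f"{GATEWAY}/api/keys/create", timeout=timeout)
    r.raise_for_status()
    return r.json()["key"]

def auth_headers(key):
    """Build the auth headers that every tool call below reuses."""
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```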

Tool 1: See the Web (Screenshots)

import requests

API = "https://agent-gateway-kappa.vercel.app"
KEY = "gw_your_key_here"
H = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

def take_screenshot(url, viewport="desktop"):
    """Give your agent eyes — screenshot any webpage."""
    r = requests.post(f"{API}/v1/agent-screenshot/screenshot",
        headers=H, json={"url": url, "viewport": viewport})
    return r.json()

Your agent can now visually verify deployments, check competitor sites, or capture evidence.
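Note that `take_screenshot` returns `r.json()` without checking the status code. For unattended agents, a generic retry wrapper is worth a few extra lines; here's a sketch assuming only standard HTTP semantics (the gateway's actual error codes aren't documented here):

```python
import time
import requests

def backoff_delay(attempt, base=1.0):
    """Exponential backoff schedule: 1s, 2s, 4s, ..."""
    return base * 2 ** attempt

def call_tool(method, url, *, headers=None, json=None, retries=3):
    """Call a gateway endpoint, retrying transient failures (429/5xx)."""
    for attempt in range(retries):
        try:
            r = requests.request(method, url, headers=headers, json=json, timeout=30)
            if r.status_code in (429, 500, 502, 503) and attempt < retries - 1:
                time.sleep(backoff_delay(attempt))
                continue
            r.raise_for_status()  # surface non-retryable errors (e.g. 401, 402)
            return r.json()
        except requests.ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_delay(attempt))
```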

Tool 2: Read the Web (Scraping)

def scrape_page(url, fmt="markdown"):
    """Extract clean content from any URL."""
    r = requests.post(f"{API}/v1/agent-scraper/scrape",
        headers=H, json={"url": url, "format": fmt})
    return r.json()

Returns clean markdown — no nav bars, no ads. Feed this directly into your LLM's context.
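Scraped pages can still easily exceed an LLM's context window. A rough trimming helper, using the common ~4-characters-per-token heuristic (swap in a real tokenizer such as tiktoken for exact counts):

```python
def fit_to_context(markdown_text, max_tokens=4000, chars_per_token=4):
    """Trim scraped markdown to roughly fit a token budget."""
    budget = max_tokens * chars_per_token
    if len(markdown_text) <= budget:
        return markdown_text
    # Cut at the last paragraph break before the budget to avoid mid-sentence cuts.
    cut = markdown_text.rfind("\n\n", 0, budget)
    if cut == -1:
        cut = budget
    return markdown_text[:cut]
```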

Tool 3: Run Code (Sandboxed)

def execute_code(code, language="python"):
    """Run code in a secure sandbox."""
    r = requests.post(f"{API}/v1/agent-coderunner/execute",
        headers=H, json={"language": language, "code": code})
    return r.json()

Your agent can now write AND run code. Data analysis, calculations, file processing — all sandboxed.
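One pitfall: if the agent interpolates scraped text straight into the code string, a stray quote in the data produces a syntax error in the sandbox. Embedding the data with `repr()` avoids that; a small hypothetical helper:

```python
def build_analysis_code(text):
    """Embed untrusted text into a Python snippet safely via repr()."""
    # repr() escapes quotes and newlines, so the generated source always parses
    return f"text = {text!r}\nwords = text.split()\nprint(len(words))\n"
```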

Tool 4: Look Up IPs and Domains

def geolocate(ip):
    """Get location, ISP, timezone for any IP."""
    r = requests.get(f"{API}/v1/agent-geo/geo/{ip}", headers=H)
    return r.json()

def dns_lookup(domain):
    """Resolve DNS records for any domain."""
    r = requests.get(f"{API}/v1/agent-dns/resolve/{domain}", headers=H)
    return r.json()

Tool 5: Crypto & DeFi Data

def get_crypto_price(token):
    """Get live price for any token."""
    r = requests.get(f"{API}/v1/crypto-feeds/prices?ids={token}&vs=usd",
        headers=H)
    return r.json()

Putting It Together: A Research Agent

Here's a complete agent that uses multiple tools to research a topic:

import requests

class ResearchAgent:
    def __init__(self, api_key):
        self.api = "https://agent-gateway-kappa.vercel.app"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def scrape(self, url):
        r = requests.post(f"{self.api}/v1/agent-scraper/scrape",
            headers=self.headers,
            json={"url": url, "format": "markdown"})
        return r.json().get("content", "")

    def screenshot(self, url):
        r = requests.post(f"{self.api}/v1/agent-screenshot/screenshot",
            headers=self.headers,
            json={"url": url, "viewport": "desktop"})
        return r.json()

    def run_code(self, code):
        r = requests.post(f"{self.api}/v1/agent-coderunner/execute",
            headers=self.headers,
            json={"language": "python", "code": code})
        return r.json()

    def generate_pdf(self, html):
        r = requests.post(f"{self.api}/v1/agent-pdfgen/generate",
            headers=self.headers,
            json={"html": html})
        return r.json()

    def research(self, url):
        """Full research pipeline: scrape, analyze, report."""
        # Step 1: Get the content
        content = self.scrape(url)
        print(f"Scraped {len(content)} chars from {url}")

        # Step 2: Analyze with code (note: raw interpolation breaks if the
        # scraped content contains triple quotes; escape it in production)
        analysis = self.run_code(f"""
text = '''{content[:2000]}'''
words = text.split()
sentences = text.split('.')
print(f'Words: {{len(words)}}')
print(f'Sentences: {{len(sentences)}}')
print(f'Avg words/sentence: {{len(words)//max(len(sentences),1)}}')

# Extract key topics (simple keyword frequency)
from collections import Counter
common = Counter(w.lower() for w in words if len(w) > 5).most_common(10)
print(f'\\nTop keywords:')
for word, count in common:
    print(f'  {{word}}: {{count}}')
""")
        print(f"Analysis: {analysis}")

        # Step 3: Take a screenshot for the report
        screenshot = self.screenshot(url)
        print("Screenshot captured")

        # Step 4: Generate a PDF report
        report_html = f"""
        <h1>Research Report: {url}</h1>
        <h2>Content Summary</h2>
        <pre>{content[:1000]}</pre>
        <h2>Analysis</h2>
        <pre>{analysis}</pre>
        """
        pdf = self.generate_pdf(report_html)
        print("PDF report generated")

        return {"content": content, "analysis": analysis,
                "screenshot": screenshot, "pdf": pdf}

# Usage
agent = ResearchAgent("gw_your_key_here")
report = agent.research("https://news.ycombinator.com")

For LLM Function Calling

If you're using OpenAI-style function calling, define tools like this:

tools = [
    {
        "type": "function",
        "function": {
            "name": "screenshot",
            "description": "Take a screenshot of a webpage",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to screenshot"},
                    "viewport": {"type": "string", "enum": ["desktop", "tablet", "mobile"]}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "scrape",
            "description": "Extract text content from a webpage",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to scrape"},
                    "format": {"type": "string", "enum": ["markdown", "html", "json"]}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "execute_code",
            "description": "Run Python or JavaScript code in a sandbox",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string"},
                    "language": {"type": "string", "enum": ["python", "javascript"]}
                },
                "required": ["code"]
            }
        }
    }
]

Map function names to API calls, and your LLM can autonomously decide which tools to use.
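A minimal dispatcher for that mapping might look like this (a sketch assuming each tool is a plain function taking keyword arguments matching the schema; the `tool_call` dict mirrors the OpenAI tool-call response shape):

```python
import json

def dispatch_tool_call(registry, tool_call):
    """Execute one OpenAI-style tool call against a name -> function registry."""
    name = tool_call["function"]["name"]
    fn = registry.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    # Arguments arrive as a JSON string; parse and splat them as kwargs.
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)
```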

All 39 Services

The same API key works across all of these:

| Category | Services |
| --- | --- |
| Web | Screenshot, Scraper, DNS Lookup, GeoIP, URL Shortener |
| Crypto | Live Prices, Wallet (9 chains), On-chain Analytics, DeFi Data |
| Dev Tools | Code Runner, PDF Generator, File Storage, Webhook Tester |
| Infrastructure | Task Scheduler, Event Bus, Secret Manager, Email, Identity |
| AI | LLM Proxy, Text Transform, Image Processing |

Full catalog with interactive demos: api-catalog-three.vercel.app

Pricing

  • Free: 200 credits, no signup
  • Paid: $1 = 1,000 credits (USDC on Base or Monero)
  • AI-native: Supports x402 protocol for automatic per-request payments
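Since 200 free credits buy 200 free calls, the free tier implies one credit per call; combined with $1 = 1,000 credits, budgeting is simple arithmetic. A quick helper (treat `credits_per_call=1` as an assumption — per-service costs may differ):

```python
def estimate_cost_usd(calls, credits_per_call=1, usd_per_credit=1 / 1000):
    """Rough spend estimate: calls x credits per call x price per credit."""
    return calls * credits_per_call * usd_per_credit
```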

Stop building agents that can only talk. Give them tools.