How to Automate Code Documentation with the Claude API and Python

Jon Ibañez del campo

You just finished a statistics assignment.

The code works. The results are correct. Then you read the submission requirements and see it: "all functions must be properly documented."

You have twenty functions. No docstrings. Twenty minutes left.

That's the problem this tutorial solves. A Python script that takes any function, sends it to Claude, and gets back a complete docstring — parameters, return values, exceptions, and a usage example. All automatically.

This is part three of a series on building real tools with the Claude API. Parts one and two cover the basics and a code review chatbot. This one is immediately useful.

What We're Building

A script that does one thing well: takes a Python function as input and returns a complete, properly formatted docstring.

Input:  def calculate_pca(data, n_components):
Output: """
        Performs Principal Component Analysis on the input dataset.

        Args:
            data: Input matrix of shape (n_samples, n_features).
            n_components: Number of principal components to return.

        Returns:
            Transformed data of shape (n_samples, n_components).

        Raises:
            ValueError: If n_components exceeds the number of features.

        Example:
            result = calculate_pca(X, n_components=2)
        """

Setup

Same environment from the previous tutorials. If you're starting fresh:

mkdir doc-generator
cd doc-generator
python -m venv venv

Activate:

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

Install:

pip install anthropic python-dotenv

Create your .env:

ANTHROPIC_API_KEY=your-key-here

The Core Function

The idea is simple: send Claude your function and ask for a docstring back. The system prompt does most of the work.

from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic()

def generate_docstring(function_code: str) -> str:
    """Generate a docstring for a given Python function."""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=(
            "You are a Python documentation assistant. "
            "When given a Python function, return only a Google-style docstring for it. "
            "Include: a one-line summary, Args, Returns, Raises (if applicable), and Example. "
            "Return only the docstring text inside triple quotes. No explanation, no extra text."
        ),
        messages=[
            {"role": "user", "content": function_code}
        ]
    )

    return response.content[0].text

The key instruction is "Return only the docstring text inside triple quotes. No explanation, no extra text." Without that, Claude adds commentary around the docstring and you'd need to parse it out.
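Even with a strict prompt, it can be worth cleaning the response defensively, since a model may occasionally wrap its answer in a Markdown code fence or add a stray sentence. The sketch below shows one way to do that. `clean_docstring` is a hypothetical helper introduced here for illustration; it is not part of the Anthropic SDK or of the script in this tutorial.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def clean_docstring(raw: str) -> str:
    """Return only the triple-quoted docstring found in a model response.

    Hypothetical helper: if no complete triple-quoted block is found,
    the remaining text is wrapped in triple quotes instead.
    """
    # Take the first complete """...""" block and ignore anything around it
    match = re.search(r'"""(.*?)"""', raw, re.DOTALL)
    if match:
        return '"""' + match.group(1) + '"""'

    # No complete block: drop Markdown fence lines and wrap what is left
    lines = [ln for ln in raw.strip().splitlines() if not ln.strip().startswith(FENCE)]
    return '"""\n' + "\n".join(lines).strip() + '\n"""'


raw = 'Here is the docstring:\n"""\nAdd two numbers.\n"""\nHope that helps!'
print(clean_docstring(raw))
```

You could pass `response.content[0].text` through a helper like this before returning it from `generate_docstring`.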

Testing It

sample_function = """
def calculate_mean(values):
    total = sum(values)
    return total / len(values)
"""

docstring = generate_docstring(sample_function)
print(docstring)

Output:

"""
Calculate the arithmetic mean of a list of values.

Args:
    values: A list of numeric values.

Returns:
    The arithmetic mean as a float.

Raises:
    ZeroDivisionError: If the input list is empty.
    TypeError: If the list contains non-numeric values.

Example:
    mean = calculate_mean([1, 2, 3, 4, 5])
    # Returns: 3.0
"""

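Claude infers the failure modes (an empty list, non-numeric values) even though the function contains no explicit error handling. That's useful: the docstring tells callers which exceptions to expect, not just what the function returns on valid input. Model output varies between runs, so your docstring may be worded differently.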
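Inferred Raises entries are worth spot-checking before you submit, because they are the model's reading of the code rather than a guarantee. A minimal sketch of such a check follows. `raised_exception` is a hypothetical helper written for this example; it calls the function and reports which exception type, if any, comes back.

```python
def calculate_mean(values):
    total = sum(values)
    return total / len(values)


def raised_exception(func, *args):
    """Return the type of exception func raises, or None if it succeeds."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc)
    return None


# Empty list: sum([]) is 0 and len([]) is 0, so the division fails
print(raised_exception(calculate_mean, []))

# Strings cannot be added to the integer start value that sum() uses
print(raised_exception(calculate_mean, ["a", "b"]))

# Valid input raises nothing
print(raised_exception(calculate_mean, [1, 2, 3, 4, 5]))
```

If a documented exception never shows up in a check like this, edit the docstring rather than trusting it.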

Inserting the Docstring Into the Function

Getting the docstring is half the job. Inserting it automatically into the original function is the other half.

def insert_docstring(function_code: str, docstring: str) -> str:
    """Insert a generated docstring into a function definition."""
    lines = function_code.split("\n")

    # Find the line with the function definition
    # (assumes the whole signature fits on that one line)
    for i, line in enumerate(lines):
        if line.strip().startswith("def "):
            # Indent the docstring one level deeper than the def line,
            # so indented definitions are handled as well
            def_indent = line[: len(line) - len(line.lstrip())]
            indent = def_indent + "    "
            docstring_lines = docstring.strip().split("\n")
            # Leave blank lines empty to avoid trailing whitespace
            indented = [indent + d if d.strip() else "" for d in docstring_lines]
            lines = lines[:i+1] + indented + lines[i+1:]
            break

    return "\n".join(lines)

Test it:

sample_function = """
def calculate_mean(values):
    total = sum(values)
    return total / len(values)
"""

docstring = generate_docstring(sample_function)
documented = insert_docstring(sample_function, docstring)
print(documented)

Output:

def calculate_mean(values):
    """
    Calculate the arithmetic mean of a list of values.

    Args:
        values: A list of numeric values.

    Returns:
        The arithmetic mean as a float.

    Raises:
        ZeroDivisionError: If the input list is empty.
        TypeError: If the list contains non-numeric values.

    Example:
        mean = calculate_mean([1, 2, 3, 4, 5])
        # Returns: 3.0
    """
    total = sum(values)
    return total / len(values)
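Because the docstring is spliced in as plain text, a quick syntax check on the result can catch a bad indent or an unbalanced triple quote before the file is saved. The sketch below uses the standard library's `ast.parse` for that. `is_valid_python` is a hypothetical helper name, not something the script above defines.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


good = 'def f(x):\n    """Double x."""\n    return 2 * x\n'
bad = 'def f(x):\n"""Double x."""\n    return 2 * x\n'  # docstring not indented

print(is_valid_python(good))
print(is_valid_python(bad))
```

One option is to call this on the output of `insert_docstring` and keep the original function whenever the check fails.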

Processing an Entire File

One function at a time works for quick fixes. For a full assignment with twenty functions, you want to process the whole file at once.

import re

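def extract_functions(file_content: str) -> list[str]:
    """Extract all top-level function definitions from a Python file."""
    # (?:\s*->.+?)? allows a return annotation such as "-> str" before the colon.
    # ^ with re.MULTILINE limits matches to defs that start at column 0.
    # Signatures that span several lines are not matched.
    pattern = r"(^def \w+\(.*?\)(?:\s*->.+?)?:(?:\n(?:    .+|\s*))*)"
    return re.findall(pattern, file_content, re.MULTILINE)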

def document_file(input_path: str, output_path: str) -> None:
    """Read a Python file, document all functions, and save the result."""
    with open(input_path, "r") as f:
        content = f.read()

    functions = extract_functions(content)
    print(f"Found {len(functions)} functions. Generating docstrings...\n")

    documented_content = content

    for i, function in enumerate(functions):
        print(f"Processing function {i+1}/{len(functions)}...")

        # Skip functions that already have docstrings
        if '"""' in function or "'''" in function:
            print("  Already documented, skipping.")
            continue

        docstring = generate_docstring(function)
        documented_function = insert_docstring(function, docstring)
        documented_content = documented_content.replace(function, documented_function)

    with open(output_path, "w") as f:
        f.write(documented_content)

    print(f"\nDone. Documented file saved to: {output_path}")

Usage:

document_file("statistics_assignment.py", "statistics_assignment_documented.py")

It reads the original file, skips functions that already have docstrings, generates the missing ones, and saves a new file. The original stays untouched.
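A regular expression is a compact way to find functions, but it is easy to trip up with unusual formatting. As an alternative, here is a minimal sketch that uses the standard library's `ast` module to locate top-level functions and `ast.get_source_segment` (Python 3.8+) to recover their exact source text. `extract_functions_ast` is a hypothetical name; it is not part of the script above.

```python
import ast


def extract_functions_ast(file_content: str) -> list[str]:
    """Return the source text of every top-level function in a Python file."""
    tree = ast.parse(file_content)
    sources = []
    for node in tree.body:  # tree.body holds top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(file_content, node)
            if segment is not None:
                sources.append(segment)
    return sources


example = (
    "import os\n\n"
    "def a(x) -> int:\n    return x\n\n"
    "class C:\n    def m(self):\n        pass\n\n"
    "def b(\n    y,\n):\n    return y\n"
)
for source in extract_functions_ast(example):
    print(source)
    print("---")
```

This version copes with return annotations and multi-line signatures and skips methods inside classes. Note that `insert_docstring` as written still expects the signature to fit on one line, so multi-line signatures would need extra handling there.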

The Full Script

import re
from dotenv import load_dotenv
from anthropic import Anthropic, APIError, RateLimitError, APIConnectionError

load_dotenv()
client = Anthropic()

def generate_docstring(function_code: str) -> str:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=(
                "You are a Python documentation assistant. "
                "When given a Python function, return only a Google-style docstring for it. "
                "Include: a one-line summary, Args, Returns, Raises (if applicable), and Example. "
                "Return only the docstring text inside triple quotes. No explanation, no extra text."
            ),
            messages=[{"role": "user", "content": function_code}]
        )
        return response.content[0].text

    except RateLimitError:
        return '"""Rate limit reached. Document this function manually."""'
    except APIConnectionError:
        return '"""Connection failed. Document this function manually."""'
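    except APIError as e:
        # Not every APIError carries an HTTP status code, so read it defensively
        status = getattr(e, "status_code", "unknown")
        return f'"""API error {status}."""'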

def insert_docstring(function_code: str, docstring: str) -> str:
    lines = function_code.split("\n")
    for i, line in enumerate(lines):
        if line.strip().startswith("def "):
            # Indent the docstring one level deeper than the def line
            def_indent = line[: len(line) - len(line.lstrip())]
            indent = def_indent + "    "
            docstring_lines = docstring.strip().split("\n")
            indented = [indent + d if d.strip() else "" for d in docstring_lines]
            lines = lines[:i+1] + indented + lines[i+1:]
            break
    return "\n".join(lines)

def extract_functions(file_content: str) -> list[str]:
    # Matches top-level, single-line signatures, with or without a "-> type" annotation
    pattern = r"(^def \w+\(.*?\)(?:\s*->.+?)?:(?:\n(?:    .+|\s*))*)"
    return re.findall(pattern, file_content, re.MULTILINE)

def document_file(input_path: str, output_path: str) -> None:
    with open(input_path, "r") as f:
        content = f.read()

    functions = extract_functions(content)
    print(f"Found {len(functions)} functions. Generating docstrings...\n")

    documented_content = content

    for i, function in enumerate(functions):
        print(f"Processing function {i+1}/{len(functions)}...")

        if '"""' in function or "'''" in function:
            print("  Already documented, skipping.")
            continue

        docstring = generate_docstring(function)
        documented_function = insert_docstring(function, docstring)
        documented_content = documented_content.replace(function, documented_function)

    with open(output_path, "w") as f:
        f.write(documented_content)

    print(f"\nDone. Saved to: {output_path}")

if __name__ == "__main__":
    document_file("your_file.py", "your_file_documented.py")

Common Mistakes

The system prompt needs to be strict
If you don't tell Claude to return only the docstring, it adds sentences like "Here's the docstring for your function:" — and that ends up in your code. The instruction "No explanation, no extra text" is not optional.

Functions with existing docstrings
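The script checks for """ or ''' in the function before calling the API. Remove that check and it inserts a second docstring above documentation you already wrote. The check is a plain substring test, so a function that contains any triple-quoted string, even one that is not a docstring, is skipped too.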
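If the substring test turns out to be too coarse for your files, a stricter check can ask the parser whether the function actually has a docstring. The sketch below relies on the standard library's `ast.get_docstring`. `has_docstring` is a hypothetical helper name chosen for this example.

```python
import ast
import textwrap


def has_docstring(function_code: str) -> bool:
    """Return True if the first definition in function_code has a docstring."""
    # dedent lets the check work on code copied from an indented block
    tree = ast.parse(textwrap.dedent(function_code))
    if not tree.body:
        return False
    node = tree.body[0]
    if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        return False
    return ast.get_docstring(node) is not None


# A triple-quoted string in the body is not a docstring
query_function = 'def q():\n    sql = """SELECT 1"""\n    return sql\n'
documented = 'def f():\n    """Do nothing."""\n'

print(has_docstring(query_function))
print(has_docstring(documented))
```

With a check like this, `q` would still be sent to Claude, while the substring test would have skipped it.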

Large files hit rate limits
If you have a file with fifty functions, the script can send requests fast enough to hit your account's rate limit. Add a small delay between calls:

import time
time.sleep(0.5)  # Half a second between calls
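A fixed delay slows every call, including the ones that would have succeeded anyway. Another common pattern is to retry only after a failure and wait longer each time. The sketch below is a generic version of that idea. `call_with_backoff` and its parameters are hypothetical names for this example, and the `sleep` argument exists only so the behaviour can be tried without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: tuple,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponentially growing delays on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")


# Demonstration with a stand-in function that fails twice, then succeeds
attempts = {"count": 0}
waits = []


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("busy")
    return "ok"


print(call_with_backoff(flaky, (TimeoutError,), sleep=waits.append))
print(waits)
```

In this script you would wrap the `client.messages.create(...)` call in a lambda and pass `(RateLimitError,)` as `retry_on`. The Anthropic SDK client also has its own retry behaviour, so check its documentation before layering another one on top.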

What's Next

The script handles the common case well. A few natural extensions if you want to take it further:

Support for NumPy-style docstrings instead of Google-style — change the system prompt and the format changes with it.

A command-line interface so you can run python doc_generator.py myfile.py directly from the terminal without editing the script.

Batch processing for entire directories — walk through a project folder and document every .py file at once.
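The last two extensions fit together naturally. As a starting point, here is a minimal sketch built on the standard library's `argparse` and `pathlib`: it accepts either a single file or a directory and works out where each documented copy would be written. The function names and the `--suffix` option are assumptions made for this example, not part of the script above.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate docstrings for Python files.")
    parser.add_argument("path", type=Path, help="a .py file, or a directory to search recursively")
    parser.add_argument("--suffix", default="_documented", help="text appended to each output file name")
    return parser


def plan_outputs(path: Path, suffix: str = "_documented") -> list[tuple[Path, Path]]:
    """Pair every input file with the output path it would be written to."""
    sources = sorted(path.rglob("*.py")) if path.is_dir() else [path]
    return [
        (src, src.with_name(src.stem + suffix + src.suffix))
        for src in sources
        if not src.stem.endswith(suffix)  # skip files produced by an earlier run
    ]


def main(argv: Optional[list[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    for src, dst in plan_outputs(args.path, args.suffix):
        # Swap this print for document_file(str(src), str(dst)) in the real script
        print(f"{src} -> {dst}")


main(["myfile.py"])
```

Calling `main()` with no arguments reads from the command line, so placing that call under `if __name__ == "__main__":` in doc_generator.py would give you `python doc_generator.py myfile.py`.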

Full Claude API documentation at platform.claude.com/docs.

# One function. One API call. One docstring.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="Return only a Google-style docstring. No extra text.",
    messages=[{"role": "user", "content": your_function_code}]
)

print(response.content[0].text)