How to Deploy a Gradio App on AWS — Two Approaches Compared

Tags: #aws #machinelearning #python #tutorial

By HatmanStack

Gradio makes it easy to build ML demo interfaces, but deploying them to production is another story. Hosting platforms like HuggingFace Spaces work for prototyping, but when builds start failing due to dependency drift and you need reliability, you need your own infrastructure.

In this tutorial, you'll learn two ways to deploy a Gradio application on AWS:

  • App Runner — an always-on managed service ($0.12/day)
  • Lambda with container images — a serverless, pay-per-use approach (pennies per invocation)

Both approaches use real configuration files from a working deployment. By the end, you'll understand the cost and architectural tradeoffs well enough to choose the right one for your project.

Prerequisites

  • An AWS account
  • A working Gradio application
  • Basic familiarity with Docker and AWS services

Option 1: AWS App Runner

App Runner is a managed service for web applications and containers. You point it at a repository or container registry, and it handles scaling, load balancing, and TLS. Most of the configuration lives in an apprunner.yaml file in your repo's root directory:

  version: 1.0
  runtime: python312
  run:
    pre-run:
      - echo "Installing dependencies..."
      - pip3 install --upgrade pip
      - pip3 install -r requirements.txt
    command: python3 app.py
    network:
      port: 7860
      env: APP_PORT
    env:
      - name: GRADIO_SERVER_NAME
        value: "0.0.0.0"
      - name: GRADIO_SERVER_PORT
        value: "7860"
      - name: RATE_LIMIT
        value: "20"
    secrets:
      - name: MY_SECRET
        value-from: "arn:aws:secretsmanager:us-west-2:<your-secret-arn>"

Nothing about your Gradio code changes. The custom configuration lets you specify Python 3.12 and other settings not available through the App Runner console.

App Runner Cost Breakdown

You're billed $0.0064 per vCPU-hour and $0.007 per GB-hour. You can scale down to 0.25 vCPU and 0.5 GB of memory, which works out to roughly $0.12 per day for an always-on service that auto-scales under load.
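The $0.12/day figure follows directly from those two rates. A quick sanity check (arithmetic only, not an official calculator):

```python
# Daily App Runner cost at the minimum instance size, using the
# per-hour rates quoted above.
VCPU_HOUR = 0.0064   # $ per vCPU-hour
GB_HOUR = 0.007      # $ per GB-hour

vcpus, memory_gb, hours = 0.25, 0.5, 24
daily = vcpus * VCPU_HOUR * hours + memory_gb * GB_HOUR * hours
print(f"${daily:.2f}/day")  # $0.12/day
```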

One thing to remember: grant the Instance Security Role permissions to communicate with other AWS services. If your Gradio app calls Bedrock, Secrets Manager, or S3, you need to add those permissions to the container's security role — not just the deployment role.
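For example, if the app calls Bedrock, Secrets Manager, and S3, the instance role needs a policy along these lines. This is an illustrative sketch only; scope the actions and `Resource` ARNs down to exactly what your app uses:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "secretsmanager:GetSecretValue",
        "s3:GetObject"
      ],
      "Resource": "*"
    }
  ]
}
```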

Option 2: AWS Lambda with Container Images

While App Runner's always-on cost is reasonable, Lambda is a better fit for applications with bursty or infrequent traffic. The catch: Lambda zip deployments are capped at 250 MB (unzipped, including layers), and trimming Gradio's dependency tree to fit under that limit isn't practical.

Instead, you can package the app as a container image, which raises the limit to 10 GB, and use the AWS Lambda Web Adapter (https://github.com/awslabs/aws-lambda-web-adapter), which lets Lambda run any HTTP application, including Gradio.

You need two files in your repo: a Dockerfile and a buildspec.yaml for CodeBuild.

The Dockerfile

  FROM public.ecr.aws/docker/library/python:3.12-slim

  WORKDIR /app

  COPY requirements.txt .
  RUN pip install --no-cache-dir --upgrade pip
  RUN pip install --no-cache-dir -r requirements.txt

  COPY . .

  # The Lambda Web Adapter translates Lambda invocations into HTTP requests
  COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.0 /lambda-adapter /opt/extensions/lambda-adapter

  CMD ["python3", "app.py"]

The key line is the COPY --from that pulls in the Lambda Web Adapter. This adapter sits between Lambda's invocation model and your HTTP application, translating Lambda events into standard HTTP requests that Gradio understands.

The CodeBuild Spec

  version: 0.2

  env:
    variables:
      AWS_REGION: "us-west-2"
      AWS_ACCOUNT_ID: "<your-account-id>"
      IMAGE_REPO_NAME: "production/gradio-demo"
      IMAGE_TAG: "latest"

  phases:
    pre_build:
      commands:
        - echo Logging in to Amazon ECR...
        - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com

    build:
      commands:
        - IMAGE_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
        - docker build -t $IMAGE_URI .

    post_build:
      commands:
        - docker push $IMAGE_URI

Before running the build, create a repository in ECR to store the container image. In CodeBuild, create a new project using an S3 bucket or GitHub as the source, and select an EC2 compute environment (not Lambda compute — Lambda build containers don't include Docker).
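If you prefer the CLI to the console for the ECR step, creating the repository looks roughly like this. The repository name and region match the placeholders in the buildspec above; substitute your own:

```shell
# Create the ECR repository that the buildspec pushes to.
aws ecr create-repository \
  --repository-name production/gradio-demo \
  --region us-west-2
```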

Configuring Gradio for Lambda

One critical change: your Gradio app must listen on 0.0.0.0, port 8080, so the Lambda Web Adapter can route traffic to it. Update your launch call:

  import os

  server_port = int(os.environ.get("AWS_LAMBDA_HTTP_PORT", 8080))
  demo.launch(server_name="0.0.0.0", server_port=server_port)

Deploying the Lambda Function

In Lambda, create a new function using the container image approach. Select your ECR image and enable a Function URL — that's all you need to get the Gradio app accessible over HTTPS.
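Those console steps map to two CLI calls, sketched here with placeholder names. The function name, execution role, and `<your-account-id>` are assumptions; reuse the image URI from your CodeBuild output:

```shell
# Create the function from the ECR image, then expose it over HTTPS
# with a public Function URL. Role name and account ID are placeholders.
aws lambda create-function \
  --function-name gradio-demo \
  --package-type Image \
  --code ImageUri=<your-account-id>.dkr.ecr.us-west-2.amazonaws.com/production/gradio-demo:latest \
  --role arn:aws:iam::<your-account-id>:role/gradio-demo-execution-role \
  --memory-size 512

aws lambda create-function-url-config \
  --function-name gradio-demo \
  --auth-type NONE
```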

Lambda Cost Optimization

AWS recommends running container images with 2048 or 4096 MB of memory, but Gradio typically consumes 125–300 MB during operation. Setting the Lambda function to 512 MB works well and provides a buffer.

Here's how the costs compare:


  ┌────────────────────────────────────────────┬────────────────────┐
  │               Configuration                │        Cost        │
  ├────────────────────────────────────────────┼────────────────────┤
  │ 4096 MB, always-on (EventBridge keep-warm) │ ~$5.76/day         │
  ├────────────────────────────────────────────┼────────────────────┤
  │ 512 MB, always-on (EventBridge keep-warm)  │ ~$0.71/day         │
  ├────────────────────────────────────────────┼────────────────────┤
  │ 512 MB, on-demand (cold starts)            │ ~$0.002/invocation │
  └────────────────────────────────────────────┴────────────────────┘
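The always-on rows follow from Lambda's on-demand compute price, roughly $0.0000166667 per GB-second for x86 in most regions; rounding explains the small gap from the table's ~$0.71:

```python
# Rough reproduction of the always-on rows above, assuming the
# standard x86 on-demand price of ~$0.0000166667 per GB-second.
GB_SECOND = 0.0000166667
SECONDS_PER_DAY = 24 * 60 * 60

def always_on_daily(memory_mb: int) -> float:
    return (memory_mb / 1024) * SECONDS_PER_DAY * GB_SECOND

print(f"${always_on_daily(4096):.2f}/day")  # $5.76/day
print(f"${always_on_daily(512):.2f}/day")   # $0.72/day
```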

The tradeoff with on-demand is cold starts: a few extra seconds on the first request. Once the Gradio frontend loads, subsequent calls hit the already-warm container and are fast.

Handling Cold Starts

The Lambda Web Adapter checks your app's readiness by polling / (the root path). Gradio and FastAPI provide a dedicated health endpoint at /healthz that responds faster during startup. Set this in your Lambda environment variables:

  AWS_LWA_READINESS_CHECK_PATH=/healthz

This reduces the chance of the adapter timing out before your app finishes initializing.

Which Should You Choose?


  ┌──────────────────┬────────────────┬───────────────────────────────────┐
  │      Factor      │   App Runner   │              Lambda               │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Cost (always-on) │ ~$0.12/day     │ ~$0.71/day (512 MB)               │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Cost (on-demand) │ Not applicable │ Pennies per invocation            │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Cold starts      │ None           │ A few seconds                     │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Scaling          │ Automatic      │ Automatic                         │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Setup complexity │ Lower          │ Higher (Docker + CodeBuild + ECR) │
  └──────────────────┴────────────────┴───────────────────────────────────┘


Choose App Runner if you want simplicity, consistent response times, and your app gets steady traffic.

Choose Lambda if your traffic is bursty, you want to minimize costs during idle periods, or you're already invested in the serverless ecosystem.

Either service works well for Gradio deployments. The right choice depends on your traffic pattern and how much you care about cold starts versus idle costs.