GenAI Isn't Just for Product Teams

Alexander Pazik

Most GenAI use cases today focus on product teams. Build a customer chatbot. Generate marketing copy. Develop a new product feature.

But DevOps, Site Reliability Engineering (SRE), and Cloud Center of Excellence (CCoE) teams have use cases too. Investigate an incident. Create a runbook. Generate cost optimization recommendations.

These are repetitive tasks that take time away from reliability improvements.

It's not that operations teams don't see the potential of GenAI. They're waiting for something useful: something that fits into their actual workflows, with code they can deploy and evaluate.

The gap is relevance, not readiness. What's missing is:

  • Practical use cases matched to real operational tasks
  • Deployable code samples that are production-ready
  • Flexible patterns that can be customized

The GenAI for Ops Demo Library was created to address this.

Introducing the GenAI for Ops Demo Library

The GenAI for Ops Demo Library is a collection of deployable code samples that demonstrate how generative AI can solve real operational challenges across security, cost optimization, resilience, and automation use cases. You can deploy each demo as-is or customize it for your environment.

There are currently 12 available demos:

| Use Case | Demos |
| --- | --- |
| Security | AI-Powered Security Posture with Prowler + DevOps Agent, AI Incident Response Playbook Builder |
| Cost Optimization | AI-Powered Graviton Migration Assessment, AWS GenAI Cost Optimization Kiro Power |
| Operations Automation | AI-Powered Technical Documentation Generation, AI-Powered Legacy System Automation, AI Password Reset Chatbot, AWS Services Lifecycle Tracker, AI Lambda Runtime Migration Assistant |
| Observability | Intelligent EKS Incident Investigation with Amazon DevOps Agent, Intelligent AWS Site-to-Site VPN Tunnel Investigation with Amazon DevOps Agent |
| Resilience | Natural Language Chaos Engineering with AWS FIS |

Technical Stack

Each demo is built on AWS services and AI integration patterns familiar to operations teams:

  • Amazon CloudWatch for metrics, logs, and alarms
  • AWS Lambda for serverless compute
  • Amazon Simple Notification Service (SNS) for event routing
  • AWS Cloud Development Kit (CDK) for infrastructure as code
  • Amazon Bedrock and Amazon Nova for foundation model access
  • Amazon Bedrock AgentCore for multi-step AI orchestration
  • Model Context Protocol (MCP) servers for standardized tool integration

Demo Structure

Each demo also includes a deployment guide, a technical design document, one or more deployment scripts, and cost estimates with optimization tips.

To show how these demos work in practice, here's a walkthrough of one.

Example: Site-to-Site VPN Tunnel Investigation with AWS DevOps Agent

AWS Site-to-Site VPN tunnels fail for many reasons: pre-shared key (PSK) mismatches, Internet Key Exchange (IKE) proposal incompatibilities, dead peer detection (DPD) timeouts, Border Gateway Protocol (BGP) session drops, route withdrawals, and throughput degradation. When a tunnel goes down at 2:00 AM, your on-call SRE has to read through CloudWatch metrics, VPN tunnel logs, and the IPsec configuration to figure out what happened. That takes time and drives up your Mean Time to Resolution (MTTR). This demo shows how AWS DevOps Agent autonomously triages these and other incidents, producing a root cause analysis and recommended actions for resolution.

Overview

The demo deploys a self-contained VPN environment and creates a DevOps Agent Space to investigate failures automatically.

When a tunnel fails or performance drops, DevOps Agent:

  1. Reads VPN tunnel logs from CloudWatch and correlates metrics across both tunnels
  2. Queries a self-contained MCP server for business context (service dependencies, cost impact, compliance status)
  3. Produces a root cause analysis (RCA) and detailed mitigation plan

Architecture

The demo has three layers:

Network layer

  • An Amazon Virtual Private Cloud (VPC) (10.0.0.0/16) and a simulated on-premises VPC (172.16.0.0/16) linked by a Site-to-Site VPN with two IPsec tunnels
  • An Amazon EC2 instance acting as the customer gateway, running Libreswan for IPsec and GoBGP for BGP on Amazon Linux 2023

Monitoring layer

  • CloudWatch alarms that monitor tunnel state, performance, and other failure conditions
  • An SNS topic to trigger a Lambda function that sends a webhook to DevOps Agent
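
As a rough illustration of the first bullet, a tunnel-state alarm could be described with the parameters below. This is a minimal sketch, not the demo's actual definition (the demo may define its alarms differently, for example through CDK). It assumes the `AWS/VPN` namespace's `TunnelState` metric filtered by the `TunnelIpAddress` dimension; the alarm name, period, evaluation count, and the function name `tunnel_down_alarm_params` are hypothetical choices made for illustration.

```python
def tunnel_down_alarm_params(tunnel_ip, topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"vpn-tunnel-down-{tunnel_ip}",  # hypothetical naming scheme
        "Namespace": "AWS/VPN",
        "MetricName": "TunnelState",  # reports 1 when the tunnel is up, 0 otherwise
        "Dimensions": [{"Name": "TunnelIpAddress", "Value": tunnel_ip}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds per datapoint (illustrative)
        "EvaluationPeriods": 3,   # consecutive breaching periods (illustrative)
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # missing data is treated as a failure
        "AlarmActions": [topic_arn],      # SNS topic notified on ALARM
    }


params = tunnel_down_alarm_params(
    "203.0.113.10",  # placeholder tunnel outside IP address
    "arn:aws:sns:us-east-1:123456789012:vpn-alarms",  # placeholder topic ARN
)
print(params["AlarmName"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.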

Intelligence layer

  • A DevOps Agent Space that gives DevOps Agent access to the resources it needs to investigate VPN operational issues

How it Works

Tunnel Fails / Performance Degrades
             ↓
  CloudWatch Alarm Changes State
             ↓
    SNS Notification Received
             ↓
     Lambda Function Invoked
             ↓
DevOps Agent Investigation Starts
             ↓
     Investigation Completes
     → Root Cause Identified
     → Remediation Plan Generated
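
The SNS-to-Lambda-to-webhook hop in this flow can be sketched as follows. This is a hedged illustration rather than the demo's actual function: the `WEBHOOK_URL` environment variable, the payload fields (`title`, `reason`, `timestamp`), and the helper name `build_payloads` are assumptions, and any authentication or request signing the real webhook requires is omitted. The only structure relied on is the standard SNS event envelope, in which the CloudWatch alarm notification arrives as a JSON string under `Records[].Sns.Message`.

```python
import json
import os
import urllib.request


def build_payloads(event):
    """Turn an SNS-delivered CloudWatch alarm event into webhook payloads."""
    payloads = []
    for record in event.get("Records", []):
        # SNS wraps the CloudWatch alarm notification as a JSON string.
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") != "ALARM":
            continue  # skip OK and INSUFFICIENT_DATA transitions
        payloads.append({
            # Hypothetical payload shape; the real webhook schema may differ.
            "title": f"VPN alarm: {alarm.get('AlarmName')}",
            "reason": alarm.get("NewStateReason"),
            "timestamp": alarm.get("StateChangeTime"),
        })
    return payloads


def handler(event, context):
    """Lambda entry point: forward each ALARM notification to the webhook."""
    url = os.environ["WEBHOOK_URL"]  # hypothetical variable name
    payloads = build_payloads(event)
    for payload in payloads:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()  # drain the response; a non-2xx status raises HTTPError
    return {"forwarded": len(payloads)}
```

Keeping the payload construction in a pure function separate from the network call makes the parsing logic easy to unit test without deploying anything.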

Common Failure Scenarios

The demo includes 10 failure scenarios that you can inject and then watch DevOps Agent investigate:

IKE

  • PSK mismatch (key rotation gone wrong)
  • DPD timeout (firewall blocking IKE traffic)
  • Proposal mismatch (incompatible DH group)
  • Traffic selector mismatch (subnet change breaking BGP)
  • Tunnel shutdown (customer gateway-initiated teardown)

BGP

  • BGP daemon down
  • ASN mismatch after maintenance
  • Hold timer expired (blocked keepalives)

Other

  • BGP route withdrawal (prefix no longer advertised)
  • Throughput degradation (performance drops while tunnels stay up)

The Results

Faster incident resolution. Autonomous investigation of VPN failures and performance degradation can reduce MTTR from hours to minutes.

Fewer repeat incidents. Targeted recommendations address incident root causes and strengthen VPN tunnel resilience.

Greater operational efficiency. Teams spend less time on repetitive investigations and more time on high-value work.

Cost Estimate

Each demo is built with the AWS Well-Architected Framework's Cost Optimization pillar in mind, so running costs stay minimal.

| Resource | Hourly Cost |
| --- | --- |
| VPN connection (1.25 Gbps) | $0.05 |
| 2× t3.micro EC2 instances | $0.03 |
| 4× Public IPv4 addresses | $0.02 |
| 4× CloudWatch alarms | < $0.01 |
| Lambda, SNS, CloudWatch | < $0.01 |
| Total | ~$0.12/hour |

This demo is designed to be deployed, tested, and torn down. If left running continuously, it would cost an estimated ~$88 per month ($0.12 × 730 hours).
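
The arithmetic behind these figures can be checked with a short script. This is only a sketch of the estimate: it assumes each "< $0.01" line item is rounded up to one cent, which is how the line items add up to the stated ~$0.12 per hour. The variable names are illustrative.

```python
HOURS_PER_MONTH = 730

# Hourly line items from the cost table, in cents.
# Entries listed as "< $0.01" are rounded up to 1 cent (an assumption).
hourly_cents = {
    "VPN connection": 5,
    "2x t3.micro EC2 instances": 3,
    "4x public IPv4 addresses": 2,
    "4x CloudWatch alarms": 1,
    "Lambda, SNS, CloudWatch": 1,
}

total_cents = sum(hourly_cents.values())
hourly_total = total_cents / 100
monthly_total = total_cents * HOURS_PER_MONTH / 100

print(f"${hourly_total:.2f}/hour, about ${monthly_total:.0f}/month")
# prints "$0.12/hour, about $88/month"
```

Working in whole cents avoids floating-point drift when summing small dollar amounts.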

Get Started

  1. Explore: Browse the demo library and choose a demo that aligns with your use case
  2. Try: Deploy the demo in your AWS account
  3. Contribute: Submit a pull request with your demo
  4. Feedback: Take the quick survey and share your feedback