How I Audited My Infra After the LiteLLM Supply Chain Attack (And What I'm Doing Differently Now)

Tags: security, python, ai, devops

By Jay


I woke up to a Slack thread on March 24, 2026, that made my stomach drop. LiteLLM, the Python proxy I'd been running to route LLM calls across providers, had been backdoored with credential-stealing malware. Versions 1.82.7 and 1.82.8, published by a threat actor called TeamPCP, contained a three-stage payload that harvested SSH keys, cloud credentials, Kubernetes secrets, and cryptocurrency wallets. PyPI quarantined the entire package.

What surprised me was the targeting. LiteLLM is literally an API key management gateway. It holds credentials for every LLM provider your org uses. If you wanted to compromise one package to get access to everything, this was the perfect pick.

This wasn't a one-off either. It was the third hit in a five-day campaign. Aqua Security's Trivy scanner was compromised on March 19 (GHSA-69fq-xp46-6x23). Checkmarx's KICS GitHub Action followed on March 23 (kics-github-action#152, Checkmarx Update). LiteLLM was the final target on March 24 (litellm#24512, LiteLLM Update). Packages from this attack chain were present in 36% of cloud environments, often pulled in as transitive dependencies through agent frameworks nobody audited.

How the Attack Chain Worked

As confirmed in LiteLLM's official security update, the project's CI/CD pipeline ran Trivy without pinning to a specific version. When the compromised Trivy action executed inside LiteLLM's GitHub Actions runner, it exfiltrated the PYPI_PUBLISH token. TeamPCP used that stolen token to push malicious releases directly to PyPI.

Version 1.82.7 embedded the payload in proxy/proxy_server.py, firing on import. Version 1.82.8 was worse: it included a .pth file called litellm_init.pth that executed on every Python process startup, regardless of whether you ever imported LiteLLM. Python's site module processes all .pth files in site-packages during interpreter initialization, as documented in the GitHub issue.
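To see the mechanism concretely, here's a harmless demo of the same trick (the file name and environment variable are mine, not from the payload). `site.addsitedir()` processes a directory's `.pth` files the same way interpreter startup processes `site-packages`: any line that begins with `import` is executed, not just added to the path.

```python
import os
import site
import tempfile

# Write a benign .pth file whose "import" line carries arbitrary code,
# mimicking the litellm_init.pth trick (demo names are hypothetical).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_init.pth"), "w") as f:
    f.write('import os; os.environ["PTH_DEMO_RAN"] = "1"\n')

# addsitedir() processes .pth files exactly like startup does for
# site-packages: lines starting with "import" are exec'd.
site.addsitedir(demo_dir)
print(os.environ.get("PTH_DEMO_RAN"))  # prints: 1
```

Drop a file like this into site-packages and the code runs in every Python process on the machine, which is exactly what 1.82.8 did.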

The payload used double base64 encoding:

```python
import os, subprocess, sys

subprocess.Popen([
    sys.executable, "-c",
    "import base64; exec(base64.b64decode('...'))"
])
```

Once running, it executed three stages. Stage 1 harvested credentials: SSH keys, AWS/GCP/Azure tokens, environment variables, .env files, Kubernetes configs, Docker configs, database credentials, shell history, browser cookies, and cryptocurrency wallets. Stage 2 deployed privileged Alpine pods into the kube-system namespace on every reachable Kubernetes node, grabbing cluster secrets and service account tokens. Stage 3 installed sysmon.py as a systemd service that polled checkmarx[.]zone/raw for additional payloads, giving the attacker persistent access even after discovery.

All stolen data was encrypted and POSTed to models.litellm[.]cloud, a lookalike domain controlled by TeamPCP.

The Blast Radius Was Bigger Than I Expected

The .pth execution model is what made this particularly nasty. On any machine where LiteLLM 1.82.8 was installed, the malware fired every time Python started. Not when you imported the package. Not when you used the proxy. Every single Python process.

That means a data scientist running Jupyter, a DevOps engineer running Ansible, a backend dev spinning up a Flask server: all compromised if the package sat anywhere in their Python environment. The malware just ran silently alongside whatever they were actually doing.

Here's the part that really got me: you didn't need to install it yourself. If any package in your dependency tree pulled LiteLLM in, the payload still executed. As reported in GitHub issue #24512, the researcher who found the backdoor only noticed it because their Cursor IDE had pulled LiteLLM in through an MCP plugin, without any manual installation.

I checked my own environment and found LiteLLM listed in the Required-by field for a framework I'd installed months ago. I had no idea it was there.

How I Checked If I Was Affected

Here's what I ran across my local machine, CI/CD runners, Docker images, and staging:

```bash
pip show litellm | grep Version
pip cache list litellm
find / -name "litellm_init.pth" 2>/dev/null
```

Then I scanned egress logs. Any traffic to models.litellm[.]cloud or checkmarx[.]zone means confirmed exfiltration:

```
# CloudWatch Logs Insights
fields @timestamp, @message
| filter @message like /models\.litellm\.cloud|checkmarx\.zone/
```

```bash
# Nginx
grep -E "models\.litellm\.cloud|checkmarx\.zone" /var/log/nginx/access.log
```
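If you have many log files to sweep, the same IOC check is easy to script. A minimal sketch (the function name is mine; the domains are the ones reported in the advisories):

```python
import re

# Exfiltration domains reported in the LiteLLM and Trivy advisories
IOC_PATTERN = re.compile(r"models\.litellm\.cloud|checkmarx\.zone")

def find_ioc_hits(log_path: str) -> list[str]:
    """Return log lines that mention a known exfiltration domain."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if IOC_PATTERN.search(line)]
```

I ran this over every access log and proxy log I could reach; any non-empty result means the host needs full credential rotation.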

And checked for transitive installation:

```bash
pip show litellm  # Check the "Required-by" field
```

If other packages list LiteLLM there, it entered your environment as a transitive dependency without your knowledge.
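You can automate that reverse-dependency check with the standard library. This sketch (the helper name and parsing are mine) walks every installed distribution's declared requirements and reports which ones pull in a given package:

```python
from importlib.metadata import distributions

def packages_requiring(target: str) -> list[str]:
    """Return installed distributions that declare `target` as a dependency."""
    dependents = []
    for dist in distributions():
        for req in dist.requires or []:
            # Requirement strings look like "litellm[proxy]>=1.0 ; extra == 'x'"
            name = req.split(";")[0].split(" ")[0]
            for sep in ("<", ">", "=", "!", "~", "["):
                name = name.split(sep)[0]
            if name.lower().replace("_", "-") == target.lower():
                dependents.append(dist.metadata["Name"])
                break
    return sorted(set(dependents))

print(packages_requiring("litellm"))
```

An empty list means nothing installed declares it; anything else is the framework that smuggled it in.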

The Incident Response Steps I Followed

Step 1: Kill everything immediately. Stop all LiteLLM containers and scale Kubernetes deployments to zero:

```bash
docker ps | grep litellm | awk '{print $1}' | xargs docker kill
kubectl scale deployment litellm-proxy --replicas=0 -n your-namespace
```

Step 2: Rotate every credential on affected machines. The malware harvested everything it could reach. I treated the following as fully compromised: cloud provider tokens (AWS access keys, GCP service account keys, Azure AD tokens), all SSH keys in ~/.ssh/, database passwords and connection strings from .env files, every LLM provider API key (OpenAI, Anthropic, Gemini), Kubernetes service accounts and CI/CD tokens, and any crypto wallet files present on the machine. If you have crypto wallets on an affected host, move funds immediately.

Step 3: Hunt for persistence artifacts. The malware planted privileged pods in Kubernetes and installed a systemd backdoor. Check for both:

```bash
# Check for lateral movement
kubectl get pods -n kube-system | grep -i "node-setup"
find / -name "sysmon.py" 2>/dev/null

# Full removal
pip uninstall litellm -y && pip cache purge
rm -rf ~/.cache/uv
find $(python -c "import site; print(site.getsitepackages()[0])") \
    -name "litellm_init.pth" -delete
rm -rf ~/.config/sysmon/ ~/.config/systemd/user/sysmon.service
docker build --no-cache -t your-image:clean .
```

Do not downgrade to a previous version. Remove entirely and replace.
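After removal I wanted a quick, repeatable assertion that an environment is actually clean. This is my own sketch, not an official check: it confirms LiteLLM is no longer importable and that no leftover litellm_init.pth survives in any site directory.

```python
import importlib.util
import os
import site

def environment_clean() -> bool:
    """True if litellm is gone and no litellm_init.pth remains."""
    # The package itself must not be importable anywhere on sys.path.
    if importlib.util.find_spec("litellm") is not None:
        return False
    # No startup hook may survive in any site directory.
    site_dirs = site.getsitepackages() + [site.getusersitepackages()]
    for d in site_dirs:
        if os.path.isdir(d) and "litellm_init.pth" in os.listdir(d):
            return False
    return True

print(environment_clean())
```

I dropped this into a CI smoke test so a reappearing transitive install fails the build instead of sitting unnoticed.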

The Deeper Problem with Self-Hosted Python Proxies

I've been thinking about this since the cleanup, and honestly, the structural issue here goes beyond one compromised package.

LiteLLM's Python proxy pulls in hundreds of transitive dependencies: ML frameworks, data processing libraries, provider SDKs. Every one of those is a trust decision most teams make automatically with pip install --upgrade. When you add LiteLLM, you're not just trusting LiteLLM. You're trusting every package it depends on, every package those depend on, and every maintainer account tied to each one.

The .pth attack vector is especially concerning because most supply chain scanning tools focus on setup.py, __init__.py, and defined entry points. The .pth mechanism is a legitimate Python feature for path configuration that has been completely overlooked as an injection vector. I expect this technique to show up in future attacks. Traditional scanning would not have caught it.
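To close that gap in my own pipeline, I added a tiny audit step. The heuristic is mine, but it matches how CPython's site module behaves: only `.pth` lines that begin with `import` at column 0 are executed, so those are the lines worth flagging.

```python
from pathlib import Path

def suspicious_pth_lines(directory: str) -> list[tuple[str, str]]:
    """Flag .pth lines that site.py would execute at startup."""
    findings = []
    for pth in sorted(Path(directory).glob("*.pth")):
        for line in pth.read_text().splitlines():
            # site.py exec's only lines beginning with "import " or "import\t";
            # plain path lines are merely appended to sys.path.
            if line.startswith(("import ", "import\t")):
                findings.append((pth.name, line))
    return findings
```

Run it over every directory from `site.getsitepackages()`. Legitimate hits exist (setuptools and some editable installs use this feature), so treat the output as a review list, not an automatic block.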

There's also a response-time problem. The LiteLLM maintainers didn't rotate their CI/CD credentials for five days after the Trivy disclosure on March 19. If the maintainers couldn't react fast enough, downstream teams had no realistic chance. When you self-host, you inherit the blast radius.

What I Moved To (And Why)

After cleaning up, I needed to replace the routing layer. The options I evaluated fell into two buckets: self-hosted alternatives (which carry the same dependency tree risk) and managed gateways (which eliminate it).

I ended up switching to a managed gateway approach. Prism (by Future AGI) is one example of this pattern. Instead of installing a Python package to route requests, you point your OpenAI SDK at a managed endpoint. Your attack surface goes from an entire Python environment with hundreds of dependencies to an API key and a URL.

The migration was a two-line change:

Before (LiteLLM):

```python
from litellm import completion

response = completion(model="gpt-5", messages=[{"role": "user", "content": "Hello"}])
```

After (managed gateway):

```python
from openai import OpenAI

client = OpenAI(base_url="https://gateway.futureagi.com", api_key="sk-prism-your-key")
response = client.chat.completions.create(
    model="gpt-5", messages=[{"role": "user", "content": "Hello"}]
)
```

Same OpenAI SDK format, same model names, same response schema. TypeScript works identically:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://gateway.futureagi.com",
    apiKey: "sk-prism-your-key"
});
const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }]
});
```

Provider keys sit in the gateway dashboard instead of .env files scattered across developer machines. You can read the full docs for setup details.

For Kubernetes deployments, the swap is just environment variables:

```yaml
env:
  - name: LLM_BASE_URL
    value: "https://gateway.futureagi.com"  # was http://litellm-proxy:4000
  - name: LLM_API_KEY
    value: "sk-prism-your-key"
```

Then delete the LiteLLM pod, its service, Postgres, and Redis. That's infrastructure you no longer maintain or patch.

One feature I didn't expect to use heavily is semantic caching. It matches queries that mean the same thing but use different wording, so "What is your return policy?" and "How do I return an item?" hit the same cache entry. Cached responses come back with X-Prism-Cost: 0.

```python
from prism import Prism, GatewayConfig, CacheConfig

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(enabled=True, mode="semantic", ttl="5m", namespace="prod"),
    ),
)
```

The gateway also applies guardrails (PII detection, prompt injection prevention) at the routing layer before requests reach the provider. That's 18+ checks I previously didn't have at all.

What This Means Going Forward

The EU Cyber Resilience Act now holds organizations legally responsible for the security of open-source components in their products. SOC 2 Type II audits scrutinize dependency management. "We pull the latest from PyPI" won't pass a controls review anymore. If your product ran LiteLLM and customer credentials were exfiltrated, the liability is yours, not the maintainer's. For background on AI compliance and LLM security, Future AGI has an enterprise breakdown worth reading.

Dependency pinning alone doesn't fix this. A version pin stops you from automatically pulling a new malicious release, but it doesn't verify the contents of the artifact you do pull. Hash verification (pinning each requirement with `--hash=sha256:<exact_hash>` in a requirements file, which puts pip into hash-checking mode) is the real control, though adoption is low because the tooling is painful.
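The check itself is mechanically simple: pip streams the downloaded artifact through SHA-256 and compares the digest to the one you pinned. A sketch of the same comparison (the wheel filename in the comment is hypothetical):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256, as pip's hash-checking mode does."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the hash pinned in requirements.txt, e.g.:
#   assert sha256_of("litellm-1.x.y-py3-none-any.whl") == pinned_hash
```

A stolen publish token can push a new poisoned artifact, but it can't make that artifact hash to the value already committed in your repo.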

Every team running LLM applications now faces a clear architectural choice: self-host and inherit the full supply chain risk, or use a managed gateway and shrink the trust boundary to an API endpoint. After March 24, the risk math changed.

I spent two days rotating credentials and auditing Kubernetes pods because of a package I didn't even know was in my dependency tree. I'd rather spend that time shipping features.