The UK Government Just Merged This Open-Source AI Security Benchmark Into Their National Evaluation Framework

# security# ai# opensource# python

Vaishnavi Gudur

What Happened Last month, the UK Government's AI Safety Institute merged AgentThreatBench...

What Happened

Last month, the UK Government's AI Safety Institute merged AgentThreatBench into their official inspect_evals framework — the same framework they use to evaluate frontier AI models from OpenAI, Anthropic, and Google DeepMind.

AgentThreatBench is an open-source adversarial benchmark I built that contains 200+ attack payloads specifically designed to test whether AI agents can resist memory poisoning attacks.

Why This Matters

AI agents are increasingly being deployed with persistent memory — they remember past conversations, user preferences, and context across sessions. This creates a new attack surface: memory poisoning.

An attacker who can inject malicious content into an agent's memory can:

Exfiltrate sensitive data on subsequent sessions
Override safety instructions persistently
Manipulate agent behavior without the user's knowledge

The OWASP Agentic Security Initiative identified this as ASI06 — Agent Memory Poisoning.

What AgentThreatBench Tests

The benchmark covers 5 attack categories:

Category	Payloads	Description
Prompt Injection	40+	Instructions disguised as memory content
Protected Key Tampering	40+	Attempts to overwrite system-level keys
Sensitive Data Leakage	40+	PII/credential exfiltration via memory
Size Anomaly	40+	Memory inflation / resource exhaustion
Behavioral Drift	40+	Gradual personality/instruction shifts

How to Use It

pip install agentthreatbench

# Run the full benchmark against your agent
atb run --target your_agent_endpoint --output results.json

# Or use individual attack categories
atb run --category prompt_injection --target your_agent_endpoint

The BEIS Validation

The UK Government's AI Safety Institute uses inspect_evals to:

Evaluate frontier models before deployment decisions
Benchmark safety mitigations across providers
Track regression in safety properties over time