
ArshTechPro
If you have ever pushed code to production and quietly hoped nobody probes it too hard, this project is for you.
Strix is an open-source tool that runs autonomous AI agents against your application the way a real attacker would: it spins up your app, pokes at it, tries to break in, and proves whether a vulnerability is real before it tells you about it. No SaaS lock-in required, no waiting weeks for a pentest firm to get back to you.
In this article I will walk through what Strix actually does, how it is different from the static analysis tools you already have, and how to run your first scan.
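Most security tooling developers use day to day falls into two buckets: static analysis (SAST tools and linters), which flags risky code patterns, and dependency scanners, which flag known CVEs in the packages you pull in.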
Neither of these actually runs your application and tries to exploit it. That's traditionally been the job of a human penetration tester, and hiring one is slow and expensive.
Strix tries to close that gap by giving AI agents an actual hacker's toolkit and letting them attack a running instance of your app, the same way a person would.
Strix agents come with:
Instead of one agent doing everything, Strix uses a "graph of agents" model, so multiple specialized agents can work in parallel on different parts of your app and share what they find with each other.
Critically, when Strix reports a vulnerability, it comes with an actual proof-of-concept demonstrating that the exploit works, not just a pattern match that says "this line looks suspicious." That is the main thing that separates it from a linter with a security label on it.
It can detect things like:
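You need two things before you start: Docker, since scans run inside a sandbox Docker image, and an API key for the LLM provider that will power the agents.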
Install it:
curl -sSL https://strix.ai/install | bash
Configure which model powers the agents:
export STRIX_LLM="openai/gpt-5.4"
export LLM_API_KEY="your-api-key"
Then point it at a target:
strix --target ./app-directory
That's it. On first run it will pull the sandbox Docker image, then start testing. Results land in strix_runs/<run-name>.
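Once a few scans have piled up, it helps to jump straight to the newest run directory. The helper below is a sketch of my own, not part of Strix: `latest_strix_run` is a hypothetical name, and it only assumes that each run gets its own directory under strix_runs/. It makes no assumption about what the files inside look like.

```shell
# Print the most recently modified run directory under strix_runs/
# (or under the directory passed as the first argument).
# Returns non-zero when no run directory exists yet.
latest_strix_run() {
  local root="${1:-strix_runs}" newest="" dir
  for dir in "$root"/*/; do
    # Skip the literal pattern when the glob matches nothing.
    [ -d "$dir" ] || continue
    if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
      newest="$dir"
    fi
  done
  # Strip the trailing slash left over from the glob before printing.
  [ -n "$newest" ] && printf '%s\n' "${newest%/}"
}

# Usage, after at least one scan has finished:
#   ls "$(latest_strix_run)"
```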
Strix will also remember your config after the first run, saving it to ~/.strix/cli-config.json so you don't have to set environment variables every time.
Strix isn't limited to scanning a directory on your machine. A few other ways to point it at something:
# Review a GitHub repo directly
strix --target https://github.com/org/repo
# Black-box test a live deployed app
strix --target https://your-app.com
# Test both the source code and the deployed app together
strix -t https://github.com/org/app -t https://your-app.com
If your app needs a login, you can hand Strix credentials and let it test authenticated flows:
strix --target https://your-app.com \
--instruction "Perform authenticated testing using credentials: user:pass"
You can also steer it toward specific concerns instead of a generic sweep:
strix --target api.your-app.com \
--instruction "Focus on business logic flaws and IDOR vulnerabilities"
For more detailed rules of engagement, scope, or exclusions, hand it a file instead of a one-liner:
strix --target api.your-app.com --instruction-file ./instruction.md
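To make that concrete, here is one way such a file could be created. Treat all of it as illustration: I am assuming the file is free-form text, and the scope, focus, and exclusion entries are placeholders I made up, not a format Strix requires.

```shell
# Write a hypothetical rules-of-engagement file. Every host and rule below
# is a placeholder, and the free-form layout is an assumption.
cat > instruction.md <<'EOF'
# Rules of engagement (example)

Scope:
- Only test api.your-app.com.

Focus:
- Business logic flaws and IDOR vulnerabilities.

Exclusions:
- Do not send destructive requests against shared data.
- Do not test third-party services the app calls.
EOF

# Then pass it exactly as in the command above:
#   strix --target api.your-app.com --instruction-file ./instruction.md
```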
If you want to run Strix as part of an automated job rather than interactively, use non-interactive mode:
strix -n --target https://your-app.com
It prints findings in real time and exits with a non-zero status code if it finds vulnerabilities, which makes it straightforward to fail a build on real findings.
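Since the exit status is the whole contract here, a small wrapper can make the pass/fail decision explicit in a CI log. This is a sketch: `security_gate` is a name I made up, and because the only documented behavior is "non-zero on findings", the wrapper does not try to interpret specific codes. A crash would fail the gate too.

```shell
# Run the given command, report the outcome, and pass its exit status through.
security_gate() {
  local status=0
  # "|| status=$?" keeps the wrapper alive even when the shell runs with set -e.
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "security gate: passed"
  else
    echo "security gate: failed (exit status $status)" >&2
  fi
  return "$status"
}

# Usage in a build script:
#   security_gate strix -n --target https://your-app.com
```

Passing the status through unchanged means the surrounding CI step fails exactly when the scan does, with no extra logic.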
This is where Strix gets genuinely useful for a team: instead of running a security scan occasionally, you run it on every pull request and block insecure code before it merges.
A minimal GitHub Actions setup:
name: strix-penetration-test
on:
  pull_request:
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Install Strix
        run: curl -sSL https://strix.ai/install | bash
      - name: Run Strix
        env:
          STRIX_LLM: ${{ secrets.STRIX_LLM }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
        run: strix -n -t ./ --scan-mode quick
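The checkout step in that workflow asks for full history with fetch-depth: 0. If you want to confirm from a shell step that the clone really is complete, git can tell you directly. The function name below is hypothetical. The git flag is real, but it needs a reasonably recent git.

```shell
# Succeeds when the repository at the given path (default: current directory)
# is a shallow clone, meaning history was truncated at checkout time.
is_shallow_checkout() {
  [ "$(git -C "${1:-.}" rev-parse --is-shallow-repository 2>/dev/null)" = "true" ]
}

# Example guard for a CI shell step:
#   if is_shallow_checkout; then
#     echo "shallow clone detected: check fetch-depth in the checkout step" >&2
#   fi
```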
A couple of practical notes if you're setting this up:
Keep fetch-depth: 0 in the checkout step. On PR runs, Strix automatically scopes a quick review to just the changed files, and it needs full git history to resolve that diff correctly. If it can't, pass --diff-base explicitly.
The configuration variables, including the optional ones:
export STRIX_LLM="openai/gpt-5.4"
export LLM_API_KEY="your-api-key"
# Optional: point at a local model instead
export LLM_API_BASE="your-api-base-url"
# Optional: enables search capability for the agents
export PERPLEXITY_API_KEY="your-api-key"
# Optional: control how much the model "thinks"
export STRIX_REASONING_EFFORT="high" # default is high; quick scans use medium
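Before a first run, or in a fresh CI environment where nothing has been saved yet, a quick preflight check can catch a missing variable before Strix starts. This is a sketch under my own assumptions: both function names are hypothetical, only the two non-optional variables are checked, and the Docker test only confirms that a docker binary is on PATH, not that the daemon is running.

```shell
# Print the name of each required variable that is unset or empty.
missing_strix_env() {
  local name value
  for name in STRIX_LLM LLM_API_KEY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Return 0 only when both variables are set and docker is on PATH.
strix_preflight() {
  local missing
  missing="$(missing_strix_env)"
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing environment variables:" $missing >&2
    return 1
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH (the sandbox runs as a Docker image)" >&2
    return 1
  fi
}

# Usage:
#   strix_preflight && strix --target ./app-directory
```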
As for which model to run it with, the project currently recommends:
- openai/gpt-5.4
- anthropic/claude-sonnet-4-6
- vertex_ai/gemini-3-pro-preview

It also supports Vertex AI, Bedrock, Azure, and local models; check the LLM providers docs if you want to run something self-hosted.
A reasonable way to think about where Strix fits alongside what you probably already run:
| Tool type | What it catches | What it misses |
|---|---|---|
| SAST / linters | Risky code patterns | Runtime behavior, business logic |
| Dependency scanners | Known CVEs in packages | Bugs in your own code |
| Strix | Exploitable, validated vulnerabilities with a working PoC | Anything outside the scope you give it |
Strix is not a replacement for code review or a full professional pentest on a critical system, but as a fast, repeatable layer that actually tries to exploit your app before an attacker does, it fills a gap that static tooling structurally can't.
If you would rather not start with your production app, running Strix against something disposable is a reasonable way to get a feel for it: spin up a small local project, point Strix at it, and see what it turns up.
One important note directly from the maintainers: only test applications you own or have explicit permission to test. You are responsible for using it ethically and legally.