I Built Tautest: A Mutation Testing Workflow for AI-Written Tests

#testing #opensource #ai #javascript

Can Bilmez

Tautest is an open-source CLI and GitHub Action that uses StrykerJS to catch weak AI-written tests and generate AI-ready fix prompts.

AI coding agents are getting really good at writing tests.

But I kept running into one uncomfortable problem:

Passing tests do not always mean strong tests.

Sometimes an AI agent writes tests that pass, but those tests only confirm that the current implementation runs. They do not necessarily prove that the behavior is protected.

That is why I built Tautest.

Tautest is an open-source CLI and GitHub Action that runs mutation testing on changed source lines, finds weak tests, and generates an AI-ready fix prompt for Claude Code, Cursor, Codex, or human reviewers.

GitHub:

https://github.com/canblmz1/tautest

npm package:

https://www.npmjs.com/package/tautest

The problem

Let’s say your code has a condition like this:

if (age >= 65) {
  return subtotal * 0.2;
}

Your normal tests might pass.

But what if this condition is mutated to:

if (age > 65) {
  return subtotal * 0.2;
}

If your tests still pass against the mutated code, then the exact boundary at 65 is not protected.

That is a weak test.

This is the kind of thing Tautest is designed to expose.
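To make the boundary problem concrete, here is a minimal sketch. The function bodies, the `return 0` fallback, and the two "suite" helpers are assumptions made for illustration; only the `age >= 65` condition and its `age > 65` mutant come from the example above.

```typescript
type Discount = (age: number, subtotal: number) => number;

// Hypothetical original: seniors aged 65 and over get 20% off.
function calculateDiscount(age: number, subtotal: number): number {
  if (age >= 65) {
    return subtotal * 0.2;
  }
  return 0;
}

// The mutant: identical except that >= has become >.
function calculateDiscountMutant(age: number, subtotal: number): number {
  if (age > 65) {
    return subtotal * 0.2;
  }
  return 0;
}

// A weak suite: both checks sit far away from the boundary.
function weakSuitePasses(discount: Discount): boolean {
  return discount(70, 100) === 20 && discount(30, 100) === 0;
}

// A stronger suite: the same checks plus the exact boundary at 65.
function boundarySuitePasses(discount: Discount): boolean {
  return weakSuitePasses(discount) && discount(65, 80) === 16;
}
```

The weak suite passes for both the original and the mutant, so it cannot tell them apart. The boundary suite passes for the original and fails for the mutant, which is what "killing" a mutant means.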

Demo

Regular tests pass, but Tautest finds a surviving mutant that the tests missed. After adding the missing boundary test, the mutation score improves to 100%.

What Tautest does

Tautest is not a mutation testing engine.

It uses StrykerJS as the mutation testing engine and adds a workflow layer around it.

Tautest:

reads changed source lines from git diff
runs StrykerJS mutation testing only on those changed lines
parses surviving mutants
generates Markdown, JSON, and terminal reports
writes an AI-ready fix prompt
can post a sticky GitHub PR comment
supports Vitest
has Jest beta support
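The first step, reading changed source lines from a diff, can be sketched roughly as follows. This is not Tautest's actual implementation. `changedLines` is a hypothetical helper that takes the text of a unified `git diff` and collects the new-side line numbers of added or modified lines for each file.

```typescript
// Maps each changed file to the line numbers (in the new version of the file)
// that the diff added or modified.
function changedLines(diff: string): Map<string, number[]> {
  const result = new Map<string, number[]>();
  let file: string | null = null;
  let line = 0;
  let inHunk = false;

  for (const raw of diff.split("\n")) {
    if (raw.startsWith("diff --git ")) {
      // A new file section begins; header lines follow until the next hunk.
      inHunk = false;
      file = null;
      continue;
    }
    if (!inHunk && raw.startsWith("+++ ")) {
      // "+++ b/src/x.ts" names the new version; "/dev/null" means the file was deleted.
      const path = raw.slice(4).trim();
      file = path === "/dev/null" ? null : path.replace(/^b\//, "");
      continue;
    }
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(raw);
    if (hunk) {
      // The hunk header gives the first new-side line number of the hunk.
      line = Number(hunk[1]);
      inHunk = true;
      continue;
    }
    if (!inHunk || file === null) continue;

    if (raw.startsWith("+")) {
      // Added or modified line: record it, then advance the new-side counter.
      const lines = result.get(file) ?? [];
      lines.push(line);
      result.set(file, lines);
      line++;
    } else if (raw.startsWith(" ")) {
      // Context line: present on both sides, so only advance the counter.
      line++;
    }
    // Lines starting with "-" exist only on the old side and do not advance the counter.
  }
  return result;
}
```

Restricting mutation to these lines is what keeps a run proportional to the size of the pull request and not the size of the codebase.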

The goal is simple:

Do not just ask whether the tests pass. Ask whether the tests fail when the behavior is mutated.

Example output

A regular test run can be green:

Test Files  1 passed
Tests       3 passed

But Tautest can still find a surviving mutant:

Tautest: MIXED (75.00%, threshold 60.00%)
Killed: 3 | Survived: 1 | No coverage: 0

Top surviving mutants:
- src/discount.ts:2 EqualityOperator

The surviving mutant:

age >= 65  ->  age > 65
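For a sense of how a line like `src/discount.ts:2 EqualityOperator` can be derived, here is a small sketch that filters a StrykerJS-style JSON report for survivors. The interfaces are a trimmed-down assumption based on the mutation testing report schema that StrykerJS emits (files keyed by path, each with a `mutants` array carrying `mutatorName`, `status`, and `location`). `survivingMutants` is a hypothetical helper, and Tautest's own parser may look different.

```typescript
interface Position {
  line: number;
  column: number;
}

interface MutantResult {
  id: string;
  mutatorName: string;
  replacement?: string;
  status: string; // e.g. "Killed", "Survived", "NoCoverage", "Timeout"
  location: { start: Position; end: Position };
}

interface MutationReport {
  files: Record<string, { mutants: MutantResult[] }>;
}

// Returns one "file:line mutatorName" entry per mutant that the tests failed to kill.
function survivingMutants(report: MutationReport): string[] {
  const survivors: string[] = [];
  for (const [file, { mutants }] of Object.entries(report.files)) {
    for (const mutant of mutants) {
      if (mutant.status === "Survived") {
        survivors.push(`${file}:${mutant.location.start.line} ${mutant.mutatorName}`);
      }
    }
  }
  return survivors;
}
```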

After adding the missing boundary test:

it("applies the senior discount at exactly 65", () => {
  expect(calculateDiscount(65, 80)).toBe(16);
});

Tautest reports:

Tautest: STRONG (100.00%, threshold 60.00%)
Killed: 4 | Survived: 0
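The percentages in these reports follow from the counts. As a sketch, assuming the usual StrykerJS definition of mutation score (detected mutants divided by valid mutants, where timeouts count as detected), the arithmetic looks like this. The `timeout` field and the `mutationScore` name are assumptions, since the output above only shows killed, survived, and no-coverage counts.

```typescript
interface MutantCounts {
  killed: number;
  timeout: number;
  survived: number;
  noCoverage: number;
}

// Mutation score as a percentage: detected / (detected + undetected) * 100.
function mutationScore(counts: MutantCounts): number {
  const detected = counts.killed + counts.timeout;
  const valid = detected + counts.survived + counts.noCoverage;
  // With no valid mutants there is nothing to score.
  return valid === 0 ? NaN : (detected / valid) * 100;
}

// Before the boundary test: 3 killed, 1 survived.
const before = mutationScore({ killed: 3, timeout: 0, survived: 1, noCoverage: 0 });

// After the boundary test: 4 killed, 0 survived.
const after = mutationScore({ killed: 4, timeout: 0, survived: 0, noCoverage: 0 });
```

With these inputs, `before.toFixed(2)` is `"75.00"` and `after.toFixed(2)` is `"100.00"`, matching the two reports.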

The AI fix prompt workflow

One thing I wanted Tautest to do was help AI coding agents write better tests without letting them rewrite production code.

So Tautest generates a file:

.tautest/fix-prompt.md

The prompt includes rules like:

do not change production code
only edit or add test files
every new test must pass against the original code
every new test must fail against the mutant behavior
do not weaken existing assertions
do not write filler tests like expect(true).toBe(true)

The workflow becomes:

Run Tautest.
Open .tautest/fix-prompt.md.
Paste it into Claude Code, Cursor, Codex, or use it yourself.
Add the missing test.
Run your normal tests.
Run Tautest again.

Install

For Vitest projects:

pnpm add -D tautest @stryker-mutator/core @stryker-mutator/vitest-runner
pnpm exec tautest init --yes --runner vitest --no-install
pnpm exec tautest doctor
pnpm exec tautest run --base origin/main

For Jest projects (support is currently in beta):

pnpm add -D tautest @stryker-mutator/core @stryker-mutator/jest-runner
pnpm exec tautest init --yes --runner jest --no-install

GitHub Action usage

Tautest also ships with a GitHub Action that can run on pull requests and post a sticky PR comment.

name: Tautest

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write

jobs:
  tautest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - run: pnpm install --frozen-lockfile
      - run: pnpm build

      - uses: canblmz1/tautest/packages/github-action@v1
        with:
          base: ${{ github.base_ref }}
          threshold: 60
          comment: changes
          cache: true

Important notes:

fetch-depth: 0 is required because Tautest diffs against the base branch and needs the full git history to do that.
pull-requests: write is required for sticky PR comments.
The v1 action currently ships from the monorepo path, which is why the uses: line points at canblmz1/tautest/packages/github-action@v1.

What Tautest does not do

Tautest is intentionally limited.

It does not:

implement its own mutation engine
replace StrykerJS
call any LLM API
prove that your tests are perfect
fully support monorepos in v1
classify AI-written tests with certainty

It is a deterministic workflow:

changed source lines -> mutation testing -> surviving mutants -> report -> fix prompt

Why I built it

AI coding agents are useful, but I do not want to blindly trust generated tests.

I wanted a workflow where an AI agent can write or improve tests, but a deterministic tool checks whether those tests actually protect behavior.

That is the main idea behind Tautest.

It is not:

AI wrote tests, so trust them.

It is:

AI wrote tests, now mutate the changed code and see whether those tests actually fail.

Current status

Tautest v1.0.0 is published.

Validated before v1:

tautest@1.0.0 published
@tautest/core@1.0.0 published
Release Readiness workflow passed
source-changing PR smoke passed
mutation run completed in GitHub Actions
JSON output parsed
sticky PR comment create and update verified
artifact upload verified

Roadmap

Some things I want to improve next:

Node 24 GitHub Action runtime migration
better cache observability
monorepo beta support
standalone GitHub Action repo
PR line annotations
more Jest fixtures