Your Test Suite Is Lying to You: Mutation Testing in Rust with cargo-mutants

You hit 80% code coverage. You ship. Three weeks later, production is on fire because someone changed a comparison from > to >= and every test still passed.

Coverage metrics are a proxy for confidence, not the real thing. A line can be executed and not verified — if your test calls a function but never asserts on the result, the coverage tool counts it as covered. That is not a test. That is a lie wearing a test’s clothes.

Mutation testing is the antidote. Instead of asking "did our tests run this code?", it asks "would our tests notice if this code were wrong?" The tool corrupts your source one small change at a time and checks whether your test suite catches the sabotage. If it doesn’t, you have a gap.

For Rust, the tool that does this well is cargo-mutants. It’s fast, ergonomic, and integrates cleanly into CI. This article walks through everything from first run to production-grade integration — including the parts that will trip you up.

Project repository: https://github.com/sourcefrog/cargo-mutants

Why Mutation Testing Over Coverage?

Let me show you the problem concretely. Here’s a function and its "test":

pub fn is_admin(role: &str) -> bool {
    role == "admin"
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_is_admin() {
        is_admin("admin"); // look ma, 100% coverage
    }
}

That test contributes to your coverage report. It executes is_admin. But it asserts nothing. A mutation testing tool will change role == "admin" to role != "admin" and rerun the tests. They’ll pass. Busted.

The mutation survived. That’s a missed mutant, and it marks exactly the kind of gap that lets a real bug slip through review, past CI, and into prod.
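Here’s a minimal sketch of what closing that gap could look like, keeping the same is_admin function. The test names and the extra inputs ("guest", "Admin", the empty string) are my own illustrative choices, not something cargo-mutants prescribes. Each assertion pins one direction of the comparison, so neither the == to != swap nor a body replaced with a constant true or false should be able to pass.

```rust
pub fn is_admin(role: &str) -> bool {
    role == "admin"
}

#[cfg(test)]
mod tests {
    use super::*;

    // Fails if the body is replaced with `false` or `==` becomes `!=`.
    #[test]
    fn admin_role_is_admin() {
        assert!(is_admin("admin"));
    }

    // Fails if the body is replaced with `true` or `==` becomes `!=`.
    #[test]
    fn other_roles_are_not_admin() {
        assert!(!is_admin("guest"));
        assert!(!is_admin(""));
        assert!(!is_admin("Admin")); // the comparison is case-sensitive
    }
}
```

The second test is the one people tend to forget. Without a negative case, a mutant that always returns true sails through.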

How cargo-mutants Works

cargo-mutants copies your source tree to a scratch directory, applies a single mutation to one file, runs cargo test, and records whether the tests caught it. It does this for every mutation it can generate across the whole codebase.

The mutations it applies include:

Replacing return values with defaults (true→false, 0→1, None, String::new())
Replacing boolean expressions with literals
Deleting entire function bodies (returning the type’s default)
Swapping arithmetic and comparison operators

It does not require you to instrument your code or change your build setup. It works on your existing cargo test command, which means it respects your existing test configuration, features, and workspace structure.
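To make the caught/missed idea concrete, here is a hand-rolled sketch that imitates two of the mutation kinds listed above. Everything in it is hypothetical: over_limit, the two "mutant" functions, and the two suite functions are stand-ins I wrote for illustration. cargo-mutants generates and applies its mutants itself, so you would never write them by hand like this.

```rust
/// Original: true only when `count` is strictly above `limit`.
pub fn over_limit(count: u32, limit: u32) -> bool {
    count > limit
}

/// Stand-in for an operator mutant: `>` swapped for `>=`.
pub fn over_limit_mutant_ge(count: u32, limit: u32) -> bool {
    count >= limit
}

/// Stand-in for a return-value mutant: body replaced with `true`.
pub fn over_limit_mutant_true(_count: u32, _limit: u32) -> bool {
    true
}

/// A weak suite: one happy-path check, nothing at the boundary.
pub fn weak_suite_passes(f: fn(u32, u32) -> bool) -> bool {
    f(11, 10)
}

/// A stronger suite: also checks the boundary and the negative case.
pub fn strong_suite_passes(f: fn(u32, u32) -> bool) -> bool {
    f(11, 10) && !f(10, 10) && !f(9, 10)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn weak_suite_misses_both_mutants() {
        assert!(weak_suite_passes(over_limit));
        assert!(weak_suite_passes(over_limit_mutant_ge)); // missed
        assert!(weak_suite_passes(over_limit_mutant_true)); // missed
    }

    #[test]
    fn strong_suite_catches_both_mutants() {
        assert!(strong_suite_passes(over_limit));
        assert!(!strong_suite_passes(over_limit_mutant_ge)); // caught
        assert!(!strong_suite_passes(over_limit_mutant_true)); // caught
    }
}
```

A suite "passing" against a mutant is the bad outcome here. The boundary check at (10, 10) is what separates > from >=, which is the exact failure from the opening of this article.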

Installation

cargo install cargo-mutants

That’s it. No system dependencies, no daemon, no config required to get started.

If you’re on a machine where compile times matter (they will), consider checking that your build cache is warm before the first run. cargo-mutants rebuilds your project for every mutant it tests, not only the ones that end up surviving, so total runtime scales with the number of mutants.

Your First Run

Navigate to your Rust project and run:

cargo mutants

You’ll see output like:

Found 47 mutants to test
ok       Unmutated baseline in 3.2s
MISSED   src/auth.rs:14:5: replace is_admin -> bool with true
MISSED   src/auth.rs:14:5: replace is_admin -> bool with false
caught   src/parser.rs:82:9: replace parse_header -> Option<Header> with None
caught   src/parser.rs:101:5: replace validate -> bool with true
...

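Caught means your tests detected the mutation, which is what you want. Missed means the mutation survived, which is bad. Unviable means the mutated code didn’t compile; cargo-mutants still has to attempt the build to find that out, but it doesn’t count these against you, and they’re expected noise from type-system mutations. You may also see Timeout, which means the tests ran past the time limit under that mutation. Depending on your version and flags, caught mutants may only be printed when you ask for them (for example with --caught).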

After the run, you’ll find a mutants.out/ directory in your project root containing:

mutants.out/
  mutants.json        # machine-readable list of every mutant generated
  outcomes.json       # machine-readable results: outcome + timing per mutant
  missed.txt          # just the missed mutants, your priority list
  caught.txt
  unviable.txt
  timeout.txt
  log/                # build and test logs per mutant

Start with missed.txt. Every entry is a question: why didn’t your tests catch this?
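If missed.txt gets long, a few lines of Rust can turn it into a per-file summary. This is a sketch under an assumption: that each entry looks like the path:line:column: description lines in the sample output above, optionally preceded by the MISSED status word. The MissedMutant struct and both function names are hypothetical helpers of mine, not part of cargo-mutants. For anything serious, the JSON files are probably a more stable thing to parse.

```rust
use std::collections::BTreeMap;

/// One entry from a missed-mutant listing.
#[derive(Debug, PartialEq)]
pub struct MissedMutant {
    pub file: String,
    pub line: u32,
    pub column: u32,
    pub description: String,
}

/// Parses `path:line:col: description`; returns None for anything else.
pub fn parse_missed_line(raw: &str) -> Option<MissedMutant> {
    let trimmed = raw.trim();
    // Tolerate a leading status word as printed on the console.
    let entry = trimmed
        .strip_prefix("MISSED")
        .map(str::trim_start)
        .unwrap_or(trimmed);
    let (location, description) = entry.split_once(": ")?;
    // Split from the right so only the last two fields are read as numbers.
    let mut fields = location.rsplitn(3, ':');
    let column = fields.next()?.parse().ok()?;
    let line = fields.next()?.parse().ok()?;
    let file = fields.next()?.to_string();
    Some(MissedMutant {
        file,
        line,
        column,
        description: description.to_string(),
    })
}

/// Counts missed mutants per source file, skipping unparseable lines.
pub fn missed_per_file(contents: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for mutant in contents.lines().filter_map(parse_missed_line) {
        *counts.entry(mutant.file).or_insert(0) += 1;
    }
    counts
}
```

Feed missed_per_file the contents of missed.txt (for example via std::fs::read_to_string) and you get a sorted map of file to missed count, which is a decent first cut at where to look.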

Reading the Output and Prioritizing

Not every missed mutant needs a fix. You have to think about what the code actually does.

A missed mutant on a logging function (replace log_event -> () with ()) is probably fine — you might not want to test logging behavior. A missed mutant on an authorization check is a five-alarm fire.

The critical discipline here is triage by risk, not blanket coverage. Look at what the function does. Ask: if this mutation shipped to production, what breaks? If the answer is "nothing visible to users", move on. If the answer is "access control, data integrity, money", write the test.

cargo-mutants tells you the file, line, and exact mutation. Use that to navigate directly to the code and decide.
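Triage by risk can be partly mechanised. The sketch below ranks file paths by whether they contain a high-risk marker. The marker list ("auth", "billing", "payments") and the two-level Risk enum are assumptions for illustration; your own list should come from what actually hurts when it breaks. Substring matching is crude (a file called author_bio.rs would match "auth"), so treat the output as a reading order, not a verdict.

```rust
/// Declared High first so that sorting puts high-risk files at the top.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    High,
    Low,
}

/// Hypothetical markers for code where a missed mutant is a five-alarm fire.
const HIGH_RISK_MARKERS: [&str; 3] = ["auth", "billing", "payments"];

/// Classifies a source path by simple substring match.
pub fn risk_for(path: &str) -> Risk {
    if HIGH_RISK_MARKERS.iter().any(|marker| path.contains(marker)) {
        Risk::High
    } else {
        Risk::Low
    }
}

/// Returns the paths ordered by risk, then alphabetically within a tier.
pub fn triage<'a>(paths: &[&'a str]) -> Vec<(Risk, &'a str)> {
    let mut ranked: Vec<(Risk, &str)> = paths.iter().map(|p| (risk_for(p), *p)).collect();
    ranked.sort();
    ranked
}
```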

Configuration

For anything beyond a toy project, you’ll want a config file checked into the repo. cargo-mutants reads it from .cargo/mutants.toml, relative to the root of the source tree. Key names have changed across releases, so check the configuration reference for the version you have installed.

# .cargo/mutants.toml

# Skip these paths entirely: generated code, vendored deps, build scripts
exclude_globs = [
    "src/generated/**",
    "build.rs",
    "benches/**",
]

# Only mutate these paths (useful when iterating on a specific module)
# examine_globs = ["src/auth/**", "src/payments/**"]

# Additional cargo test flags passed through
# Useful for running only unit tests (faster) vs. full integration suite
additional_cargo_test_args = ["--lib"]

# Per-mutant timeouts are derived from the unmutated baseline run.
# These two keys tune that calculation (multiplier, and a floor in seconds).
timeout_multiplier = 3.0
minimum_test_timeout = 60

# Parallelism and a fixed per-mutant timeout are passed on the command line:
#   cargo mutants --jobs 2 --timeout 60

A few things worth calling out:

A timeout matters in practice. If a mutation produces an infinite loop (changing a loop bound to a constant, for example), the test run would never finish on its own. By default cargo-mutants derives a timeout from how long the unmutated baseline took, so you are not unprotected out of the box. If your test times are noisy or the baseline is very short, set it yourself, either with --timeout on the command line or with the multiplier and minimum in the config. 2–3× your normal test suite runtime is a reasonable starting point.

jobs controls parallelism and is set with --jobs on the command line. The default is a single job. Each job builds and runs the full test suite in its own copy of the tree, so memory and CPU use climb quickly. On a 4-core machine running a large project, 2 jobs is often the sweet spot: you get parallelism without the jobs starving each other.

additional_cargo_test_args lets you skip integration tests for the mutation pass. Integration tests are slow and cargo-mutants runs the suite hundreds of times. Use --lib to run only unit tests during mutation testing, then run integration tests separately.
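The rule of thumb for picking a timeout (a multiple of the normal suite runtime, with a floor so very fast suites don’t get a uselessly tight limit) is simple enough to write down. This is a sketch of that rule only; suggested_timeout is a hypothetical helper and not how cargo-mutants computes anything internally.

```rust
use std::time::Duration;

/// Returns `baseline * multiplier`, but never less than `floor`.
/// `multiplier` must be finite and non-negative, or `mul_f64` panics.
pub fn suggested_timeout(baseline: Duration, multiplier: f64, floor: Duration) -> Duration {
    baseline.mul_f64(multiplier).max(floor)
}
```

With a 10 second baseline, a 3× multiplier and a 20 second floor you get 30 seconds. With a 2 second baseline the floor wins and you get 20.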

Targeting Specific Files or Functions

Running cargo-mutants on a full codebase can take 30–90 minutes for a medium-sized project. During active development, you want to target just what you’re working on.

# Mutate only the auth module
cargo mutants --file src/auth.rs

# Mutate a specific function (--re filters mutants by a regex on their name)
cargo mutants --file src/auth.rs --re is_admin

This workflow pairs well with the development loop: write code, write tests, run targeted mutation testing, fix gaps, commit.

Gotchas

Compilation cache cold starts. cargo-mutants works in a scratch copy of your tree, and every mutant needs at least an incremental rebuild plus a test run there. On a fresh machine with nothing cached, this can be brutally slow. Make sure your ~/.cargo/registry is populated before you care about timing. In CI, cache the Cargo registry and build artifacts between runs so you are not re-downloading and recompiling your dependencies on every job.

Unviable mutants aren’t free. cargo-mutants still has to attempt each unviable mutant to discover it won’t compile. On large codebases, these compile failures add up. If you see a high unviable count and slow runs, exclude_globs on files with heavy generic/macro usage can help.

Integration tests and side effects. If your integration tests write to disk, call external services, or require a database, running them inside cargo-mutants will cause chaos at scale. Segregate test types and use cargo_test_args to control which ones run during mutation testing.

False confidence from #[should_panic]. A test that expects a panic will pass if the mutated code panics for a different reason. This is a known limitation — don’t rely on #[should_panic] for the behavior you care most about.

Mutations in fn main are often unviable or low-value. Binary entry points usually don’t have interesting logic that benefits from mutation testing. Add src/main.rs to exclude_globs if it’s just bootstrapping.
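On the #[should_panic] gotcha above, two things tighten the net: pin the panic message with expected = "...", and where you can, test a typed error instead of a panic. The sketch below shows both. The withdraw functions and WithdrawError are hypothetical examples of mine; the expected attribute argument is standard Rust and matches on a substring of the panic message.

```rust
#[derive(Debug, PartialEq)]
pub enum WithdrawError {
    InsufficientFunds,
}

/// Returns the new balance, or an error if the account would go negative.
pub fn withdraw(balance: u64, amount: u64) -> Result<u64, WithdrawError> {
    balance
        .checked_sub(amount)
        .ok_or(WithdrawError::InsufficientFunds)
}

/// Panicking wrapper, kept only to demonstrate `should_panic`.
pub fn withdraw_or_panic(balance: u64, amount: u64) -> u64 {
    withdraw(balance, amount).expect("insufficient funds")
}

#[cfg(test)]
mod tests {
    use super::*;

    // A bare #[should_panic] would accept any panic, including one
    // introduced by a mutation. `expected` narrows it to this message.
    #[test]
    #[should_panic(expected = "insufficient funds")]
    fn overdraw_panics_with_specific_message() {
        withdraw_or_panic(10, 11);
    }

    // Stronger still: assert on the exact error value.
    #[test]
    fn overdraw_returns_typed_error() {
        assert_eq!(withdraw(10, 11), Err(WithdrawError::InsufficientFunds));
    }

    // Boundary case, so a `checked_sub` mutation has somewhere to fail.
    #[test]
    fn exact_balance_leaves_zero() {
        assert_eq!(withdraw(10, 10), Ok(0));
    }
}
```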

Integrating into CI

The goal in CI is not to block every PR on mutation score — that’s too strict and kills developer velocity. The goal is to surface regressions and provide signal.

Here’s a GitHub Actions workflow that runs cargo-mutants on PRs and uploads results as an artifact:

# .github/workflows/mutation-tests.yml
name: Mutation Testing

on:
  pull_request:
    branches: [main]
  schedule:
    # Run nightly on main — full suite, not PR-scoped
    - cron: '0 2 * * *'

jobs:
  mutants:
    name: cargo-mutants
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable

      - name: Cache Cargo registry and build artifacts
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target/
          key: ${{ runner.os }}-cargo-mutants-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-mutants-

      - name: Install cargo-mutants
        run: cargo install cargo-mutants --locked

      - name: Run mutation tests
        # Don't fail the build — treat as informational for now
        # Change `continue-on-error` to `false` once you've cleared the backlog
        continue-on-error: true
        run: cargo mutants --jobs 2 --timeout 60 -- --lib

      - name: Upload mutation results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mutants-out
          path: mutants.out/
          retention-days: 14

The continue-on-error: true is intentional for brownfield projects. You probably have a pile of missed mutants on day one. Blocking all PRs immediately creates friction and resentment. Upload the results, review them async, write tests for the critical ones, and tighten the gate once your baseline is clean.

For greenfield projects or high-assurance modules, drop continue-on-error and fail the build on any missed mutant in the files you care about:

cargo mutants --file src/auth.rs --timeout 60 -- --lib
# Exit code 2 if any mutants are missed in that file (non-zero, so the step fails)

Production-Ready Practices

Baseline your score, then protect it. Run cargo-mutants on your main branch, capture the missed mutant count, and write it into CI. Reject PRs that increase the missed count. This prevents death by a thousand cuts without requiring you to fix everything up front.

Prioritize by blast radius, not alphabetical order. Start fixing missed mutants in authentication, authorization, data validation, and financial logic. The logger can wait.

Treat missed mutants as a code smell, not a metric. If you have a function with many missed mutants, the problem might not be missing assertions — it might be that the function is too complex and needs breaking apart. Mutation testing is also a design tool.

Lock cargo-mutants version in CI. Use cargo install cargo-mutants --version X.Y.Z --locked to prevent surprise behavior changes between CI runs. The --locked flag enforces the Cargo.lock inside cargo-mutants itself.

Don’t mutate test code. cargo-mutants ignores #[cfg(test)] blocks by default. This is correct behavior — you’re testing your production code’s logical correctness, not the tests themselves.
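The "baseline your score, then protect it" practice boils down to a ratchet: count the missed mutants, compare against a stored number, fail if it went up. A minimal sketch, assuming one mutant per non-empty line in missed.txt. Both function names are hypothetical, and where you store the baseline (a file in the repo, a CI variable) is up to you.

```rust
/// Counts missed mutants, assuming one per non-empty line.
pub fn count_missed(missed_txt: &str) -> usize {
    missed_txt
        .lines()
        .filter(|line| !line.trim().is_empty())
        .count()
}

/// Passes when the count has not grown; equal or lower is fine.
pub fn check_ratchet(baseline: usize, current: usize) -> Result<(), String> {
    if current > baseline {
        Err(format!(
            "missed mutants rose from {baseline} to {current}"
        ))
    } else {
        Ok(())
    }
}
```

In CI you would read mutants.out/missed.txt with std::fs::read_to_string, pass the result through count_missed, and exit non-zero when check_ratchet returns Err. When the count drops, lower the stored baseline so the gain is locked in.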

A Real Example: Fixing a Missed Mutant

Here’s what the fix cycle looks like. You see in missed.txt:

MISSED   src/billing.rs:34:5: replace charge_user -> Result<(), BillingError> with Ok(())

You open billing.rs:34. The function calls your payment processor and returns a result. You grep your test suite — there’s a test that calls charge_user, but it only checks that the call doesn’t panic. It doesn’t assert the return value or verify the payment processor was called.

You add an assertion:

#[test]
fn charge_user_returns_err_on_declined_card() {
    let mut mock_processor = MockPaymentProcessor::new();
    mock_processor
        .expect_charge()
        .returning(|_| Err(ProcessorError::Declined));

    let result = charge_user(&mock_processor, &test_invoice());

    // `matches!` works on stable Rust without an extra crate or feature flag.
    assert!(matches!(result, Err(BillingError::PaymentDeclined)));
}

Rerun: caught. You’ve now proven your test suite would detect a broken return path. That’s the entire workflow, applied to the things that matter.
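The test above leans on a mocking library and on types that aren’t shown. For a version you can paste into a scratch crate and run, here is a self-contained sketch with a hand-written fake. Every type in it (Invoice, the PaymentProcessor trait, FakeProcessor, both error enums) and the body of charge_user are assumptions made up for illustration; the real billing module will look different.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
pub enum ProcessorError {
    Declined,
}

#[derive(Debug, PartialEq)]
pub enum BillingError {
    PaymentDeclined,
}

pub struct Invoice {
    pub amount_cents: u64,
}

pub trait PaymentProcessor {
    fn charge(&self, amount_cents: u64) -> Result<(), ProcessorError>;
}

/// Hypothetical implementation: forwards to the processor, maps the error.
pub fn charge_user(
    processor: &dyn PaymentProcessor,
    invoice: &Invoice,
) -> Result<(), BillingError> {
    processor
        .charge(invoice.amount_cents)
        .map_err(|err| match err {
            ProcessorError::Declined => BillingError::PaymentDeclined,
        })
}

/// Hand-written fake that records how often it was called.
pub struct FakeProcessor {
    pub decline: bool,
    pub calls: Cell<u32>,
}

impl FakeProcessor {
    pub fn new(decline: bool) -> Self {
        FakeProcessor { decline, calls: Cell::new(0) }
    }
}

impl PaymentProcessor for FakeProcessor {
    fn charge(&self, _amount_cents: u64) -> Result<(), ProcessorError> {
        self.calls.set(self.calls.get() + 1);
        if self.decline {
            Err(ProcessorError::Declined)
        } else {
            Ok(())
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Fails against a body replaced with `Ok(())`: the error disappears.
    #[test]
    fn declined_card_surfaces_as_billing_error() {
        let processor = FakeProcessor::new(true);
        let result = charge_user(&processor, &Invoice { amount_cents: 500 });
        assert_eq!(result, Err(BillingError::PaymentDeclined));
    }

    // Also fails against `Ok(())`: the processor would never be called.
    #[test]
    fn successful_charge_calls_processor_once() {
        let processor = FakeProcessor::new(false);
        let result = charge_user(&processor, &Invoice { amount_cents: 500 });
        assert_eq!(result, Ok(()));
        assert_eq!(processor.calls.get(), 1);
    }
}
```

The second test matters as much as the first. A success-path test that only checks for Ok(()) cannot tell the real function from the mutant, because the mutant returns Ok(()) too. Asserting that the processor was actually called is what pins the behavior down.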

The Bigger Picture

Mutation testing doesn’t replace unit testing, integration testing, or code review. It augments them by asking a harder question than any of them individually: is the behavior of this code actually pinned down by the tests, or are we just hoping?

For most production Rust projects, a run on the critical modules will reveal 5–15% missed mutants in places that genuinely matter. That’s not a failure — it’s intelligence you didn’t have before. Now you know where to focus.

cargo-mutants is fast enough to use practically, stable enough to trust in CI, and produces actionable output. There’s no good reason not to run it on anything you’re shipping.