Go Fuzzing: How to Find the Bugs Your Tests Will Never Catch

There’s a specific category of bug that unit tests almost never find: the one that only appears on input you never thought to write. You test happy paths, a few error paths, maybe some edge cases you remember from last time something broke. Then production gets a 47-byte payload that panics your parser, and you spend a Friday evening reading a stack trace.

Fuzzing is the answer to that, and Go has had a native fuzzer since 1.18. It’s not a third-party tool, not a CI plugin that needs a separate budget approval — it’s go test -fuzz. Most teams haven’t touched it. That’s a mistake.

This article covers how to write fuzz targets that actually surface real bugs, not just how to get the syntax right. We’ll focus on the places where bugs hide: parsing boundaries, integer arithmetic edges, encoding round-trips, and invariant violations across state machines.

Official docs: https://go.dev/doc/fuzz

Why your current tests don’t cover this

Unit tests encode what you already know. You write TestParseIPv4 and you test "192.168.1.1", "", "999.0.0.1". Those are all cases you thought of. The fuzzer doesn’t think — it mutates, and it does so guided by coverage feedback. It knows when a new byte combination reaches a branch nobody’s hit before, and it keeps pulling that thread.

The class of bugs fuzzing finds is different:

Off-by-one panics in slice indexing inside parsers
Integer overflows in length-prefix protocols
Infinite loops caused by malformed recursive structures
Inconsistency between Marshal and Unmarshal — they don’t roundtrip cleanly
Panics reachable only through specific multi-field combinations

These aren’t exotic. They show up in real codebases constantly.

The basics: anatomy of a fuzz target

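A fuzz test lives in an ordinary _test.go file. Two things set it apart: the function is named FuzzXxx and takes *testing.F instead of *testing.T, and the function you hand to f.Fuzz takes *testing.T followed by one or more arguments of the supported types (string, []byte, bool, and the numeric types).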

// fuzz_test.go
package parser

import (
    "testing"
)

func FuzzParseRecord(f *testing.F) {
    // Seed corpus — real examples the fuzzer mutates from
    f.Add([]byte("name=Alice;age=30"))
    f.Add([]byte(""))
    f.Add([]byte("name=;age=0"))

    f.Fuzz(func(t *testing.T, data []byte) {
        // The fuzzer calls this with your seeds and mutations
        rec, err := ParseRecord(data)
        if err != nil {
            // Errors are fine — panics are not
            return
        }
        // Invariant: if we parsed successfully, re-encoding must work
        _ = rec.Encode()
    })
}

Run it:

go test -fuzz=FuzzParseRecord -fuzztime=60s

The -fuzztime flag is important. Without it, the fuzzer runs until it finds a failure or you interrupt it. In CI, set a time budget. In a dedicated fuzzing session, let it run for hours.

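When it finds a failing input, it saves that input to testdata/fuzz/FuzzParseRecord/, and from that point forward plain go test (without -fuzz) replays it as a regression test automatically. Those files, together with your f.Add seeds, are the seed corpus, and they’re worth committing. Inputs that merely reach new coverage without failing are kept in the Go build cache ($GOCACHE/fuzz), not in your repository.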

Targeting boundaries that actually matter

The generic advice is "fuzz your parsers." That’s true but underspecified. Here’s a more surgical breakdown of where to aim.

1. Length-prefix and framing protocols

If you’re parsing any binary format where a field encodes the length of what follows — protobuf, MessagePack, your own wire format — that’s where panics hide.

// A naive length-prefix reader — intentionally broken for demonstration
func ReadFrame(data []byte) ([]byte, error) {
    if len(data) < 4 {
        return nil, errors.New("too short")
    }
    length := binary.BigEndian.Uint32(data[:4])
    // BUG: no bounds check — if length > len(data)-4, this panics
    return data[4 : 4+length], nil
}

func FuzzReadFrame(f *testing.F) {
    f.Add([]byte{0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o'})
    f.Add([]byte{0, 0, 0, 0})
    f.Add([]byte{255, 255, 255, 255}) // length = 4GB — classic

    f.Fuzz(func(t *testing.T, data []byte) {
        _, _ = ReadFrame(data)
        // Any panic here is a bug
    })
}

The fuzzer will almost immediately generate a payload where the declared length exceeds the actual data. If your code panics instead of returning an error, you have a bug.

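The fix is always the same: before slicing, compare the declared length against the bytes that actually remain, and reject the frame if uint64(length) > uint64(len(data)-4). Write the check in that direction. The expression 4+length is itself a uint32 and wraps around for lengths near the maximum, so a check phrased as 4+length <= len(data) can pass when it shouldn’t. But the point is that fuzzing finds this without you having to reason about it: you just write the harness and let it run.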
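A minimal sketch of the bounds-checked version, assuming the same 4-byte big-endian prefix as the broken reader above. The names ReadFrameSafe and FuzzReadFrameSafe are made up for this illustration so the fixed variant can sit next to the original:

```go
package main

import (
	"encoding/binary"
	"errors"
	"testing"
)

// ReadFrameSafe is a hypothetical, bounds-checked variant of ReadFrame.
func ReadFrameSafe(data []byte) ([]byte, error) {
	const headerLen = 4
	if len(data) < headerLen {
		return nil, errors.New("too short")
	}
	length := binary.BigEndian.Uint32(data[:headerLen])
	remaining := len(data) - headerLen
	// Compare in 64 bits against what remains after the header.
	// headerLen+length would wrap in uint32 for lengths near the maximum.
	if uint64(length) > uint64(remaining) {
		return nil, errors.New("declared length exceeds payload")
	}
	// Safe conversion: length <= remaining, which already fits in an int.
	return data[headerLen : headerLen+int(length)], nil
}

func FuzzReadFrameSafe(f *testing.F) {
	f.Add([]byte{0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o'})
	f.Add([]byte{255, 255, 255, 255})

	f.Fuzz(func(t *testing.T, data []byte) {
		payload, err := ReadFrameSafe(data)
		if err != nil {
			return
		}
		// Invariant: the payload never exceeds the bytes after the header.
		if len(payload) > len(data)-4 {
			t.Errorf("payload of %d bytes from %d-byte input", len(payload), len(data))
		}
	})
}
```

The harness now checks a property of the successful result as well, so a later regression that returns too much data would fail even if it never panics.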

2. String parsing with ambiguous delimiters

CSV, HTTP headers, URL parsing, config file formats — anything where the same byte can be a delimiter or data depending on context is fertile ground.

// A simplified key=value parser
func ParseKV(input string) (map[string]string, error) {
    result := make(map[string]string)
    for _, pair := range strings.Split(input, "&") {
        if pair == "" {
            continue
        }
        parts := strings.SplitN(pair, "=", 2)
        if len(parts) != 2 {
            return nil, fmt.Errorf("invalid pair: %q", pair)
        }
        result[parts[0]] = parts[1]
    }
    return result, nil
}

func FuzzParseKV(f *testing.F) {
    f.Add("a=1&b=2")
    f.Add("key=val=ue")   // value contains delimiter
    f.Add("=empty_key")
    f.Add("&&&")

    f.Fuzz(func(t *testing.T, input string) {
        result, err := ParseKV(input)
        if err != nil || result == nil {
            return
        }
        // Invariant: every key must be non-empty
        for k := range result {
            if k == "" {
                t.Errorf("empty key produced from input %q", input)
            }
        }
    })
}

Notice the invariant check. Fuzzing without assertions is much weaker — you’ll only catch panics. When you add invariant checks, you catch logic bugs too: wrong output from technically non-panicking code.
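With ParseKV as written, the "=empty_key" seed trips the empty-key invariant on the very first run, which is exactly the kind of logic bug a panic-only harness would miss. One possible fix is sketched here; ParseKVStrict is a hypothetical name, and the extra delimiter checks in the harness are assumptions about what this format should guarantee:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// ParseKVStrict is a hypothetical variant of ParseKV that rejects empty keys.
func ParseKVStrict(input string) (map[string]string, error) {
	result := make(map[string]string)
	for _, pair := range strings.Split(input, "&") {
		if pair == "" {
			continue
		}
		// Cut splits on the first "=", so values may contain "=".
		key, value, found := strings.Cut(pair, "=")
		if !found {
			return nil, fmt.Errorf("invalid pair: %q", pair)
		}
		if key == "" {
			return nil, fmt.Errorf("empty key in pair: %q", pair)
		}
		result[key] = value
	}
	return result, nil
}

func FuzzParseKVStrict(f *testing.F) {
	f.Add("a=1&b=2")
	f.Add("key=val=ue")
	f.Add("=empty_key")

	f.Fuzz(func(t *testing.T, input string) {
		result, err := ParseKVStrict(input)
		if err != nil {
			return
		}
		for k, v := range result {
			// Keys are never empty and never contain either delimiter.
			if k == "" || strings.ContainsAny(k, "&=") {
				t.Errorf("bad key %q from input %q", k, input)
			}
			// Values may contain "=" but never the pair separator.
			if strings.Contains(v, "&") {
				t.Errorf("bad value %q from input %q", v, input)
			}
		}
	})
}
```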

3. Encoder/decoder round-trips

Any time you have Encode and Decode, the fuzzer can verify that they’re true inverses of each other.

func FuzzJSONRoundTrip(f *testing.F) {
    f.Add(`{"name":"Alice","age":30}`)
    f.Add(`{}`)
    f.Add(`{"unicode":"\u0000"}`)

    f.Fuzz(func(t *testing.T, input string) {
        var first map[string]any
        if err := json.Unmarshal([]byte(input), &first); err != nil {
            return // invalid JSON, skip
        }

        // Re-encode
        encoded, err := json.Marshal(first)
        if err != nil {
            t.Errorf("Marshal failed on valid JSON: %v", err)
            return
        }

        // Re-decode
        var second map[string]any
        if err := json.Unmarshal(encoded, &second); err != nil {
            t.Errorf("second Unmarshal failed: %v", err)
            return
        }

        // Deep equality check
        if !reflect.DeepEqual(first, second) {
            t.Errorf("round-trip mismatch:\n  first:  %v\n  second: %v", first, second)
        }
    })
}

This pattern works for any codec: protobuf, CBOR, MessagePack, your custom binary format. If Decode(Encode(x)) != x, something’s wrong.
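The same idea can start from typed values instead of raw bytes, which is closer to the Decode(Encode(x)) == x form. This is a sketch under assumptions: User and roundTrip are hypothetical, and encoding/json stands in for whatever codec you actually use. One caveat shows up immediately: encoding/json replaces invalid UTF-8 with U+FFFD when marshaling, so the harness has to skip strings that cannot round-trip by design.

```go
package main

import (
	"encoding/json"
	"testing"
	"unicode/utf8"
)

// User is a hypothetical payload type used only for this illustration.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// roundTrip encodes u as JSON and decodes the result into a fresh value.
func roundTrip(u User) (User, error) {
	encoded, err := json.Marshal(u)
	if err != nil {
		return User{}, err
	}
	var decoded User
	err = json.Unmarshal(encoded, &decoded)
	return decoded, err
}

func FuzzUserRoundTrip(f *testing.F) {
	f.Add("Alice", 30)
	f.Add("", -1)

	f.Fuzz(func(t *testing.T, name string, age int) {
		// Invalid UTF-8 is rewritten by the encoder, so equality cannot hold.
		if !utf8.ValidString(name) {
			t.Skip()
		}
		in := User{Name: name, Age: age}
		out, err := roundTrip(in)
		if err != nil {
			t.Fatalf("round trip failed for %+v: %v", in, err)
		}
		if in != out {
			t.Errorf("round-trip mismatch: in=%+v out=%+v", in, out)
		}
	})
}
```

Skipping an input class like this is a design decision worth writing down: it documents where the codec is lossy instead of letting the fuzzer report it over and over.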

4. Integer arithmetic and overflow

This one catches people off-guard because Go doesn’t have implicit overflow exceptions — it wraps silently. If your code does math based on user-controlled integers, the fuzzer will find the wrap.

// Allocates a buffer based on user-supplied dimensions
func NewGrid(width, height int) ([]byte, error) {
    if width <= 0 || height <= 0 {
        return nil, errors.New("dimensions must be positive")
    }
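    // BUG: width*height wraps silently when the product exceeds the int range
    size := width * height
    if size <= 0 {
        // Incomplete guard: this only catches products that wrap to zero or a
        // negative number. A wrapped product can also be positive. The robust
        // check is width > math.MaxInt/height, done before multiplying.
        return nil, errors.New("overflow detected")
    }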
    return make([]byte, size), nil
}

func FuzzNewGrid(f *testing.F) {
    f.Add(10, 10)
    f.Add(1, 1)
    f.Add(0, 5)

    f.Fuzz(func(t *testing.T, width, height int) {
        buf, err := NewGrid(width, height)
        if err != nil {
            return
        }
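        // If we got a buffer, it should have the right size. Check by division:
        // comparing against width*height would repeat the same wrapped arithmetic.
        if len(buf)%width != 0 || len(buf)/width != height {
            t.Errorf("buffer size mismatch: got %d for %dx%d", len(buf), width, height)
        }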
    })
}

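On a 64-bit system the product only wraps once both dimensions get large: math.MaxInt32 + 1 squared is 2^62, which still fits in an int, so you need values around 2^32 each before width * height overflows. Coverage-guided mutation can take a while to reach numbers that big on its own, so seed them, for example f.Add(1<<32, 1<<32). Without a guard, a negative product makes make([]byte, size) panic. A product that wraps to a positive number slips past a size <= 0 check and silently yields a buffer of the wrong size. Also cap the dimensions, either in the harness or in NewGrid itself, or large but valid inputs will try to allocate gigabytes.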

5. State machines and multi-step protocols

Some bugs only appear after a specific sequence of operations, not on a single input. You can fuzz sequences too.

type Operation struct {
    Op  byte   // 0=push, 1=pop, 2=peek
    Val int32
}

func FuzzStack(f *testing.F) {
    // Encode a sequence of ops as a byte slice: [op, val(4 bytes), op, val, ...]
    f.Add([]byte{0, 0, 0, 0, 42, 1, 0, 0, 0, 0}) // push 42, pop

    f.Fuzz(func(t *testing.T, data []byte) {
        s := NewStack()
        const stride = 5
        for i := 0; i+stride <= len(data); i += stride {
            op := data[i]
            val := int32(binary.BigEndian.Uint32(data[i+1 : i+5]))
            switch op % 3 {
            case 0:
                s.Push(val)
            case 1:
                s.Pop() // must not panic on empty stack
            case 2:
                s.Peek() // must not panic on empty stack
            }
        }
        // Invariant: size is always non-negative
        if s.Len() < 0 {
            t.Error("stack size went negative")
        }
    })
}

This approach — encoding a sequence of operations into a single byte slice — is a standard technique for fuzzing stateful systems.
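A non-negative length is a fairly weak invariant. A stronger variant of the same harness replays every operation against a trivially correct reference model and compares the two after each step. The sketch below assumes Pop and Peek return a value plus an ok flag; the Stack type is a stand-in so the example compiles, not the implementation under test.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// Stack is a stand-in implementation for illustration only.
type Stack struct{ items []int32 }

func NewStack() *Stack         { return &Stack{} }
func (s *Stack) Push(v int32) { s.items = append(s.items, v) }
func (s *Stack) Len() int      { return len(s.items) }

func (s *Stack) Peek() (int32, bool) {
	if len(s.items) == 0 {
		return 0, false
	}
	return s.items[len(s.items)-1], true
}

func (s *Stack) Pop() (int32, bool) {
	v, ok := s.Peek()
	if ok {
		s.items = s.items[:len(s.items)-1]
	}
	return v, ok
}

// checkAgainstModel replays the encoded operations on both the Stack and a
// plain slice, returning an error at the first disagreement.
func checkAgainstModel(data []byte) error {
	s := NewStack()
	var model []int32
	const stride = 5
	for i := 0; i+stride <= len(data); i += stride {
		val := int32(binary.BigEndian.Uint32(data[i+1 : i+stride]))
		switch data[i] % 3 {
		case 0:
			s.Push(val)
			model = append(model, val)
		case 1, 2:
			var got int32
			var ok bool
			if data[i]%3 == 1 {
				got, ok = s.Pop()
			} else {
				got, ok = s.Peek()
			}
			if ok != (len(model) > 0) {
				return fmt.Errorf("step %d: ok=%v with %d modeled items", i/stride, ok, len(model))
			}
			if ok && got != model[len(model)-1] {
				return fmt.Errorf("step %d: got %d, want %d", i/stride, got, model[len(model)-1])
			}
			if ok && data[i]%3 == 1 {
				model = model[:len(model)-1]
			}
		}
		if s.Len() != len(model) {
			return fmt.Errorf("step %d: Len()=%d, want %d", i/stride, s.Len(), len(model))
		}
	}
	return nil
}

func FuzzStackModel(f *testing.F) {
	f.Add([]byte{0, 0, 0, 0, 42, 1, 0, 0, 0, 0}) // push 42, pop
	f.Fuzz(func(t *testing.T, data []byte) {
		if err := checkAgainstModel(data); err != nil {
			t.Error(err)
		}
	})
}
```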

Gotchas

Gotcha: the fuzzer is only as good as your seed corpus. If you seed it with only valid, well-formed inputs, it’ll mostly explore valid space. Seed it with boundary cases: empty inputs, max-length inputs, inputs with every delimiter at every position, null bytes, high Unicode codepoints. The fuzzer mutates from seeds — bad seeds mean slow progress.

Gotcha: t.Error vs panic. The fuzzer catches panics automatically. It does not automatically catch wrong output. If your parser silently returns garbage on malformed input, the fuzzer won’t notice unless you write an assertion. Most missed bugs are in this category.

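Gotcha: the corpus file format is not raw bytes. Files under testdata/fuzz/ use a text encoding: a version header line (go test fuzz v1) followed by one Go literal per fuzz argument, such as []byte("...") or int(5). Dropping a raw crash sample into that directory won’t work. Let the fuzzer generate the files, or convert existing samples with the file2fuzz tool from golang.org/x/tools, then commit the useful ones.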
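If you already have a raw sample (say, a payload captured from a production crash) and want it in the corpus, generating the file is less error-prone than hand-editing escapes. A minimal sketch for a target that takes a single []byte argument; corpusFile is a hypothetical helper, and the layout it emits is the version 1 encoding as I understand it from files the fuzzer itself writes:

```go
package main

import "fmt"

// corpusFile renders one []byte input in the text encoding used under
// testdata/fuzz: a version header, then one Go literal per fuzz argument.
// This sketch only handles targets with a single []byte parameter.
func corpusFile(data []byte) string {
	// %q escapes non-printable bytes using Go string-literal syntax.
	return fmt.Sprintf("go test fuzz v1\n[]byte(%q)\n", data)
}
```

Write the result to a file under testdata/fuzz/<FuzzName>/ and run plain go test to confirm the entry is picked up and replayed.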

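Gotcha: -fuzztime is wall time, not CPU time. The fuzzer runs several worker processes in parallel, GOMAXPROCS of them by default, and -parallel changes that number. With go test -fuzz=FuzzParseRecord -fuzztime=60s -parallel=8, eight workers share the same 60 seconds, so you get roughly eight times the executions, though not necessarily eight times the coverage. Worth knowing when you’re on a beefy CI machine.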

Gotcha: go test without -fuzz replays the corpus but doesn’t fuzz. This is correct behavior — the corpus is a regression suite. But it means finding new bugs requires running with -fuzz. Don’t confuse "corpus replay passes" with "the code is fuzz-clean."

Gotcha: don’t fuzz non-deterministic functions. If your function produces different output for the same input (random IDs, timestamps, nonces), invariant-based fuzzing will produce false positives constantly. Either mock out the non-determinism or don’t check round-trip equality for those fields.

Production-ready practices

Run fuzzing in CI with a time budget. A 30-second fuzz run in CI catches a surprising amount — mostly because developers never run it locally at all. Set it up like this:

# .github/workflows/fuzz.yml
name: Fuzz

on: [push, pull_request]

jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.22'
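      - name: Run fuzz targets
        # -fuzz accepts exactly one package and one matching target per
        # invocation, so ./... is rejected. ./parser and ./frame are
        # placeholder paths; -run='^$' skips the regular unit tests.
        run: |
          go test ./parser -run='^$' -fuzz='^FuzzParseRecord$' -fuzztime=30s
          go test ./frame  -run='^$' -fuzz='^FuzzReadFrame$'   -fuzztime=30s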

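Commit your corpus. The files under testdata/fuzz/ are regression tests. Every failing input the fuzzer finds is written there and becomes a permanent test case, so commit those files alongside your code. Inputs that only expand coverage stay in the local build cache ($GOCACHE/fuzz); on CI, persist that directory between runs if you want exploration to pick up where it left off. Either way, the fuzzer’s discoveries accumulate over time instead of being lost.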

Separate long-running fuzzing from CI. For security-critical code (parsers, crypto helpers, deserialization), run dedicated overnight fuzzing sessions on a separate machine or in a scheduled job, with something like go test -fuzz=FuzzParseRecord -fuzztime=8h in the target’s package. The shorthand -fuzz=. only works when the package contains exactly one fuzz target. Treat corpus additions from those runs as separate PRs.

Write tight invariants, not just panic guards. The difference between a weak fuzz target and a strong one is the quality of assertions inside the closure. "Doesn’t panic" catches crashes. "Decoded value matches re-encoded value" catches correctness bugs. "Parsed struct fields satisfy constraint X" catches logic bugs. All three categories are worth your time.

Use f.Add generously with pathological inputs. For string parsers:

f.Add("")
f.Add("\x00")
f.Add(strings.Repeat("A", 65536)) // max-ish length
f.Add("ÄÖÜ")                      // multibyte UTF-8
f.Add("\r\n\r\n")                  // HTTP-adjacent
f.Add("../../../etc/passwd")       // path traversal seed
f.Add("<script>alert(1)</script>") // injection seed

Each of these primes the fuzzer to explore a different region of input space.

Profile allocations if the fuzzer OOMs. A fuzzer running a tight loop can expose quadratic allocations fast. If your machine starts swapping, a periodic runtime/debug.FreeOSMemory() call in long-running harnesses can relieve the pressure, but it only treats the symptom. The real question is whether your parser allocates proportionally to input size or blows up, and an allocation profile answers that.

Putting it together: a real-world target

Here’s a complete fuzz harness for a hypothetical line protocol — the kind of thing you’d write for a metrics ingestion endpoint:

// protocol: "metric_name value timestamp\n"
// e.g.: "cpu.load 0.87 1716633600\n"

package lineproto

import (
    "bytes"
    "testing"
)

func FuzzParseLine(f *testing.F) {
    f.Add([]byte("cpu.load 0.87 1716633600\n"))
    f.Add([]byte("mem.used 1048576 1716633600\n"))
    f.Add([]byte("\n"))
    f.Add([]byte(""))
    f.Add([]byte("no-spaces-here\n"))
    f.Add([]byte("a b c d\n"))                    // too many fields
    f.Add([]byte("metric NaN 0\n"))               // non-numeric value
    f.Add([]byte{0x00, 0x01, '\n'})               // binary in name

    f.Fuzz(func(t *testing.T, data []byte) {
        // Must not panic
        m, err := ParseLine(data)
        if err != nil {
            return
        }

        // Invariant 1: name is non-empty and printable ASCII
        if m.Name == "" {
            t.Errorf("empty metric name from input %q", data)
        }
        for _, b := range []byte(m.Name) {
            if b < 0x20 || b > 0x7e {
                t.Errorf("non-printable byte 0x%02x in metric name from input %q", b, data)
            }
        }

        // Invariant 2: re-serialization is consistent
        serialized := m.Serialize()
        m2, err := ParseLine(serialized)
        if err != nil {
            t.Errorf("re-parse of serialized metric failed: %v\ninput: %q\nserialized: %q",
                err, data, serialized)
            return
        }

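        // Caveat: if Value is a float and ParseLine accepts NaN, this check
        // always fails because NaN != NaN. Reject NaN in the parser or
        // special-case it here.
        if m.Name != m2.Name || m.Value != m2.Value || m.Timestamp != m2.Timestamp {
            t.Errorf("round-trip mismatch:\n  original: %+v\n  reparsed: %+v", m, m2)
        }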

        // Invariant 3: serialized form ends with newline
        if !bytes.HasSuffix(serialized, []byte("\n")) {
            t.Errorf("serialized metric does not end with newline: %q", serialized)
        }
    })
}

This single harness covers three distinct bug classes: panics, character validation failures, and round-trip inconsistencies. A developer writing unit tests would need to enumerate every case manually. The fuzzer finds them systematically.
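For reference, here is one way ParseLine and Serialize could look so that all three invariants hold. Everything in it is an assumption made for illustration: the Metric field types, the single-space separator, the decision to reject NaN and infinities, and the error messages. A real ingestion endpoint will have its own rules.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"math"
	"strconv"
)

// Metric is an assumed shape for one parsed line.
type Metric struct {
	Name      string
	Value     float64
	Timestamp int64
}

// ParseLine parses "name value timestamp\n" with single-space separators.
func ParseLine(data []byte) (Metric, error) {
	if !bytes.HasSuffix(data, []byte("\n")) {
		return Metric{}, errors.New("missing trailing newline")
	}
	fields := bytes.Split(data[:len(data)-1], []byte(" "))
	if len(fields) != 3 {
		return Metric{}, fmt.Errorf("want 3 fields, got %d", len(fields))
	}
	name := string(fields[0])
	if name == "" {
		return Metric{}, errors.New("empty metric name")
	}
	for i := 0; i < len(name); i++ {
		if name[i] < 0x20 || name[i] > 0x7e {
			return Metric{}, fmt.Errorf("non-printable byte 0x%02x in name", name[i])
		}
	}
	value, err := strconv.ParseFloat(string(fields[1]), 64)
	if err != nil {
		return Metric{}, fmt.Errorf("bad value: %w", err)
	}
	// NaN never equals itself, so accepting it would break round-trip checks.
	if math.IsNaN(value) || math.IsInf(value, 0) {
		return Metric{}, errors.New("value must be finite")
	}
	ts, err := strconv.ParseInt(string(fields[2]), 10, 64)
	if err != nil {
		return Metric{}, fmt.Errorf("bad timestamp: %w", err)
	}
	return Metric{Name: name, Value: value, Timestamp: ts}, nil
}

// Serialize writes the metric back in the wire format. The 'g' format with
// precision -1 emits the shortest text that parses back to the same float64.
func (m Metric) Serialize() []byte {
	value := strconv.FormatFloat(m.Value, 'g', -1, 64)
	return []byte(fmt.Sprintf("%s %s %d\n", m.Name, value, m.Timestamp))
}
```

Note that the serialized text is not always byte-identical to the input (a value written as 1048576 comes back in exponent form), which is why the harness compares parsed fields rather than bytes.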

When fuzzing isn’t the right tool

Fuzzing is excellent for: anything that parses external input, encoders/decoders, state machines driven by external data, and arithmetic on untrusted values.

It’s less useful for: pure business logic with no external input, code that’s primarily about orchestration or side effects, and functions where the output depends on external state you can’t control. For those, property-based testing with a library like rapid is often a better fit — it gives you more expressive generators while still exploring the input space beyond what you’d write by hand.

The two tools complement each other. Fuzzing for the raw "can this input blow up my code" question. Property-based testing for "does this code satisfy a mathematical property across all inputs."

Final thought

The teams that get the most from Go’s fuzzer are the ones that treat it like a first-class testing tool, not an afterthought. You write fuzz targets when you write the code, you commit the corpus, and you run it in CI. When the fuzzer finds a crash, you don’t just fix the specific bug — you look at what class of input triggered it, strengthen the bounds checks, and add more seeds so the fuzzer keeps exploring that region.

It’s the closest thing to automated adversarial testing available in a standard Go install. Use it.