Understanding Fuzzer Architecture Through a Working Toy Example

Written by Reflare Research Team | Jun 17, 2026 12:36:16 PM

What does it take to build a working fuzzer? Through a simple Python tool, a vulnerable C parser, and three planted bugs, this toy example explains the architecture, feedback loops, and limits behind modern bug-finding.

All good architecture needs a good kicking.

Fuzzing is a testing technique that feeds a program a flood of generated inputs and watches for the ones that make it misbehave: crash, corrupt memory, hang, or trip an assertion. Instead of writing test cases by hand to cover the situations a developer thought of, a fuzzer runs the target over and over against inputs nobody anticipated, and modern fuzzers use feedback from each run to steer the next one toward code they haven't exercised yet. It's one of the most productive bug-finding methods in use, and tools built on it have turned up tens of thousands of defects in browsers, kernels, and libraries.

The goal here is to show how one is built through a small but complete toy example: a fuzzer in a few hundred lines of Python, paired with a short C program containing three planted memory-safety bugs for it to find. The fuzzer finds all three bugs, but not all the same way, and that difference turns out to be the most instructive part: one bug falls to the basic technique, and the other two need something most fuzzers don't have.

The architecture

A naive fuzzer is a loop: make a random input, run the program, check for a crash, repeat. That catches shallow bugs, and historically it caught a lot of them, but it stalls on anything deeper, because nothing tells it it's making progress. An input that slips past a program's first validation check and an input that gets rejected immediately look identical to it, so the search never builds on what worked. It keeps rolling dice from scratch, which is fine for a bug sitting near the surface but hopeless for one guarded behind a specific check the dice rarely satisfy.

The idea that makes modern fuzzing effective is to instrument the target so each run reports which parts of the program it reached. An input that reaches new code is saved and mutated further; one that reaches nothing new is thrown away. The signal grading each input is its coverage, and over many iterations the saved inputs drift from shallow behavior toward deep behavior: from being rejected at the first check to being accepted, parsed, and acted upon. This grey-box model is the basis of tools like AFL, libFuzzer, and honggfuzz, and almost all of them are built from the same handful of parts.

Coverage comes from instrumentation baked into the target, recording what executed. After each run the fuzzer compares that record against everything seen in the campaign so far, and an input that reached something new is the one worth keeping. The usual granularity is control-flow edges, the transitions between basic blocks; the coarser option is which lines ran.

The corpus is the set of inputs the fuzzer has kept, each one having reached some coverage nothing before it did. It begins from a few user-supplied seeds and grows as the search unlocks new code. It serves as the fuzzer's memory of interesting behavior and as the pool of parents for new candidates.

The mutation engine produces new candidates by altering existing ones: flipping bits, setting bytes, doing arithmetic on byte values, inserting boundary constants, deleting chunks, or splicing two inputs together. A dictionary of meaningful byte strings is a common addition, spliced in to satisfy checks that random edits would rarely produce.

The execution engine runs the target on each candidate. Throughput matters here, because the number of bugs found scales with how many candidates get tried, so real fuzzers spend a lot of effort making each run cheap.

The crash oracle decides when something actually went wrong. Not every bug crashes on its own; a memory error can silently return garbage. Sanitizers, compiler-inserted runtime checks, turn these quiet faults into loud, immediate aborts the fuzzer can detect, and they are the standard way to catch memory bugs.

Triage handles the output. A campaign produces many crashing inputs that often trace back to a few underlying bugs, so the crashes get deduplicated into distinct defects, each with a reproducing input.

The central loop joins these: pick a parent from the corpus, mutate it, run it, check the oracle, read the coverage. New coverage adds the candidate to the corpus; a crash sends it to triage; anything else is dropped. Then repeat, often millions of times. That loop, with those parts, is what almost all production fuzzing is, and it is enough to find most bugs.

Where mutation hits a wall

The loop has a blind spot. Coverage works as a guide only when getting closer to interesting code shows up as new coverage, and some checks offer no such gradient. The clearest case is an exact comparison against a wide value, where execution proceeds only if many bytes are all correct at once. Getting some of them right lands in exactly the same place as getting none of them right: the comparison fails either way, the same branch is taken, and the coverage is identical. There is no partial credit, so the search has nothing to climb, and since hitting every byte at once by chance is astronomically unlikely, the fuzzer simply sits in front of the check forever.

Fuzzers that handle this reach for extra techniques, none of which is part of the basic loop. The common and lightweight ones stay inside the mutation model. One is a compiler transformation that breaks a wide comparison into a sequence of narrower ones, so that getting the first part right does reach new code and restores a gradient to climb; AFL++ does this with a pass called laf-intel. Another watches comparisons as the program runs, captures the value being compared against, and feeds it into the dictionary, so the constant is discovered rather than guessed; this is AFL++'s CMP-log. Both are standard equipment in a modern mutation fuzzer.

There is a heavier option, much less common, and it is the one this example uses: concolic execution. It is not a normal part of a fuzzer. The vast majority of real-world fuzzing, including essentially all of OSS-Fuzz, is pure coverage-guided mutation with no solver involved, because symbolic reasoning is slow and scales poorly. A concolic stage pairs the mutation fuzzer with a symbolic-execution engine: when mutation stalls, the engine treats the input bytes as unknowns, walks a path through the program collecting the conditions each branch imposes, and a constraint solver computes a concrete input satisfying all of them. It does not guess the value behind a hard check; it derives it from the program's own logic. Pairing the two is called hybrid fuzzing, an approach popularized by the Driller research tool (source on GitHub). This example uses it because it puts the hard-comparison problem and its most general solution in one place, but a typical fuzzer would lean on the lighter techniques above and keep a solver as a last resort.

The two files

The example has two parts. target.c is a parser for a made-up binary format with three planted memory-safety bugs inside it; it plays the role of the program under test. fuzzer.py is the fuzzer: it builds the target, generates inputs by mutation, falls back to a concolic stage when mutation stalls, checks the crash oracle, reads back coverage, and keeps what makes progress. The rest of this article walks through both in full.

The target: `target.c`

target.c is a parser compiled with AddressSanitizer (-fsanitize=address), which surrounds allocations with guarded regions and checks every memory access as it happens. The moment the program reads or writes memory it shouldn't, ASAN stops it, prints the violation type and the exact source line, and exits non-zero. That abort is the crash oracle from the architecture above, and it means every crash in this article is a genuine out-of-bounds access rather than a caught exception.

How input gets in

The bottom of the file is the harness, the part that turns a file on disk into a call into the parser:

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 2; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 2;
    static unsigned char buf[4096];
    long n = (long)fread(buf, 1, sizeof(buf), f);
    fclose(f);
    parse(buf, n);
    return 0;
}

It takes a filename on the command line, reads up to 4096 bytes into a buffer, and hands the buffer and its length to parse. That file-on-disk interface is why the fuzzer works by writing each candidate to a file and running the binary on it. A return code of 2 means the file couldn't be opened, which is a usage problem, not a crash; an ordinary run returns 0, and a crashing run never reaches the return 0 because ASAN has already aborted.

The header checks

parse starts by validating a fixed header, and these checks are the reason the fuzzer has real work to do:

static void parse(const unsigned char *data, long n) {
    if (n < 1 || data[0] != 'F') return;          /* magic   */
    if (n < 2 || data[1] != 1)   return;          /* version */
    if (n < 3)                   return;

    unsigned char rtype = data[2];

The first byte must be 'F', the second must be 1, and there must be a third byte to select the record type. Each check has two parts: a length test (n < 1) so the parser never reads past the input it was given, and a value test (data[0] != 'F'). Anything failing these returns immediately. The bugs all live past this header, so a fuzzer that can't produce a valid 'F', 0x01, type prefix never reaches a single one of them. The third byte, rtype, picks which of three buggy paths runs.

Bug 1: heap buffer overflow (type `'A'`)

    if (rtype == 'A') {
        if (n < 4) return;
        unsigned char length = data[3];
        unsigned char *buf = malloc(8);
        const unsigned char *payload = data + 4;
        long avail = n - 4;
        for (int i = 0; i < length; i++) {
            buf[i] = (i < avail) ? payload[i] : 0;   /* heap-buffer-overflow */
        }
        free(buf);
    }

After a guard ensuring there's a length byte at all, the code reads length from the input, allocates an eight-byte buffer, and copies length bytes into it. The conditional (i < avail) ? payload[i] : 0 is careful to avoid reading past the input, zero-filling once the payload runs out, so the bug isn't an out-of-bounds read of the input. The bug is the write: buf[i] for any i of eight or more runs off the end of an eight-byte allocation. Since length comes straight from the input, any length above eight overflows the heap buffer. Of the three bugs this is the shallowest: a valid header, type 'A', and almost any length triggers it, which is why mutation alone will find it.

Bug 2: use-after-free (type `'B'`)

    else if (rtype == 'B') {
        if (n < 4) return;
        unsigned char length = data[3];
        char *p = malloc(16);
        strcpy(p, "hello");
        free(p);
        if (length == 7) {
            p[0] = 'X';                              /* use-after-free */
        }
    }

The path allocates sixteen bytes, fills them, and frees the allocation. Then, only if the length byte is exactly seven, it writes through p, which now points at freed memory. That write is a use-after-free, and ASAN flags it. The gate is the exact comparison length == 7. Being a single byte, it isn't out of reach for mutation in principle, one value in 256, but it's specific enough that mutation takes a long time to stumble onto, and in the run below it's the concolic stage that produces it instantly instead.

Bug 3: stack buffer overflow (type `'C'`)

Type 'C' holds the bug that motivates the whole second half of the architecture, the one mutation cannot reach.

    else if (rtype == 'C') {
        if (n < 8) return;
        if (data[4] != 0xde)
            return;
        if (data[5] != 0xad)
            return;
        if (data[6] != 0xbe)
            return;
        if (data[7] != 0xef)
            return;
        unsigned char idx = data[3];
        char small[4];
        small[idx] = 0x41;                           /* stack-buffer-overflow */
        if (small[idx & 3] == 0) printf("");
    }
}

The bug is past a gate that demands four specific bytes, de ad be ef, at offsets four through seven. This is exactly the wide-exact-comparison wall from the architecture: with a normal four-byte check, getting three bytes right looks identical to getting none right, so mutation has no gradient and would need to hit all four by chance. The gate here is written as four separate single-byte checks instead, which is the laf-intel transformation done by hand, and it does help a little: matching the first byte reaches a new branch and shows up as new coverage. Even so, plain mutation does not crack the full token in any reasonable budget, because matching even one exact byte is a 1-in-256 event with nothing in the operators aimed at producing the specific value, so this is the bug the concolic stage exists for. Past the gate, an attacker-controlled index writes into a four-byte stack array, and small[idx] overflows the stack for a range of indices. The last line, the printf("") guarded by reading small back, exists only to stop the compiler from deciding the write is dead code and deleting it.

The fuzzer: `fuzzer.py`

With the target understood, the fuzzer puts the standard components into code and adds the concolic stage for the gates mutation can't pass.

Setup

import os, re, glob, random, subprocess, time
import angr, claripy, logging

random.seed(1337)
HERE = os.path.dirname(os.path.abspath(__file__))
SRC = os.path.join(HERE, "target.c")
ASAN_BIN = os.path.join(HERE, "target_asan")
COV_BIN = os.path.join(HERE, "target_cov")
PLAIN_BIN = os.path.join(HERE, "target_plain")
CASE = "/tmp/fuzz_case.bin"

DICTIONARY = [b'F', b'\x01', b'A', b'B', b'C']
INTERESTING = [0x00, 0x01, 0x10, 0x40, 0x7F, 0x80, 0xFF]
INPUT_SIZE = 12

random.seed(1337) makes every run identical, which is what lets this article quote exact execution counts. There are three builds of the target, in three binaries: target_asan for crash detection, target_cov for coverage, and target_plain for the concolic stage (AddressSanitizer's instrumentation confuses symbolic execution, so angr analyzes a clean build). CASE is the file each candidate gets written to before a run.

The two lists feed the mutation engine. INTERESTING is a set of generic boundary values that tend to trigger off-by-one and overflow conditions. DICTIONARY holds bytes drawn from the format specification: the magic byte, the version byte, and the three record-type letters, the kind of structural tokens a fuzzer is normally seeded with so it can clear a format's surface checks.

Building the three targets

def build():
    subprocess.run(["gcc", "-fsanitize=address", "-g", "-O0", "-o", ASAN_BIN, SRC],
                   check=True, stderr=subprocess.DEVNULL)
    subprocess.run(["gcc", "--coverage", "-O0", "-o", COV_BIN, SRC],
                   check=True, stderr=subprocess.DEVNULL)
    subprocess.run(["gcc", "-O0", "-g", "-o", PLAIN_BIN, SRC],
                   check=True, stderr=subprocess.DEVNULL)
    print("[*] built target_asan, target_cov, target_plain")

build compiles target.c three ways. The -fsanitize=address build aborts on memory corruption. The --coverage build is instrumented by gcov to record which lines execute. The plain -O0 -g build carries debug symbols and no instrumentation, which is what angr wants to reason about. Splitting the concerns across builds keeps each signal clean: one binary answers "did it crash," one answers "what did it run," and one is analyzable symbolically.

Clearing coverage state

def _clear_gcov():
    for f in glob.glob(os.path.join(HERE, "*.gcda")) + glob.glob(os.path.join(HERE, "*.gcov")):
        try: os.remove(f)
        except OSError: pass

The helper exists because of how gcov accumulates data. A --coverage binary writes execution counts to a .gcda file, and if that file already exists, the counts are added to it rather than replacing it. For per-input coverage that would be wrong: every input would look like it covered everything every previous input did. Deleting the .gcda (and the parsed .gcov) before each run keeps each input's coverage isolated. Small as it is, the feedback loop would be meaningless without it.

Measuring coverage

def coverage_of(data: bytes) -> frozenset:
    with open(CASE, "wb") as f:
        f.write(data)
    _clear_gcov()
    subprocess.run([COV_BIN, CASE], stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, cwd=HERE)
    subprocess.run(["gcov", "target_cov-target.gcda"],
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, cwd=HERE)
    covered = set()
    g = os.path.join(HERE, "target.c.gcov")
    if os.path.exists(g):
        with open(g, errors="ignore") as f:
            for line in f:
                parts = line.split(":", 2)
                if len(parts) >= 2:
                    cnt, lineno = parts[0].strip(), parts[1].strip()
                    if cnt not in ("-", "#####") and cnt.isdigit() and int(cnt) > 0:
                        covered.add(int(lineno))
    return frozenset(covered)

Here is the coverage signal in full. The function writes the candidate to the case file, clears stale gcov state, runs the coverage build, then runs the gcov command to turn the resulting .gcda data into a readable target.c.gcov report. Each line of that report reads count:lineno:source. A line counts as covered when its count field is a positive integer; the two markers it rejects are gcov's own, - for a non-executable line (a brace, a blank) and ##### for an executable line that never ran. What comes back is the set of line numbers that ran, as a frozenset, cheap to diff against the running total. This is line coverage rather than the edge coverage a real fuzzer would use, and it shells out to gcov on every execution, which is slow; both points come back at the end. In the loop it does the one job that matters, flagging the inputs that reached something new.

The crash oracle

CRASH_RE = re.compile(r"AddressSanitizer:\s+(\S+).*?#0[^\n]*?target\.c:(\d+)", re.DOTALL)

def run_for_crash(data: bytes):
    with open(CASE, "wb") as f:
        f.write(data)
    env = dict(os.environ, ASAN_OPTIONS="abort_on_error=0:exitcode=86")
    p = subprocess.run([ASAN_BIN, CASE], stdout=subprocess.DEVNULL,
                       stderr=subprocess.PIPE, env=env, cwd=HERE)
    if p.returncode == 86 or b"AddressSanitizer" in p.stderr:
        err = p.stderr.decode(errors="ignore")
        m = CRASH_RE.search(err)
        return (m.group(1), int(m.group(2))) if m else ("unknown", -1)
    return None

The function runs the ASAN build and decides whether the input crashed it. Setting ASAN_OPTIONS=abort_on_error=0:exitcode=86 tells the sanitizer to exit with code 86 on an error instead of raising a signal, which makes the result easy to test for: the input crashed if the process exited 86 or if an AddressSanitizer report appeared on stderr. When it did crash, CRASH_RE pulls two facts from the report, the violation type (the token right after AddressSanitizer:, such as heap-buffer-overflow) and the source line from the top stack frame (#0 ... target.c:<line>). It returns that (type, line) pair, falls back to ("unknown", -1) if the format isn't recognized, and returns None when nothing crashed. That pair is the crash's signature, used for triage in the main loop.

The mutation engine

def mutate(seed: bytes, corpus) -> bytes:
    b = bytearray(seed)
    n = 1 << random.randint(1, 5)
    for _ in range(n):
        if not b:
            b = bytearray(random.choice(DICTIONARY)); continue
        op = 4 if random.random() < 0.30 else random.randrange(7)
        if op == 0:
            i = random.randrange(len(b)); b[i] ^= 1 << random.randrange(8)
        elif op == 1:
            b[random.randrange(len(b))] = random.randrange(256)
        elif op == 2:
            i = random.randrange(len(b)); b[i] = (b[i] + random.randint(-8, 8)) & 0xFF
        elif op == 3:
            b[random.randrange(len(b))] = random.choice(INTERESTING)
        elif op == 4:
            tok = random.choice(DICTIONARY); pos = random.randint(0, len(b)); b[pos:pos] = tok
        elif op == 5 and len(b) > 1:
            i = random.randrange(len(b)); del b[i:i + random.randint(1, len(b) - i)]
        elif op == 6 and corpus:
            other = random.choice(corpus); cut = random.randint(0, len(other))
            b = b[:random.randint(0, len(b))] + bytearray(other[cut:])
    return bytes(b)

mutate takes a parent input and applies a stack of random edits to a copy of it. The number of edits, 1 << random.randint(1, 5), is a random power of two from 2 to 32, so some passes barely change the input and others rework it heavily; piling several operators into one pass is the havoc approach the mainstream fuzzers use. The if not b guard handles the empty input: with nothing to mutate, it seeds the buffer from the dictionary and moves on, which is how the empty seed (b"") ever becomes anything.

Each edit picks an operator. The line op = 4 if random.random() < 0.30 else random.randrange(7) is weighted: 30 percent of the time it forces operator 4, dictionary insertion, otherwise it picks uniformly among all seven. The seven operators are the standard repertoire. Operator 0 flips a single random bit. Operator 1 sets a byte to a random value. Operator 2 adds a small signed number to a byte, good for walking length and offset fields across boundaries. Operator 3 overwrites a byte with one of the INTERESTING boundary constants. Operator 4 splices a dictionary entry in at a random position. Operator 5 deletes a run of bytes (it only runs when the buffer holds more than one byte, though the deletion itself can still remove everything, in which case the next call's empty-input branch reseeds it). Operator 6 is genetic crossover, joining the front of this input to the tail of another drawn from the corpus.

What this engine cannot do is produce the four exact bytes of the type-C token, and that limitation is the whole reason the next stage exists.

The concolic stage

_proj = None
def _project():
    global _proj
    if _proj is None:
        _proj = angr.Project(PLAIN_BIN, auto_load_libs=False)
    return _proj

def concolic_expand(stuck_input: bytes, known_cov: frozenset):
    proj = _project()

    sym = [claripy.BVS(f"b{i}", 8) for i in range(INPUT_SIZE)]
    content = claripy.Concat(*sym)
    simfile = angr.SimFile("input.bin", content=content, size=INPUT_SIZE)
    st = proj.factory.full_init_state(args=[PLAIN_BIN, "input.bin"],
                                      fs={"input.bin": simfile})
    simgr = proj.factory.simulation_manager(st)

    seen_blocks = set()
    new_inputs = []
    STEP_BUDGET = 600

    def harvest(state):
        if not state.solver.satisfiable():
            return
        try:
            sol = state.solver.eval(content, cast_to=bytes)
            if sol not in new_inputs:
                new_inputs.append(sol)
        except Exception:
            pass

        for idx_val in (4, 5, 8, 12):
            try:
                if state.solver.satisfiable(extra_constraints=[sym[3] == idx_val]):
                    sol = state.solver.eval(content, cast_to=bytes,
                                            extra_constraints=[sym[3] == idx_val])
                    if sol not in new_inputs:
                        new_inputs.append(sol)
            except Exception:
                pass

    steps = 0
    try:
        while simgr.active and steps < STEP_BUDGET:
            simgr.step()
            steps += 1
            for st_ in simgr.active:
                if st_.addr not in seen_blocks:   # a newly-reached block
                    seen_blocks.add(st_.addr)
                    harvest(st_)
            if len(simgr.active) > 200:           # bound state explosion
                simgr.active[:] = simgr.active[:200]
    except Exception:
        pass

    for stash in ("active", "deadended", "found"):
        for st_ in getattr(simgr, stash, []):
            harvest(st_)

    return new_inputs

Here is the part that gets through the gate. It builds a symbolic input: twelve bytes, each a fresh symbolic variable (claripy.BVS), concatenated and presented to the program as the contents of input.bin. angr then explores the program on that symbolic input, stepping the set of live states forward and forking them at each branch. There is no find= target and no bug address; the exploration simply fans out. Whenever a state reaches a basic block not seen before, harvest solves that state's accumulated constraints with s.solver.eval, producing a concrete input that drives execution to that block. When the explored path runs through the type-C token checks, the constraints on it pin bytes four through seven to the token, so the input the solver returns there contains de ad be ef, a value the fuzzer was never told and never aimed for. The token falls out of the gate's own branch conditions.

The loop over idx_val exists because reaching a store isn't the same as crashing on it: a write a few bytes past a four-byte stack array lands in the redzone AddressSanitizer guards, but a write far past it can miss into unpoisoned memory. Without committing to any address, harvest also asks the solver to fix the index byte to a few small values; for a state that happens to be past a gate using that byte as an index, one of those lands the write where the sanitizer sees it. For states where the index byte isn't an index into anything, the extra solves just produce more candidates that go nowhere. The point is that this is applied to every frontier state uniformly, not to a known bug.

The main hybrid loop

def fuzz(rounds=100000, stall_limit=4000):
    build()
    seeds = [b"", b"F", b"FF"]
    virgin = set()
    corpus = []
    for s in seeds:
        c = coverage_of(s)
        if c - virgin:
            virgin |= c; corpus.append(s)

fuzz ties everything together. It builds the three targets, then primes the corpus from three deliberately weak seeds (b"", b"F", b"FF"). Each seed runs through coverage_of and is kept only if it covers a line not already in virgin, the running set of all coverage so far. Starting this weak is the point: the fuzzer is handed almost nothing and has to discover the format on its own.

    crashes = {}
    execs = 0
    execs_since_new_cov = 0
    concolic_runs = 0
    t0 = time.time()

    for r in range(rounds):
        parent = random.choice(corpus)
        for _ in range(8):
            child = mutate(parent, corpus)
            execs += 1
            crash = run_for_crash(child)
            if crash and crash not in crashes:
                crashes[crash] = child
                print(f"[+] CRASH after {execs:>6} execs: ASAN {crash[0]} "
                      f"at target.c:{crash[1]} | {child!r}")
            c = coverage_of(child)
            if c - virgin:
                virgin |= c; corpus.append(child)
                execs_since_new_cov = 0
            else:
                execs_since_new_cov += 1

Here is the mutation loop, the same central loop from the architecture. Each round picks a parent and produces eight children. For every child: run the crash oracle, and if it crashed with a new signature, record the first input that produced it (the triage deduplication, keyed on the (type, line) pair). Then measure coverage; if the child reached a new line, fold it into virgin and add it to the corpus. The one addition is execs_since_new_cov, a counter that resets whenever coverage grows and climbs whenever it doesn't. That counter is how the loop knows it's stuck.

        if execs_since_new_cov >= stall_limit and len(crashes) < 3:
            concolic_runs += 1
            print(f"[*] mutation stalled at {execs} execs ({len(virgin)} lines). "
                  f"Invoking concolic stage #{concolic_runs} (angr/Z3)...")
            t = time.time()
            new_inputs = concolic_expand(random.choice(corpus), virgin)
            added = 0
            for ni in new_inputs:
                c = coverage_of(ni)
                if c - virgin:
                    virgin |= c; corpus.append(ni); added += 1
                cr = run_for_crash(ni)
                if cr and cr not in crashes:
                    crashes[cr] = ni
                    print(f"[+] CRASH via concolic input after {execs} execs: "
                          f"ASAN {cr[0]} at target.c:{cr[1]} | {ni!r}")
            print(f"    concolic produced {len(new_inputs)} inputs, "
                  f"{added} added new coverage, in {time.time()-t:.1f}s")
            execs_since_new_cov = 0

        if len(crashes) >= 3:
            print(f"[*] all 3 bugs found by {execs} execs; stopping")
            break

When the counter crosses stall_limit, mutation has plateaued, and the concolic stage takes over. It runs angr on the stuck program, and every input the exploration produces is fed back through the same machinery: measured for coverage and added to the corpus if it reached something new, and checked against the crash oracle. Among the inputs the exploration synthesizes are one that satisfies the type-C token gate (crashing the stack overflow) and one on the length == 7 path (crashing the use-after-free), so a single concolic invocation resolves both gated bugs without having been pointed at either. After it runs, the stall counter resets and mutation resumes from an enriched corpus. The campaign stops once all three distinct bugs are recorded.

    print("\n==== campaign summary ====")
    print(f"executions       : {execs}")
    print(f"concolic invocations: {concolic_runs}")
    print(f"corpus size      : {len(corpus)}")
    print(f"lines covered    : {len(virgin)}")
    print(f"wall clock       : {time.time()-t0:.1f}s")
    print(f"unique crashes   : {len(crashes)} (verified by AddressSanitizer)")
    for (kind, line), inp in sorted(crashes.items(), key=lambda x: x[0][1]):
        print(f"  - {kind:22} at target.c:{line:<3} reproducer={inp!r}")
    _clear_gcov()
    return crashes

The tail prints a summary: total executions, how many times the concolic stage fired, the corpus size, lines covered, wall-clock time, and the unique crashes with a reproducing input for each, sorted by source line. A final _clear_gcov tidies up.

What a run looks like

With the seed fixed, the campaign is reproducible. Mutation finds the heap overflow on its own at 423 executions, since that bug hides behind nothing but the header and a type byte. Then it plateaus: the type-B and type-C bugs are both behind value gates mutation can't satisfy, so coverage stops growing and the stall counter climbs. Once it has gone stall_limit executions with no new coverage, here at 5992 total, the concolic stage fires once and angr cracks both remaining bugs in a few seconds, the type-C reproducer carrying the de ad be ef the solver derived:

[+] CRASH after    423 execs: ASAN heap-buffer-overflow at target.c:31 | b'F\x01ACAC\x7fCCB\xdaFJV'
[*] mutation stalled at 5992 execs (30 lines). Invoking concolic stage #1 (angr/Z3)...
[+] CRASH via concolic input after 5992 execs: ASAN stack-buffer-overflow at target.c:59 | b'F\x01C\x04\xde\xad\xbe\xef\x00\x00\x00\x00'
[+] CRASH via concolic input after 5992 execs: ASAN heap-use-after-free at target.c:42 | b'F\x01B\x07\x00\x00\x00\x00\x00\x00\x00\x00'
    concolic produced 310 inputs, 6 added new coverage, in 25.4s
[*] all 3 bugs found by 5992 execs; stopping
unique crashes : 3 (verified by AddressSanitizer)

The split tells the story of the architecture. The unguarded bug fell to cheap mutation almost immediately. The two behind exact-value gates were invisible to mutation, no matter how long it ran, and the concolic stage synthesized passing inputs for both, deriving the type-C token from the gate's constraints rather than guessing it.

What it is and isn't

Put together, this is a working hybrid fuzzer: a coverage-guided mutation loop for the easy bug, a concolic angr/Z3 stage for the gates mutation can't pass, a sanitizer crash oracle, and signature-based triage. It finds memory bugs, derives a magic value it was never given, and proves each crash with AddressSanitizer.

The simplifications are worth stating plainly. It uses line coverage rather than the finer-grained edge coverage real fuzzers track, so two different paths through the same lines look identical to it. The type-C gate is split into single-byte checks by hand, where AFL++ applies that transformation automatically with a compiler pass. The concolic stage explores from a fixed twelve-byte symbolic input under a bounded step budget and sweeps a few index values on every frontier state to turn reachable bugs into detectable crashes; a production hybrid fuzzer scales to far larger inputs and longer exploration and uses the fuzzer's own coverage to decide which frontier states are worth solving. It spawns fresh processes for every input, two per execution plus a gcov invocation, which is why it runs at dozens of executions per second instead of the tens of thousands a fork-server or in-process harness reaches. And its triage neither deduplicates on full stack traces nor minimizes reproducers. Each of those is a shortcut traded for clarity. What the example shows: coverage feedback turns blind mutation into a directed search, that search hits a wall at checks with no gradient, and a constraint solver is one way through the wall.

View full post