Vegeta for HTTP Load Testing: Shell Scripting at Extreme Load

Most load testing tools have a fundamental design flaw: they throttle themselves. They send N requests, wait for responses, then send the next batch. That’s not how real traffic works. Real traffic doesn’t care whether your server is struggling — it keeps coming at the same rate.

That’s precisely why Vegeta exists, and why it belongs in your toolkit.

Vegeta (GitHub) is an HTTP load testing tool and library written in Go. It was built around one idea: constant-rate attack. You tell it "hit this endpoint at 500 requests per second" and it does exactly that, regardless of response times. The result is a brutally honest picture of what happens to your system under sustained pressure, not under a politely throttled benchmark.

This article walks through real-world Vegeta usage — from installation to multi-target shell scripts that you can drop into a CI pipeline or a pre-deploy smoke test.

Why Not ab, wrk, or k6?

ab (Apache Bench) is fine for a quick sanity check. wrk is fast and Lua-scriptable. k6 is excellent for complex flows. But ab and wrk, and k6 in its default virtual-user mode, operate on a concurrency model: they maintain N workers and push as hard as the workers allow.

The problem: if latency spikes, throughput drops. You end up testing "how fast can my server respond" rather than "what happens when traffic stays constant and the server slows down." The latter is what causes real outages.

Vegeta separates rate from concurrency. You set the rate, Vegeta maintains it. Latency can climb to 10 seconds — Vegeta still sends the next request on schedule. That’s the test your production incident was actually running.
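A back-of-the-envelope calculation shows why a fixed worker pool hides the problem. The sketch below is not part of Vegeta: `closed_loop_rps` is a hypothetical helper that applies Little's law (throughput = workers / latency) to estimate how much load a closed-loop tool can offer as responses slow down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the maximum request rate a closed-loop tool can offer
# when each worker waits for a response before sending its next request.
# Arguments: number of workers, mean latency in seconds.
closed_loop_rps() {
    awk -v workers="$1" -v latency="$2" 'BEGIN { printf "%.1f\n", workers / latency }'
}

# 50 workers against a healthy service answering in 100 ms
closed_loop_rps 50 0.1    # prints 500.0
# The same 50 workers once latency degrades to 1 s
closed_loop_rps 50 1      # prints 50.0
```

The offered load falls by a factor of ten exactly when the service is in trouble, which is the opposite of what production traffic does.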

Installation

Vegeta ships as a single static binary. No runtime dependencies, no Python version conflicts, no npm drama.

From GitHub releases (recommended for servers):

# Grab the latest release — check https://github.com/tsenart/vegeta/releases for current version
VEGETA_VERSION="12.12.0"
ARCH="amd64"  # or arm64 for ARM machines

curl -L "https://github.com/tsenart/vegeta/releases/download/v${VEGETA_VERSION}/vegeta_${VEGETA_VERSION}_linux_${ARCH}.tar.gz" \
  | tar xzf - vegeta

sudo mv vegeta /usr/local/bin/
vegeta --version

Via Go (if you have the toolchain):

go install github.com/tsenart/vegeta/v12@latest

On macOS:

brew install vegeta

Verify it works:

echo "GET http://httpbin.org/get" | vegeta attack -rate=5 -duration=5s | vegeta report

If you see a summary table, you’re good.

The Core Concepts

Before writing scripts, understand the three-stage pipeline Vegeta uses. It’s not hidden — it’s literally how you compose commands:

Attack: sends requests at the specified rate, produces a binary stream of results
Report: reads that binary stream and formats it (text summary, JSON, histogram)
Plot: generates an HTML page with a latency-over-time chart (useful for visual inspection)

vegeta attack [options] | vegeta report [options]

The binary stream can be saved to a file, which lets you generate several report types from a single run without repeating the attack.
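As a rough sketch of that workflow, the function below derives several reports from one saved result file. The name `report_all`, the output file names, and the histogram buckets are illustrative assumptions; the `vegeta report` and `vegeta plot` invocations are the standard ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive several reports from one saved result file.
# Usage: report_all results.bin ./reports
report_all() {
    local results="$1" outdir="$2"
    mkdir -p "${outdir}"

    # Human-readable summary
    vegeta report -type=text "${results}" > "${outdir}/summary.txt"
    # Machine-readable metrics (latencies are reported in nanoseconds)
    vegeta report -type=json "${results}" > "${outdir}/summary.json"
    # Latency histogram; the buckets are quoted so the shell does not glob them
    vegeta report -type='hist[0,50ms,100ms,250ms,500ms,1s]' "${results}" \
        > "${outdir}/histogram.txt"
    # Latency plot as an HTML page
    vegeta plot "${results}" > "${outdir}/plot.html"
}
```

It would typically follow an attack that was redirected to a file, for example `vegeta attack ... > results.bin`.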

Basic Usage

The simplest possible attack:

echo "GET https://example.com/" | vegeta attack -rate=50 -duration=30s | vegeta report

Output:

Requests      [total, rate, throughput]  1500, 50.03, 49.81
Duration      [total, attack, wait]      30.104s, 29.98s, 123.72ms
Latencies     [min, mean, 50, 90, 95, 99, max]  98ms, 134ms, 127ms, 189ms, 213ms, 390ms, 1.2s
Bytes In      [total, mean]              213000, 142.00
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:1500
Error Set:

The 99th percentile and max latency are where you should be looking. If your p99 is 390ms but your SLA requires sub-200ms responses for 99% of traffic, you just found a problem — before the incident.

Targets File: Testing Multiple Endpoints

Real services aren’t one endpoint. You need to test a realistic distribution of traffic. Vegeta accepts a targets file with one request per entry:

# targets.txt
GET https://api.example.com/health

GET https://api.example.com/v1/users/123

POST https://api.example.com/v1/events
Content-Type: application/json
@event_body.json

GET https://api.example.com/v1/products?page=1&limit=20
Authorization: Bearer your_token_here

DELETE https://api.example.com/v1/sessions/abc123
Authorization: Bearer your_token_here

The @filename syntax attaches a file as the request body — cleaner than trying to inline JSON in the targets file.

Run it:

vegeta attack -targets=targets.txt -rate=100 -duration=60s | tee results.bin | vegeta report

The tee saves the raw binary results while also streaming to the report. You can then generate a histogram separately:

vegeta report -type='hist[0,10ms,50ms,100ms,250ms,500ms,1s]' results.bin
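Because Vegeta walks the targets in round-robin order, one simple way to approximate a traffic mix is to repeat each entry in proportion to its share. The sketch below assumes header-less requests, and `weighted_targets` with its "WEIGHT METHOD URL" input format is a hypothetical convention, not something Vegeta defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand "WEIGHT METHOD URL" lines from stdin into
# Vegeta targets, repeating each request WEIGHT times.
weighted_targets() {
    local weight method url i
    while read -r weight method url; do
        # Skip blank lines and comments in the input
        [[ -z "${weight}" || "${weight}" == \#* ]] && continue
        for (( i = 0; i < weight; i++ )); do
            printf '%s %s\n' "${method}" "${url}"
        done
    done
}

# Intended use (7 health checks for every 3 product listings):
#   printf '%s\n' \
#       '7 GET https://api.example.com/health' \
#       '3 GET https://api.example.com/v1/products?page=1&limit=20' \
#       | weighted_targets > targets_weighted.txt
```

The output groups identical requests together; piping it through `shuf` (GNU coreutils) before saving would interleave them, which works here only because every target fits on one line.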

Shell Scripting for Real Load Tests

This is where Vegeta gets serious. Here’s a production-grade shell script that runs a stepped load test — starting light and ramping up — so you can find the breaking point of your service.

#!/usr/bin/env bash
# load_ramp.sh — stepped load test with Vegeta
# Usage: ./load_ramp.sh https://api.example.com/v1/health

set -euo pipefail

TARGET_URL="${1:?Usage: $0 <target-url>}"
DURATION="30s"
RESULTS_DIR="./results/$(date +%Y%m%d_%H%M%S)"
RATES=(10 50 100 200 500 1000)

mkdir -p "${RESULTS_DIR}"

echo "Target: ${TARGET_URL}"
echo "Results dir: ${RESULTS_DIR}"
echo ""

for RATE in "${RATES[@]}"; do
    echo "=== Attacking at ${RATE} req/s for ${DURATION} ==="

    RESULT_FILE="${RESULTS_DIR}/rate_${RATE}.bin"

    # Run the attack and save the binary result stream
    echo "GET ${TARGET_URL}" \
        | vegeta attack \
            -rate="${RATE}" \
            -duration="${DURATION}" \
            -timeout="10s" \
            -keepalive=true \
        > "${RESULT_FILE}"

    # Print a quick summary for this rate
    vegeta report "${RESULT_FILE}"

    # Extract the success ratio — bail out if it drops below 95%
    SUCCESS=$(vegeta report -type=json "${RESULT_FILE}" | python3 -c "
import sys, json
data = json.load(sys.stdin)
print(data['success'])
")

    echo "Success ratio: ${SUCCESS}"

    if (( $(echo "${SUCCESS} < 0.95" | bc -l) )); then
        echo "ERROR: Success ratio dropped below 95% at ${RATE} req/s. Stopping ramp."
        break
    fi

    # Brief pause between steps to let the service recover
    echo "Pausing 5s before next step..."
    sleep 5
    echo ""
done

echo "=== Generating combined report ==="

# Produce a combined histogram from every step; vegeta report accepts
# multiple result files and sorts the results by timestamp
vegeta report \
    -type='hist[0,5ms,25ms,50ms,100ms,250ms,500ms,1s,2s,5s]' \
    "${RESULTS_DIR}"/*.bin

# Generate an HTML latency plot for the full run
vegeta plot "${RESULTS_DIR}"/*.bin > "${RESULTS_DIR}/latency_plot.html"

echo ""
echo "Done. HTML plot saved to ${RESULTS_DIR}/latency_plot.html"

Run it against a service you control and are allowed to load test, such as a staging deployment:

chmod +x load_ramp.sh
./load_ramp.sh https://cd-linux.club/api/health

You’ll see each rate level run, get its report, and the script stops the ramp the moment your service starts falling apart. The merged histogram at the end shows the full picture across all rates.
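The rate steps in a ramp do not have to be hard-coded. As a small sketch, the hypothetical `ramp_steps` helper below prints a geometric ramp; the start, ceiling, and factor are placeholders to adapt to your own traffic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a geometric ramp of request rates, one per line,
# starting at START and multiplying by FACTOR until MAX would be exceeded.
# FACTOR must be an integer of at least 2, otherwise the ramp never ends.
ramp_steps() {
    local rate="$1" max="$2" factor="$3"
    (( factor >= 2 )) || { echo "factor must be >= 2" >&2; return 1; }
    while (( rate <= max )); do
        echo "${rate}"
        rate=$(( rate * factor ))
    done
}

ramp_steps 10 1000 2 | tr '\n' ' '   # prints: 10 20 40 80 160 320 640
echo ""
```

In a ramp script, the output could fill the rates array with `mapfile -t RATES < <(ramp_steps 10 1000 2)`, which needs bash 4 or newer.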

POST Requests with Dynamic Bodies

Static request bodies are fine for read-heavy APIs. For write-heavy tests you sometimes need variation — different user IDs, different payloads. You have two options.

Option 1: Pre-generated targets file

Generate a large targets file before the test:

#!/usr/bin/env bash
# gen_targets.sh — generates 10,000 unique POST targets

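OUTFILE="targets_post.jsonl"
TS=$(date +%s)
: > "${OUTFILE}"  # truncate

for i in $(seq 1 10000); do
    BODY="{\"user_id\": ${i}, \"event\": \"page_view\", \"ts\": ${TS}}"

    # The plain-text targets format only accepts bodies via @file, so emit
    # Vegeta's JSON target format instead: one object per line, body base64-encoded
    B64=$(printf '%s' "${BODY}" | base64 | tr -d '\n')
    printf '{"method":"POST","url":"https://api.example.com/v1/events","header":{"Content-Type":["application/json"]},"body":"%s"}\n' \
        "${B64}" >> "${OUTFILE}"
done

echo "Generated ${OUTFILE}"
echo "Run with: vegeta attack -format=json -targets=${OUTFILE} -rate=100 -duration=60s | vegeta report"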

Vegeta cycles through the targets file in order, looping back when it reaches the end. 10,000 unique entries is usually enough to avoid caching artifacts.
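When the payload has to be fresh at send time, a shell-only middle ground is to stream targets into a lazy attack instead of pre-generating a file. The sketch below assumes Vegeta's JSON target format and its `-lazy` flag; `stream_targets`, the URL, and the payload fields are illustrative. A bash loop that forks `base64` per target will only keep up with modest rates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit COUNT Vegeta JSON targets, one per line, each
# with a timestamp taken at the moment the target is generated.
stream_targets() {
    local count="$1" i body
    for (( i = 1; i <= count; i++ )); do
        body=$(printf '{"user_id": %d, "event": "page_view", "ts": %d}' "${i}" "$(date +%s)")
        # The JSON target format expects the body to be base64-encoded
        printf '{"method":"POST","url":"https://api.example.com/v1/events","header":{"Content-Type":["application/json"]},"body":"%s"}\n' \
            "$(printf '%s' "${body}" | base64 | tr -d '\n')"
    done
}

# Intended use (not run here); COUNT should cover rate x duration:
#   stream_targets 3000 \
#       | vegeta attack -lazy -format=json -rate=50 -duration=60s \
#       | vegeta report
```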

Option 2: Vegeta’s library in a Go program

For truly dynamic payloads (cryptographic tokens, real timestamps, HMAC-signed requests), drop down to Go and use Vegeta as a library. The shell approach gets you 80% of the way there; the remaining 20% needs code.

Reading Results Properly

Most people look at average latency. That’s wrong. Averages hide everything interesting.

The metrics that matter:

Metric	What to watch for
p50	Your median user experience
p95	The latency that the slowest 1 in 20 requests meets or exceeds
p99	What your worst 1% experiences — usually the SLA target
max	Single worst request — often indicates a GC pause, lock contention, or cold cache
Success ratio	Anything below 99.9% is a conversation
Throughput vs Rate	If throughput < rate, requests are failing or responses are falling behind the send rate

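In the report header, rate is the number of requests sent per second of attack time, while throughput counts only successful responses per second over the whole run, including the wait for the last response. When rate=500 but throughput=387, your server is falling behind: a large share of what you send is failing or arriving late. If the load continues, that gap tends to widen until something crashes.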
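The size of that gap is easy to put a number on. `shortfall_pct` below is a hypothetical helper, shown only to make the arithmetic for the rate=500, throughput=387 example explicit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage by which throughput trails the request rate.
# Arguments: rate and throughput, both in requests per second.
shortfall_pct() {
    awk -v rate="$1" -v tput="$2" 'BEGIN { printf "%.1f\n", (rate - tput) / rate * 100 }'
}

shortfall_pct 500 387     # prints 22.6
```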

Gotchas

Gotcha 1: Your test machine is the bottleneck

Vegeta can saturate your test machine’s network stack before it saturates the target server. If you’re running 2000 req/s from a single small VM, check your CPU and open file descriptors first:

# Raise the open file limit before running
ulimit -n 65535

# Count open descriptors during the test (oldest process named exactly "vegeta")
watch -n1 'ls /proc/$(pgrep -o -x vegeta)/fd | wc -l'

Gotcha 2: DNS resolution costs

Without keepalives, every request opens a new connection, and each new connection needs a DNS lookup (which may or may not be answered from a local cache). Keep -keepalive=true (it’s the default but worth being explicit) so connections are reused, and test against IPs when you want to isolate application-layer performance from DNS.

Gotcha 3: The wall clock drift

Vegeta uses a pacer to maintain a constant rate. On a busy or underpowered test machine, the pacer can fall behind schedule. Always check the rate value in the report: if it says 498/s instead of 500/s, your machine is struggling to keep up. The results are still valid, but record the rate actually achieved rather than the one you requested.
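That check can be automated. The sketch below assumes you already have the achieved rate as a number, for instance from the `rate` field of `vegeta report -type=json`; `rate_within` and the 1% tolerance are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the achieved rate is within TOLERANCE
# percent of the requested rate.
# Arguments: requested rate, achieved rate, tolerance in percent.
rate_within() {
    awk -v want="$1" -v got="$2" -v tol="$3" 'BEGIN {
        diff = (want - got) / want * 100
        if (diff < 0) diff = -diff
        exit !(diff <= tol)
    }'
}

# 498/s against a requested 500/s is a 0.4% deviation
rate_within 500 498 1 && echo "pacer kept up"
```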

Gotcha 4: HTTP/2 and TLS overhead

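If you’re testing a TLS endpoint, the handshake cost of every new connection is included in your latency numbers; with keepalive it is paid once per connection rather than once per request. Vegeta also negotiates HTTP/2 when the server supports it (the -http2 flag defaults to true), so pass -http2=false if you want to compare against HTTP/1.1. For baseline performance tests, use plain HTTP against a local service. For realistic production tests, always use TLS: that overhead is real and your users pay it too.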

Gotcha 5: Target service connection limits

Your service might refuse connections before it slows down. You’ll see a sudden jump in errors rather than gradual latency increase. Check ulimit -n on the server side and tune your systemd service unit or nginx worker_connections accordingly before concluding that your app code is the problem.

Integrating Into CI/CD

A load test that runs only after an incident comes too late. The real value is running it on every deploy against a staging environment and failing the pipeline if performance regresses.

#!/usr/bin/env bash
# ci_load_check.sh — runs a 60-second load test and fails if p99 > threshold

set -euo pipefail

TARGET="${STAGING_URL:?STAGING_URL must be set}/api/health"
RATE=200          # req/s — approximate prod traffic
DURATION="60s"
P99_LIMIT_MS=250  # fail if p99 latency exceeds this

RESULT=$(echo "GET ${TARGET}" \
    | vegeta attack -rate="${RATE}" -duration="${DURATION}" -timeout="5s" \
    | vegeta report -type=json)

P99=$(echo "${RESULT}" | python3 -c "
import sys, json
data = json.load(sys.stdin)
# latencies are in nanoseconds
print(int(data['latencies']['99th'] / 1_000_000))
")

SUCCESS=$(echo "${RESULT}" | python3 -c "
import sys, json
data = json.load(sys.stdin)
print(data['success'])
")

echo "p99 latency: ${P99}ms (limit: ${P99_LIMIT_MS}ms)"
echo "Success ratio: ${SUCCESS}"

if (( P99 > P99_LIMIT_MS )); then
    echo "FAIL: p99 latency ${P99}ms exceeds limit of ${P99_LIMIT_MS}ms"
    exit 1
fi

if (( $(echo "${SUCCESS} < 0.999" | bc -l) )); then
    echo "FAIL: Success ratio ${SUCCESS} below 99.9%"
    exit 1
fi

echo "PASS"

Add this to your GitHub Actions workflow after deploying to staging. The runner needs vegeta, python3, and bc on its PATH:

- name: Load test staging
  env:
    STAGING_URL: https://staging.api.example.com
  run: |
    bash ci_load_check.sh

Now a regression that pushes p99 latency past your SLA will break the build before it reaches production.
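An absolute limit catches SLA violations; a relative check against a stored baseline can also catch slower drift. The sketch below is one possible way to do that comparison. `within_baseline`, the 10% tolerance, and the idea of keeping the previous p99 in a file are assumptions, not something Vegeta provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if CURRENT_MS is no more than TOLERANCE_PCT
# percent above BASELINE_MS. All arguments are integers.
within_baseline() {
    local baseline="$1" current="$2" tolerance="$3"
    (( current * 100 <= baseline * (100 + tolerance) ))
}

# A p99 of 195 ms against a 180 ms baseline is within a 10% tolerance
within_baseline 180 195 10 && echo "PASS: no regression"
# A p99 of 200 ms is not (the limit is 198 ms)
within_baseline 180 200 10 || echo "FAIL: p99 regressed"
```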

Production-Ready Patterns

Pattern 1: Store all binary results

Never throw away the raw .bin files. Raw results can be re-processed with different histogram buckets, combined with other runs, or compared against a future baseline. Disk is cheap; re-running a load test to get a different histogram is annoying.

Pattern 2: Test with production-like connection behavior

Real clients use keepalive but also open new connections. Mix -keepalive=true tests with -keepalive=false tests to see both scenarios. New connection overhead is significant for TLS endpoints.
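A minimal sketch of such a paired run follows. `keepalive_pair` and its output file names are hypothetical; the `-keepalive` flag is Vegeta's own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the same attack with and without keepalive and
# store each result stream for later comparison.
# Usage: keepalive_pair <url> <rate> <duration> <outdir>
keepalive_pair() {
    local url="$1" rate="$2" duration="$3" outdir="$4" mode
    mkdir -p "${outdir}"
    for mode in true false; do
        echo "GET ${url}" \
            | vegeta attack -rate="${rate}" -duration="${duration}" -keepalive="${mode}" \
            > "${outdir}/keepalive_${mode}.bin"
    done
}

# Intended use (not run here):
#   keepalive_pair https://staging.api.example.com/api/health 100 30s ./results
#   vegeta report ./results/keepalive_true.bin
#   vegeta report ./results/keepalive_false.bin
```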

Pattern 3: Run from multiple machines for high rates

When a single machine cannot sustain the rate you need, split the attack across several. Run Vegeta on each, save .bin files, copy them to one machine, and merge:

# On the aggregator machine
vegeta report machine1.bin machine2.bin machine3.bin

The report command accepts multiple result files and sorts the results by timestamp before computing the metrics, so the merged report covers the combined load.
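Each machine needs its share of the total rate. As a small sketch, the hypothetical `split_rate` helper below divides a target rate so that the parts add up exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide TOTAL requests per second across N machines,
# giving the remainder to the first machines so the parts sum to TOTAL.
split_rate() {
    local total="$1" machines="$2" base extra i
    base=$(( total / machines ))
    extra=$(( total % machines ))
    for (( i = 0; i < machines; i++ )); do
        echo $(( i < extra ? base + 1 : base ))
    done
}

split_rate 2000 3 | tr '\n' ' '   # prints: 667 667 666
echo ""
```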

Pattern 4: Correlate with server metrics

A Vegeta report without server-side metrics (CPU, memory, connection pool saturation, GC pause time) is only half the picture. Time your Vegeta run so you can correlate the latency histogram with Prometheus/Grafana dashboards. The combination tells you why p99 spiked, not just that it did.
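One low-tech way to make that correlation easier is to record the exact time window of each run. In the sketch below, `timed_run` is a hypothetical wrapper that logs start and end timestamps to stderr, leaving stdout free for Vegeta's binary stream.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and log its start and end time
# (UTC, ISO 8601) to stderr so the window can be looked up in a dashboard.
# Usage: timed_run <label> <command> [args...]
timed_run() {
    local label="$1"; shift
    local start end status=0
    start=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    "$@" || status=$?
    end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    printf '%s start=%s end=%s exit=%d\n' "${label}" "${start}" "${end}" "${status}" >&2
    return "${status}"
}

# Intended use (not run here):
#   timed_run ramp-200 vegeta attack -targets=targets.txt -rate=200 -duration=60s \
#       > results.bin 2>> run_windows.log
```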

Vegeta is one of those tools that rewards you for understanding it fully. The constant-rate model feels pedantic until the first time a "successful" load test masks a latency problem and a constant-rate run then shows your service melting at a sustained 400 req/s. That’s the test that saves you at 2am.

The shell scripts here are starting points. Adapt the ramp test to your traffic patterns, tune the CI thresholds to your actual SLAs, and store those .bin files somewhere your team can get to them. Good performance data is a gift to your future self.