Most load testing tools have a fundamental design flaw: they throttle themselves. They send N requests, wait for responses, then send the next batch. That’s not how real traffic works. Real traffic doesn’t care whether your server is struggling — it keeps coming at the same rate.
That’s precisely why Vegeta exists, and why it belongs in your toolkit.
Vegeta (GitHub) is an HTTP load testing tool and library written in Go. It was built around one idea: constant-rate attack. You tell it "hit this endpoint at 500 requests per second" and it does exactly that, regardless of response times. The result is a brutally honest picture of what happens to your system under sustained pressure, not under a politely throttled benchmark.
This article walks through real-world Vegeta usage — from installation to multi-target shell scripts that you can drop into a CI pipeline or a pre-deploy smoke test.
Why Not ab, wrk, or k6?
ab (Apache Bench) is fine for a quick sanity check. wrk is fast and Lua-scriptable. k6 is excellent for complex flows. But in their default modes, each of them operates on a concurrency model: they maintain N workers and push as hard as the workers allow. (k6 can be configured with arrival-rate executors, but that is opt-in.)
The problem: if latency spikes, throughput drops. You end up testing "how fast can my server respond" rather than "what happens when traffic stays constant and the server slows down." The latter is what causes real outages.
Vegeta separates rate from concurrency. You set the rate, Vegeta maintains it. Latency can climb to 10 seconds — Vegeta still sends the next request on schedule. That’s the test your production incident was actually running.
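The cost of holding the rate fixed can be estimated with Little's law: requests in flight is roughly rate times latency. The sketch below only does that arithmetic. The helper name `inflight` is made up for illustration, and the numbers are the ones used in this article.

```shell
#!/usr/bin/env bash
# Little's law: in-flight requests is roughly arrival rate (req/s) x latency (s).
# `inflight` is a hypothetical helper, not part of Vegeta.
inflight() {
  local rate="$1" latency_s="$2"
  awk -v r="${rate}" -v l="${latency_s}" 'BEGIN { printf "%.0f\n", r * l }'
}

inflight 500 0.1   # healthy server: 500 req/s at 100ms latency
inflight 500 10    # struggling server: 500 req/s at 10s latency
```

At 10 seconds of latency, a constant 500 req/s means about 5000 requests outstanding at once. A closed-loop tool capped at 50 workers would instead fall to about 5 req/s and never show you that pile-up.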
Installation
Vegeta ships as a single static binary. No runtime dependencies, no Python version conflicts, no npm drama.
From GitHub releases (recommended for servers):
# Grab the latest release — check https://github.com/tsenart/vegeta/releases for current version
VEGETA_VERSION="12.12.0"
ARCH="amd64" # or arm64 for ARM machines
curl -fsSL "https://github.com/tsenart/vegeta/releases/download/v${VEGETA_VERSION}/vegeta_${VEGETA_VERSION}_linux_${ARCH}.tar.gz" \
| tar -xzf - vegeta
sudo mv vegeta /usr/local/bin/
vegeta --version
Via Go (if you have the toolchain):
go install github.com/tsenart/vegeta/v12@latest
On macOS:
brew install vegeta
Verify it works:
echo "GET http://httpbin.org/get" | vegeta attack -rate=5 -duration=5s | vegeta report
If you see a summary table, you’re good.
The Core Concepts
Before writing scripts, understand the three-stage pipeline Vegeta uses. It’s not hidden — it’s literally how you compose commands:
- Attack: sends requests at the specified rate, produces a binary stream of results
- Report: reads that binary stream and formats it (text summary, JSON, histogram)
- Plot: generates an HTML latency plot over time (useful for visual inspection)
vegeta attack [options] | vegeta report [options]
The binary stream can be saved to a file, which lets you generate several report types from a single attack without re-running the test.
Basic Usage
The simplest possible attack:
echo "GET https://example.com/" | vegeta attack -rate=50 -duration=30s | vegeta report
Output:
Requests [total, rate, throughput] 1500, 50.03, 49.81/s
Duration [total, attack, wait] 30.104s, 29.98s, 123.72ms
Latencies [min, mean, 50, 90, 95, 99, max] 98ms, 134ms, 127ms, 189ms, 213ms, 390ms, 1.2s
Bytes In [total, mean] 213000, 142.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:1500
Error Set:
The 99th percentile and max latency are where you should be looking. If your p99 is 390ms but your SLA requires sub-200ms responses for 99% of traffic, you just found a problem — before the incident.
Targets File: Testing Multiple Endpoints
Real services aren’t one endpoint. You need to test a realistic distribution of traffic. Vegeta accepts a targets file with one request per entry:
# targets.txt
GET https://api.example.com/health
GET https://api.example.com/v1/users/123
POST https://api.example.com/v1/events
Content-Type: application/json
@event_body.json
GET https://api.example.com/v1/products?page=1&limit=20
Authorization: Bearer your_token_here
DELETE https://api.example.com/v1/sessions/abc123
Authorization: Bearer your_token_here
The @filename syntax attaches a file as the request body — cleaner than trying to inline JSON in the targets file.
Run it:
vegeta attack -targets=targets.txt -rate=100 -duration=60s | tee results.bin | vegeta report
The tee saves the raw binary results while also streaming to the report. You can then generate a histogram separately:
vegeta report -type='hist[0,10ms,50ms,100ms,250ms,500ms,1s]' results.bin
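Because Vegeta walks the targets file in order and loops back to the start, one way to approximate a traffic mix is to repeat entries in proportion to their share. A minimal sketch, assuming single-line targets without headers or bodies; `weighted_targets` and the 8:1:1 mix are invented for illustration.

```shell
#!/usr/bin/env bash
# weighted_targets: hypothetical helper that reads "WEIGHT METHOD URL" lines on
# stdin and prints each "METHOD URL" target WEIGHT times.
weighted_targets() {
  awk '{ for (i = 0; i < $1; i++) print $2, $3 }'
}

weighted_targets > targets_weighted.txt <<'EOF'
8 GET https://api.example.com/v1/products?page=1&limit=20
1 GET https://api.example.com/v1/users/123
1 GET https://api.example.com/health
EOF

# 10 targets in total: 80% products, 10% users, 10% health
wc -l < targets_weighted.txt
```

Pass the result with -targets=targets_weighted.txt. Entries that carry headers or an @body line would need those extra lines repeated too, so this one-line form only covers simple GETs.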
Shell Scripting for Real Load Tests
This is where Vegeta gets serious. Here’s a production-grade shell script that runs a stepped load test — starting light and ramping up — so you can find the breaking point of your service.
#!/usr/bin/env bash
# load_ramp.sh — stepped load test with Vegeta
# Usage: ./load_ramp.sh https://api.example.com/v1/health
set -euo pipefail
TARGET_URL="${1:?Usage: $0 <target-url>}"
DURATION="30s"
RESULTS_DIR="./results/$(date +%Y%m%d_%H%M%S)"
RATES=(10 50 100 200 500 1000)
mkdir -p "${RESULTS_DIR}"
echo "Target: ${TARGET_URL}"
echo "Results dir: ${RESULTS_DIR}"
echo ""
for RATE in "${RATES[@]}"; do
echo "=== Attacking at ${RATE} req/s for ${DURATION} ==="
RESULT_FILE="${RESULTS_DIR}/rate_${RATE}.bin"
# Run the attack and save the binary result stream
echo "GET ${TARGET_URL}" \
| vegeta attack \
-rate="${RATE}" \
-duration="${DURATION}" \
-timeout="10s" \
-keepalive=true \
> "${RESULT_FILE}"
# Print a quick summary for this rate
vegeta report "${RESULT_FILE}"
# Extract the success ratio — bail out if it drops below 95%
SUCCESS=$(vegeta report -type=json "${RESULT_FILE}" | python3 -c "
import sys, json
data = json.load(sys.stdin)
print(data['success'])
")
echo "Success ratio: ${SUCCESS}"
if (( $(echo "${SUCCESS} < 0.95" | bc -l) )); then
echo "ERROR: Success ratio dropped below 95% at ${RATE} req/s. Stopping ramp."
break
fi
# Brief pause between steps to let the service recover
echo "Pausing 5s before next step..."
sleep 5
echo ""
done
echo "=== Generating combined report ==="
# Merge all result files and produce a final summary.
# vegeta report and vegeta plot accept several input files; the hist[...]
# argument is quoted so the shell does not treat it as a glob pattern.
vegeta report \
-type='hist[0,5ms,25ms,50ms,100ms,250ms,500ms,1s,2s,5s]' \
"${RESULTS_DIR}"/*.bin
# Generate an HTML latency plot for the full run
vegeta plot "${RESULTS_DIR}"/*.bin > "${RESULTS_DIR}/latency_chart.html"
echo ""
echo "Done. HTML plot saved to ${RESULTS_DIR}/latency_chart.html"
Run it against a service you own or are authorized to load test:
chmod +x load_ramp.sh
./load_ramp.sh https://cd-linux.club/api/health
You’ll see each rate level run, get its report, and the script stops the ramp the moment your service starts falling apart. The merged histogram at the end shows the full picture across all rates.
POST Requests with Dynamic Bodies
Static request bodies are fine for read-heavy APIs. For write-heavy tests you sometimes need variation — different user IDs, different payloads. You have two options.
Option 1: Pre-generated targets file
Generate a large targets file before the test:
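#!/usr/bin/env bash
# gen_targets.sh: generates 10,000 unique POST targets in Vegeta's JSON target format.
# The plain-text HTTP format reads bodies only from @file references, so inline
# JSON payloads need the JSON format (run the attack with -format=json).
set -euo pipefail
OUTFILE="targets_post.jsonl"
TS="$(date +%s)" # one timestamp for the whole file; avoids forking date 10,000 times
: > "${OUTFILE}" # truncate
for i in $(seq 1 10000); do
# The JSON target format expects the request body to be base64-encoded
BODY="{\"user_id\": ${i}, \"event\": \"page_view\", \"ts\": ${TS}}"
BODY_B64="$(printf '%s' "${BODY}" | base64 | tr -d '\n')"
# One JSON object per line: method, url, header map, base64 body
printf '{"method":"POST","url":"https://api.example.com/v1/events","header":{"Content-Type":["application/json"]},"body":"%s"}\n' "${BODY_B64}" >> "${OUTFILE}"
done
echo "Generated ${OUTFILE}"
Run the attack with -format=json -targets=targets_post.jsonl. Vegeta cycles through the targets in order, looping back when it reaches the end. 10,000 unique entries is usually enough to avoid caching artifacts.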
Option 2: Vegeta’s library in a Go program
For truly dynamic payloads (cryptographic tokens, real timestamps, HMAC-signed requests), drop down to Go and use Vegeta as a library. The shell approach gets you 80% of the way there; the remaining 20% needs code.
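Before dropping down to Go, there is a middle ground worth knowing about: Vegeta can read targets lazily from stdin in its JSON target format (-lazy -format=json), with the body base64-encoded. The sketch below streams freshly generated targets. `emit_target`, `stream_targets`, and the payload fields are illustrative, and a bash loop that forks base64 for every request will only keep up with modest rates.

```shell
#!/usr/bin/env bash
# emit_target: hypothetical helper that prints one target in Vegeta's JSON
# target format. The request body must be base64-encoded.
emit_target() {
  local id="$1" body b64
  body="{\"user_id\": ${id}, \"event\": \"page_view\", \"ts\": $(date +%s)}"
  b64="$(printf '%s' "${body}" | base64 | tr -d '\n')"
  printf '{"method":"POST","url":"https://api.example.com/v1/events","header":{"Content-Type":["application/json"]},"body":"%s"}\n' "${b64}"
}

# stream_targets: endless generator; it stops when the reader closes the pipe.
stream_targets() {
  local i=0
  while emit_target "$((i += 1))"; do :; done
}

emit_target 1   # print one sample target

# Intended use (needs vegeta installed and a service you may load test):
#   stream_targets | vegeta attack -lazy -format=json -rate=50 -duration=30s | vegeta report
```

Each target gets a real timestamp at generation time, which the pre-generated file cannot offer. Signed or token-bearing requests at high rates are still better served by the Go library.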
Reading Results Properly
Most people look at average latency. That’s wrong. Averages hide everything interesting.
The metrics that matter:
| Metric | What to watch for |
|---|---|
| p50 | Your median user experience |
| p95 | What 1 in 20 users experiences |
| p99 | What your worst 1% experiences — usually the SLA target |
| max | Single worst request — often indicates a GC pause, lock contention, or cold cache |
| Success ratio | Anything below 99.9% is a conversation |
| Throughput vs Rate | If throughput < rate, requests are being dropped or queued |
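In the report header, rate is the request rate Vegeta actually sustained during the attack, while throughput counts only successful responses over the total duration, including the wait for the last response. When rate=500 but throughput=387, requests are failing or piling up behind a server that is falling behind. Under sustained load that gap tends to grow until something crashes.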
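To see why averages mislead, the sketch below computes the mean and nearest-rank percentiles over a made-up sample: 97 requests at 100ms and 3 at 5000ms. The helper names and the sample are assumptions for illustration; Vegeta computes its own percentiles and may estimate them differently.

```shell
#!/usr/bin/env bash
# Made-up latency sample in milliseconds: 97 fast requests, 3 very slow ones.
sample_latencies() {
  awk 'BEGIN { for (i = 0; i < 97; i++) print 100; for (i = 0; i < 3; i++) print 5000 }'
}

# mean: arithmetic mean of the numbers on stdin.
mean() {
  awk '{ s += $1 } END { print s / NR }'
}

# percentile P: nearest-rank percentile of stdin for an integer P,
# using rank = ceil(P/100 * N) on the sorted values.
percentile() {
  sort -n | awk -v p="$1" '{ v[NR] = $1 }
    END { r = int((p * NR + 99) / 100); if (r < 1) r = 1; print v[r] }'
}

echo "mean: $(sample_latencies | mean)ms"
echo "p50:  $(sample_latencies | percentile 50)ms"
echo "p99:  $(sample_latencies | percentile 99)ms"
```

The mean lands at 247ms, which looks tolerable, and the median is a healthy 100ms. Only p99 exposes the 5-second tail that the mean smooths over.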
Gotchas
Gotcha 1: Your test machine is the bottleneck
Vegeta can saturate your test machine’s network stack before it saturates the target server. If you’re running 2000 req/s from a single small VM, check your CPU and open file descriptors first:
# Raise the open file limit before running
ulimit -n 65535
# Check during the test (/proc/<pid>/fd is a directory, so list it rather than cat it)
watch -n1 'ls /proc/$(pgrep -o -x vegeta)/fd | wc -l'
Gotcha 2: DNS resolution costs
By default, Vegeta resolves DNS for every connection if you’re not using keepalives. Use -keepalive=true (it’s the default but worth being explicit) and test against IPs when you want to isolate application-layer performance from DNS.
Gotcha 3: The wall clock drift
Vegeta uses a pacer to maintain constant rate. Under heavy load on a loaded machine, the pacer can drift. Always check the rate column in the report — if it says 498/s instead of 500/s, your machine is struggling to keep up. Results are still valid, but note the actual achieved rate.
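That check can be automated. A minimal sketch, assuming you have already pulled the achieved rate out of the report (the JSON report exposes it as a `rate` field); `rate_ok` and the 1% tolerance are illustrative choices, not Vegeta features.

```shell
#!/usr/bin/env bash
# rate_ok REQUESTED ACHIEVED TOLERANCE_PCT
# Hypothetical helper: succeeds when the achieved rate is within
# TOLERANCE_PCT percent of the requested rate.
rate_ok() {
  awk -v want="$1" -v got="$2" -v tol="$3" 'BEGIN {
    diff = want - got; if (diff < 0) diff = -diff
    exit (diff * 100 <= want * tol) ? 0 : 1
  }'
}

rate_ok 500 498 1 && echo "498/s is within 1% of 500/s"
rate_ok 500 450 1 || echo "450/s drifted more than 1% from 500/s"
```

A failed check says the load generator, not the target, limited the run, so the step is worth repeating from a bigger machine before reading anything into its latencies.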
Gotcha 4: HTTP/2 and TLS overhead
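If you’re testing a TLS endpoint, the TLS handshake cost is included in your latency numbers. For baseline performance tests, use HTTP against a local service. For realistic production tests, always use TLS: that overhead is real and your users pay it too. Vegeta also negotiates HTTP/2 when the server supports it (-http2 defaults to true), so pass -http2=false if your real clients speak HTTP/1.1.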
Gotcha 5: Target service connection limits
Your service might refuse connections before it slows down. You’ll see a sudden jump in errors rather than gradual latency increase. Check ulimit -n on the server side and tune your systemd service unit or nginx worker_connections accordingly before concluding that your app code is the problem.
Integrating Into CI/CD
A load test that runs only before an incident is useless. The real value is running it on every deploy against a staging environment and failing the pipeline if performance regresses.
#!/usr/bin/env bash
# ci_load_check.sh — runs a 60-second load test and fails if p99 > threshold
set -euo pipefail
TARGET="${STAGING_URL:?STAGING_URL must be set}/api/health"
RATE=200 # req/s — approximate prod traffic
DURATION="60s"
P99_LIMIT_MS=250 # fail if p99 latency exceeds this
RESULT=$(echo "GET ${TARGET}" \
| vegeta attack -rate="${RATE}" -duration="${DURATION}" -timeout="5s" \
| vegeta report -type=json)
P99=$(echo "${RESULT}" | python3 -c "
import sys, json
data = json.load(sys.stdin)
# latencies are in nanoseconds
print(int(data['latencies']['99th'] / 1_000_000))
")
SUCCESS=$(echo "${RESULT}" | python3 -c "
import sys, json
data = json.load(sys.stdin)
print(data['success'])
")
echo "p99 latency: ${P99}ms (limit: ${P99_LIMIT_MS}ms)"
echo "Success ratio: ${SUCCESS}"
if (( P99 > P99_LIMIT_MS )); then
echo "FAIL: p99 latency ${P99}ms exceeds limit of ${P99_LIMIT_MS}ms"
exit 1
fi
if (( $(echo "${SUCCESS} < 0.999" | bc -l) )); then
echo "FAIL: Success ratio ${SUCCESS} below 99.9%"
exit 1
fi
echo "PASS"
Add this to your GitHub Actions workflow after deploying to staging:
- name: Load test staging
env:
STAGING_URL: https://staging.api.example.com
run: |
bash ci_load_check.sh
Now a regression that pushes p99 latency past your SLA will break the build before it reaches production.
Production-Ready Patterns
Pattern 1: Store all binary results
Never throw away the raw .bin files. Raw results can be re-processed with different histogram buckets, combined with other runs, or compared against a future baseline. Disk is cheap; re-running a load test to get a different histogram is annoying.
Pattern 2: Test with production-like connection behavior
Real clients use keepalive but also open new connections. Mix -keepalive=true tests with -keepalive=false tests to see both scenarios. New connection overhead is significant for TLS endpoints.
Pattern 3: Run from multiple machines for high rates
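At high rates (the 2000+ req/s range on a small VM), a single machine may not keep up, so split the load across several. Run Vegeta on each, save .bin files, copy them to one machine, and merge:
# On the aggregator machine
vegeta report machine1.bin machine2.bin machine3.bin
vegeta report accepts multiple result files and merges them into one summary, so the percentiles are computed over the combined set of requests rather than averaged per machine.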
Pattern 4: Correlate with server metrics
A Vegeta report without server-side metrics (CPU, memory, connection pool saturation, GC pause time) is only half the picture. Time your Vegeta run so you can correlate the latency histogram with Prometheus/Grafana dashboards. The combination tells you why p99 spiked, not just that it did.
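One low-tech way to line a run up with your dashboards is to log UTC start and end timestamps around the attack. A sketch, with `timed_run` as a hypothetical wrapper and `sleep 1` standing in for the real attack command.

```shell
#!/usr/bin/env bash
# timed_run LABEL COMMAND [ARGS...]
# Hypothetical wrapper: prints UTC start/end timestamps to stderr around a
# command so the window can be matched against dashboard time ranges.
timed_run() {
  local label="$1"; shift
  echo "${label} start $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
  "$@"
  local status=$?
  echo "${label} end   $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
  return "${status}"
}

# Stand-in for something like: timed_run ramp ./load_ramp.sh "${TARGET_URL}"
timed_run demo sleep 1
```

The timestamps go to stderr so they do not corrupt a binary result stream on stdout, and the wrapped command's exit status is passed through unchanged.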
Vegeta is one of those tools that rewards you for understanding it fully. The constant-rate model feels pedantic until the first time you see a "successful" load test mask a latency problem, then use Vegeta and find your service melting at 400 req/s sustained. That’s the test that saves you at 2am.
The shell scripts here are starting points. Adapt the ramp test to your traffic patterns, tune the CI thresholds to your actual SLAs, and store those .bin files somewhere your team can get to them. Good performance data is a gift to your future self.