Load testing is one of those things every team says they do and almost no one does right. Running k6 run script.js on your laptop against staging is not a load test — it’s a prayer. Your laptop will saturate its NIC, your OS will throttle sockets, and you’ll spend three hours debugging the tool instead of the system under test.
The real answer is distributed load generation: dozens of k6 instances firing in parallel, coordinated, with centralized output. That’s exactly what the k6 Operator solves. It gives you a Kubernetes-native way to spin up arbitrarily large test fleets via a single kubectl apply.
Official repo: https://github.com/grafana/k6-operator
Why k6, Why Kubernetes
k6 is the sensible choice for modern load testing. The scripting model is TypeScript-flavored JavaScript, the binary is a single statically-linked Go executable, and the output options are genuinely good — Prometheus remote write, InfluxDB, Datadog, CloudWatch, stdout JSON. It doesn’t spin up a JVM, it doesn’t need a GUI, and it doesn’t require a license key to do anything useful.
The problem is single-node throughput. A single k6 instance can realistically push somewhere between 3,000 and 30,000 RPS depending on the test complexity and the machine. If you’re trying to simulate 200,000 concurrent users hammering your checkout service, you need a fleet.
Kubernetes is the natural orchestration layer. You already have it. The k6 Operator turns your cluster into a load generation platform with almost no extra infrastructure to manage.
Architecture in One Paragraph
The operator watches for TestRun custom resources. When you apply one, the controller creates a Job per configured instance (parallelism), each running a k6 pod that pulls the test script from a ConfigMap. All pods run the same script but receive a different segment of the virtual user workload via k6’s built-in execution segmentation. The operator waits for all pods to finish, aggregates exit codes, and updates the TestRun status. That’s it. Simple, auditable, GitOps-friendly.
Prerequisites
- A working Kubernetes cluster (1.24+). k3s, EKS, GKE — all fine.
- kubectl configured and pointing at it.
- Helm 3.
- Basic familiarity with k6 scripting. If you’ve never used k6, spend 20 minutes on the k6 quickstart first.
Installing the k6 Operator
The officially supported installation method is Helm. There’s also a bundle.yaml if you prefer raw manifests, but Helm gives you easier upgrades.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install k6-operator grafana/k6-operator \
--namespace k6-operator \
--create-namespace
Verify the controller is running:
kubectl get pods -n k6-operator
# NAME READY STATUS RESTARTS AGE
# k6-operator-controller-manager-xyz-ab 2/2 Running 0 45s
The operator installs its CRDs automatically. Confirm:
kubectl get crd | grep k6
# testruns.k6.io
# privateloadzones.k6.io
Writing a k6 Script Worth Running
Before wiring up Kubernetes, write a script that actually tests something meaningful. Here’s a realistic e-commerce scenario — browsing a product list, adding to cart, and checking out:
// script.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Counter } from 'k6/metrics';

// Custom metrics: these will surface in any output backend
const checkoutDuration = new Trend('checkout_duration', true);
const failedRequests = new Counter('failed_requests');

export const options = {
  // Ramp up, sustain, ramp down: the classic shape
  stages: [
    { duration: '2m', target: 500 }, // ramp to 500 VUs
    { duration: '5m', target: 500 }, // hold
    { duration: '1m', target: 0 }, // drain
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'], // <1% errors
    http_req_duration: ['p(95)<400'], // 95th percentile under 400ms
    checkout_duration: ['p(99)<1000'], // checkout 99th under 1s
  },
};

const BASE_URL = __ENV.BASE_URL || 'https://staging.myapp.internal';

export default function () {
  // Step 1: product listing
  const listRes = http.get(`${BASE_URL}/api/products?page=1&limit=20`);
  check(listRes, { 'products 200': r => r.status === 200 });
  sleep(1);

  // Step 2: add to cart
  const cartRes = http.post(
    `${BASE_URL}/api/cart`,
    JSON.stringify({ product_id: 42, qty: 1 }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(cartRes, { 'cart 201': r => r.status === 201 });
  sleep(0.5);

  // Step 3: checkout (the expensive path)
  const start = Date.now();
  const checkoutRes = http.post(
    `${BASE_URL}/api/checkout`,
    JSON.stringify({ payment_method: 'card_test' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  checkoutDuration.add(Date.now() - start);
  if (!check(checkoutRes, { 'checkout 200': r => r.status === 200 })) {
    failedRequests.add(1);
  }
  sleep(2);
}
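To sanity-check the ramp shape before running anything, the stages can be turned into an expected fleet-wide VU count at any point in time. This is a rough sketch in plain JavaScript (runnable with Node, not a k6 script). It assumes linear ramps between stage targets and a starting point of 0 VUs; the helper name expectedVus and the seconds field are made up for the illustration.

```javascript
// Stage durations from the script above, converted to seconds by hand.
const stages = [
  { seconds: 120, target: 500 }, // ramp to 500 VUs
  { seconds: 300, target: 500 }, // hold
  { seconds: 60, target: 0 },    // drain
];

// Expected VU count at a given moment, assuming a linear ramp inside each
// stage and a start of `startVus` (0 here, which is an assumption).
function expectedVus(stageList, elapsedSeconds, startVus = 0) {
  let previousTarget = startVus;
  let stageStart = 0;
  for (const stage of stageList) {
    const stageEnd = stageStart + stage.seconds;
    if (elapsedSeconds <= stageEnd) {
      const progress = (elapsedSeconds - stageStart) / stage.seconds;
      return Math.round(previousTarget + (stage.target - previousTarget) * progress);
    }
    previousTarget = stage.target;
    stageStart = stageEnd;
  }
  return previousTarget; // past the last stage
}

console.log(expectedVus(stages, 60));  // halfway up the ramp
console.log(expectedVus(stages, 200)); // during the hold
console.log(expectedVus(stages, 450)); // halfway through the drain
```

Comparing these numbers against the VU graph from a real run is a quick way to spot a fleet that is not ramping as intended.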
Store this in a ConfigMap:
kubectl create configmap k6-test-script \
--from-file=script.js=script.js \
--namespace default
The TestRun Resource
This is where it all comes together. The TestRun CRD is your test configuration:
# testrun.yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: ecommerce-load-test
  namespace: default
spec:
  parallelism: 10 # 10 k6 pods, each handles 1/10 of the VU workload
  script:
    configMap:
      name: k6-test-script
      file: script.js
  arguments: --out experimental-prometheus-rw # push metrics to Prometheus
  runner:
    env:
      - name: BASE_URL
        value: "https://staging.myapp.internal"
      # Prometheus remote write endpoint
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: "http://prometheus-operated.monitoring.svc:9090/api/v1/write"
      - name: K6_PROMETHEUS_RW_TREND_STATS
        value: "p(50),p(90),p(95),p(99),max"
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "2"
        memory: "512Mi"
    # Spread pods across nodes to avoid co-location bottlenecks
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  k6_cr: ecommerce-load-test
              topologyKey: kubernetes.io/hostname
Apply it:
kubectl apply -f testrun.yaml
Watch the test lifecycle:
kubectl get testrun ecommerce-load-test -w
# NAME STAGE AGE
# ecommerce-load-test created 2s
# ecommerce-load-test initialized 5s
# ecommerce-load-test running 8s
Stream logs from all pods simultaneously (requires stern):
stern -n default -l k6_cr=ecommerce-load-test
Or with plain kubectl:
kubectl logs -n default -l k6_cr=ecommerce-load-test -f --max-log-requests=20
How Segmentation Works
This is the part most people don’t read the docs on and then wonder why their test produces weird numbers.
When parallelism: 10 is set, the operator injects execution segment arguments into each pod automatically:
- Pod 0: --execution-segment=0:1/10 --execution-segment-sequence=0,1/10,2/10,3/10,4/10,5/10,6/10,7/10,8/10,9/10,1
- Pod 1: --execution-segment=1/10:2/10 --execution-segment-sequence=0,1/10,2/10,3/10,4/10,5/10,6/10,7/10,8/10,9/10,1
- …and so on.
Each pod runs a deterministic slice of the VU range. Total VUs across all pods equals the VUs you defined in options.stages. So with target: 500 and parallelism: 10, each pod handles 50 VUs. The total request rate seen by your system is the sum of all pods.
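The arithmetic is easy to check by hand, but a small sketch makes the invariant explicit: however the VUs are divided, the per-pod shares must add up to the target. This is plain JavaScript for Node, not a k6 script. The function name splitVus is hypothetical, and the way it hands out a remainder is an assumption for illustration, not k6's exact algorithm.

```javascript
// Illustrative split of a VU target across pods. Remainder handling here is
// an assumption for the sketch; k6 uses its own segmentation algorithm.
function splitVus(totalVus, parallelism) {
  const base = Math.floor(totalVus / parallelism);
  const remainder = totalVus % parallelism;
  return Array.from({ length: parallelism }, (_, pod) => ({
    pod,
    // Each pod owns the slice from pod/parallelism to (pod+1)/parallelism.
    slice: `${pod}/${parallelism} to ${pod + 1}/${parallelism}`,
    vus: base + (pod < remainder ? 1 : 0),
  }));
}

const plan = splitVus(500, 10);
const totalVus = plan.reduce((sum, p) => sum + p.vus, 0);

console.log(plan[0]);   // first pod's slice and VU share
console.log(totalVus);  // fleet-wide VU count, should match the stage target
```

If the summed VU count from your pods' logs does not match the stage target, segmentation is not being applied the way you expect.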
Gotcha #1: If your script has global state (shared arrays, hardcoded user IDs, etc.), each pod starts from the same state. You can end up with 10 pods all logging in as user ID 1. Use __VU and __ITER to derive unique values per virtual user.
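// Derive a unique user per VU index across the entire fleet.
// Assumes both env vars are set on each runner; parse them explicitly as base-10 integers.
const userId = (__VU - 1) +
  parseInt(__ENV.K6_INSTANCE_INDEX, 10) * parseInt(__ENV.K6_VUS_PER_INSTANCE, 10);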
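To convince yourself the formula really produces distinct IDs, you can replay it outside k6. The sketch below is plain JavaScript for Node. It assumes __VU counts from 1 within each pod and that every pod knows its own zero-based index and VU share; fleetUserId is a hypothetical helper that mirrors the expression above.

```javascript
// Same arithmetic as the k6 snippet, with the inputs passed explicitly.
function fleetUserId(localVu, instanceIndex, vusPerInstance) {
  return (localVu - 1) + instanceIndex * vusPerInstance;
}

// Simulate 10 pods with 50 VUs each and collect every derived ID.
const ids = new Set();
for (let instance = 0; instance < 10; instance++) {
  for (let vu = 1; vu <= 50; vu++) {
    ids.add(fleetUserId(vu, instance, 50));
  }
}

console.log(ids.size);          // number of distinct IDs
console.log(Math.min(...ids));  // lowest ID
console.log(Math.max(...ids));  // highest ID
```

k6 also exposes exec.vu.idInTest in the k6/execution module, documented as unique across the whole test run, so it is worth checking whether that fits your case before wiring up environment variables.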
Output: Prometheus + Grafana
Storing test output in Prometheus and visualizing in Grafana is the standard production setup. The experimental-prometheus-rw output was stabilized in k6 v0.47 and is the recommended path.
Make sure your Prometheus has remote write receiver enabled. In a kube-prometheus-stack installation:
# values.yaml patch for kube-prometheus-stack
prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true
Then import the official k6 Grafana dashboard (ID 2587) or build your own using the k6_* metric family.
For the Grafana dashboard, the most useful panels are:
- k6_http_req_duration by percentile (p95, p99)
- k6_vus over time (confirms VU ramp shape)
- k6_http_req_failed_rate (your canary)
- k6_http_reqs_total / wall-clock time (actual RPS)
Gotcha #2: DNS Pressure Under Load
At high concurrency, your CoreDNS pods can become a bottleneck before your application does. Hundreds of VUs spread across 10 k6 pods, all resolving staging.myapp.internal on every new connection, generate serious query volume.
Fix this by resolving the target once and passing an IP, or by enabling ndots tuning on the k6 pods:
# in spec.runner
dnsConfig:
  options:
    - name: ndots
      value: "1" # reduce search domain fallbacks
    - name: single-request-reopen
Or better: ensure your target URL uses a stable ClusterIP or an ingress that your k6 pods resolve once via an /etc/hosts override. You can also use k6’s hosts option to pin a hostname to a fixed IP for the whole test.
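Why does ndots matter so much? A resolver that follows the resolv.conf search rules tries the search domains first whenever a name has fewer dots than ndots. The sketch below (plain JavaScript for Node) lists the names that would be attempted for one lookup. The search list is the usual in-cluster default for a pod in the default namespace, which is an assumption here, and candidateNames is a hypothetical helper.

```javascript
// Typical search list for a pod in the "default" namespace (assumed).
const searchDomains = ['default.svc.cluster.local', 'svc.cluster.local', 'cluster.local'];

// Names tried for a single lookup, in order, under the ndots rule.
function candidateNames(hostname, ndots, search = searchDomains) {
  if (hostname.endsWith('.')) return [hostname]; // fully qualified: no search
  const dots = hostname.split('.').length - 1;
  const expanded = search.map((domain) => `${hostname}.${domain}`);
  // Fewer dots than ndots: search domains first, the literal name last.
  return dots < ndots ? [...expanded, hostname] : [hostname, ...expanded];
}

const host = 'staging.myapp.internal';
console.log(candidateNames(host, 5)); // Kubernetes default ndots
console.log(candidateNames(host, 1)); // tuned value from the snippet above
```

With ndots at 5 the real name is only the fourth attempt, and each attempt may be issued for both A and AAAA records. With ndots at 1 it is the first, so the wasted lookups disappear.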
Gotcha #3: Node Resource Starvation
k6 is CPU-bound at high RPS. At 10,000 RPS per pod, you can easily spike a 2-core node to 100% CPU, which then causes test irregularities — VUs start sleeping longer than specified, connection timeouts creep in, and your results look like the system is slower than it is.
Always set both requests and limits on runner pods and pre-provision dedicated load-generation nodes using a node label + nodeSelector:
kubectl label node worker-3 worker-4 role=loadgen
# in spec.runner
nodeSelector:
  role: loadgen
tolerations:
  - key: role
    operator: Equal
    value: loadgen
    effect: NoSchedule
Taint those nodes to keep other workloads off them during tests:
kubectl taint nodes worker-3 worker-4 role=loadgen:NoSchedule
Gotcha #4: Test Artifacts and Cleanup
By default, finished TestRun resources and their associated Job and Pod objects stick around. This is actually useful for post-test debugging, but if you run tests frequently you’ll accumulate zombie objects.
The operator respects ttlSecondsAfterFinished on the underlying Job, but you can also automate cleanup with a simple CronJob or a pipeline step:
# delete all finished test runs (this one-liner does not filter by age)
kubectl get testrun -n default \
-o jsonpath='{range .items[?(@.status.stage=="finished")]}{.metadata.name}{"\n"}{end}' \
| xargs -r kubectl delete testrun -n default
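The one-liner above selects by stage only. If you want an age cutoff such as 24 hours, you need a timestamp comparison, which is awkward in jsonpath. Here is a minimal sketch in plain JavaScript for Node that filters the parsed output of kubectl get testrun -n default -o json. It uses creationTimestamp as a stand-in for age (an assumption, since a long test finishes well after it is created); staleTestRuns and the sample data are hypothetical.

```javascript
// Names of finished TestRuns created more than maxAgeHours before `now`.
function staleTestRuns(list, maxAgeHours, now = new Date()) {
  const cutoffMs = now.getTime() - maxAgeHours * 60 * 60 * 1000;
  return list.items
    .filter((item) => item.status && item.status.stage === 'finished')
    .filter((item) => Date.parse(item.metadata.creationTimestamp) < cutoffMs)
    .map((item) => item.metadata.name);
}

// Hypothetical sample in the shape kubectl returns for a list.
const sample = {
  items: [
    { metadata: { name: 'old-run', creationTimestamp: '2024-01-01T00:00:00Z' }, status: { stage: 'finished' } },
    { metadata: { name: 'fresh-run', creationTimestamp: '2024-01-02T23:00:00Z' }, status: { stage: 'finished' } },
    { metadata: { name: 'active-run', creationTimestamp: '2024-01-01T00:00:00Z' }, status: { stage: 'started' } },
  ],
};

console.log(staleTestRuns(sample, 24, new Date('2024-01-03T00:00:00Z')));
```

In a pipeline you would pipe the kubectl JSON into a script like this and pass the resulting names to kubectl delete testrun.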
Alternatively, set cleanup policy in the TestRun:
spec:
  cleanup: post # operator deletes resources after the test finishes
Production-Ready: GitOps Flow
The real value here is treating load tests like code. Store your TestRun manifests in the same repository as the application, trigger them from CI on every release candidate, and fail the pipeline on threshold violations.
Here’s a GitHub Actions snippet that runs the test and waits for completion:
# .github/workflows/load-test.yml
- name: Apply TestRun
  run: kubectl apply -f k6/testrun.yaml
- name: Wait for test completion
  run: |
    kubectl wait testrun/ecommerce-load-test \
      --for=jsonpath='{.status.stage}'=finished \
      --timeout=20m
- name: Check result
  run: |
    RESULT=$(kubectl get testrun ecommerce-load-test \
      -o jsonpath='{.status.conditions[?(@.type=="TestRunFinished")].status}')
    if [ "$RESULT" != "True" ]; then
      echo "Load test failed or thresholds were breached"
      exit 1
    fi
The operator sets the TestRunFinished condition to False if k6 exits with a non-zero code, which is what happens when your thresholds aren’t met. This gives you automatic gate behavior with no extra tooling.
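If your pipeline already has Node available, the same gate can be expressed over the JSON form of the resource, which is easier to extend than a jsonpath one-liner. This is a sketch only: gateResult is a hypothetical helper, the sample objects are made up, and it relies on the TestRunFinished condition behaving as described above.

```javascript
// Decide pass/fail from the parsed output of
// `kubectl get testrun ecommerce-load-test -o json`.
function gateResult(testRun) {
  const conditions = (testRun.status && testRun.status.conditions) || [];
  const finished = conditions.find((c) => c.type === 'TestRunFinished');
  // A missing condition counts as a failure, matching the shell check.
  const passed = Boolean(finished) && finished.status === 'True';
  return { passed, exitCode: passed ? 0 : 1 };
}

// Hypothetical sample statuses.
const passing = { status: { conditions: [{ type: 'TestRunFinished', status: 'True' }] } };
const failing = { status: { conditions: [{ type: 'TestRunFinished', status: 'False' }] } };

console.log(gateResult(passing));
console.log(gateResult(failing));
console.log(gateResult({})); // no status yet
```

A CI step would end with process.exit(gateResult(parsed).exitCode) so the job fails whenever the gate does.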
Scaling to Truly Massive Load
With the architecture above, hitting 100,000+ sustained RPS is straightforward math. If each pod comfortably generates 10,000 RPS on a 2-core node, parallelism: 20 with 20 dedicated 2-core nodes gets you there.
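The sizing math fits in a few lines. This sketch (plain JavaScript for Node) uses the per-pod figure from the paragraph above; the headroom parameter and the podsNeeded name are assumptions added for the illustration.

```javascript
// Pods required to reach a target RPS, with optional headroom
// (headroom = 1 means provisioning twice the nominal capacity).
function podsNeeded(targetRps, perPodRps, headroom = 0) {
  return Math.ceil((targetRps * (1 + headroom)) / perPodRps);
}

console.log(podsNeeded(100000, 10000));    // bare minimum for 100,000 RPS
console.log(podsNeeded(100000, 10000, 1)); // with 100% headroom
```

Seen this way, parallelism: 20 is the 100,000 RPS target with every pod running at about half its capacity, which leaves room for heavier scripts or noisy nodes.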
The limiting factor at that scale is almost never k6 — it’s usually:
- Egress bandwidth: 100,000 RPS of 1 KB responses is ~800 Mbps. Know your node NICs.
- Connection tracking: kernel nf_conntrack_max on nodes that run iptables NAT. Check /proc/sys/net/netfilter/nf_conntrack_count during tests.
- The target’s frontend: your load balancer, ingress controller, or CDN will hit limits before your app pods do.
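The bandwidth figure is simple multiplication, and it is worth redoing with your own response sizes. A minimal sketch in plain JavaScript for Node, treating 1 KB as 1,000 bytes and ignoring headers and TLS overhead (both assumptions); responseMbps is a hypothetical helper.

```javascript
// Payload bandwidth in megabits per second for a given request rate.
function responseMbps(rps, responseBytes) {
  return (rps * responseBytes * 8) / 1e6;
}

const fleetMbps = responseMbps(100000, 1000);
console.log(fleetMbps);      // whole fleet
console.log(fleetMbps / 20); // per node when spread over 20 nodes
```

The fleet-wide number is what the target's network path has to carry; the per-node number is what each load-generation NIC sees.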
For the conntrack issue specifically, if you’re running on nodes with iptables-based kube-proxy:
# check current conntrack table usage
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# bump the limit if needed (requires sudo)
sysctl -w net.netfilter.nf_conntrack_max=1048576
Better yet, switch to IPVS mode for kube-proxy on load-gen nodes — it scales connection tracking far better than iptables at high concurrency.
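If you sample the two conntrack values during a test, a tiny helper can turn them into a warning signal. This is a sketch in plain JavaScript for Node; the 80% warning threshold and the conntrackUsage name are assumptions, not a kernel or k6 recommendation.

```javascript
// Utilisation of the conntrack table from the count and max values
// read out of /proc/sys/net/netfilter/.
function conntrackUsage(count, max, warnAt = 0.8) {
  const ratio = count / max;
  return { percent: Math.round(ratio * 100), warn: ratio >= warnAt };
}

console.log(conntrackUsage(100, 1000)); // plenty of room
console.log(conntrackUsage(900, 1000)); // close to the limit
```

Logging this alongside your k6 metrics makes it obvious when a latency spike lines up with a nearly full table rather than with the system under test.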
Wrapping Up
The k6 Operator is the cleanest way to do distributed load testing on infrastructure you already own. You get a declarative, versionable, GitOps-compatible test harness with virtually no operational overhead — no Locust master/worker drama, no JMeter distributed mode nightmares, no cloud credits burned on a SaaS load platform for a test that runs twice a week.
The gotchas are real but all fixable. DNS pressure, resource starvation, conntrack exhaustion — these are the things that turn a "the test passed" green light into misleading data. Address them upfront and your k6 results will actually reflect what your system does.
Start with parallelism: 3 on a small test, verify the segmentation is working correctly by summing VU counts across pods, then scale up. The operator handles the orchestration; your only job is writing good test scripts.