Distributed Load Testing at Scale: k6 Operator on Kubernetes

Load testing is one of those things every team says they do and almost no one does right. Running k6 run script.js on your laptop against staging is not a load test — it’s a prayer. Your laptop will saturate its NIC, your OS will throttle sockets, and you’ll spend three hours debugging the tool instead of the system under test.

The real answer is distributed load generation: dozens of k6 instances firing in parallel, coordinated, with centralized output. That’s exactly what the k6 Operator solves. It gives you a Kubernetes-native way to spin up arbitrarily large test fleets via a single kubectl apply.

Official repo: https://github.com/grafana/k6-operator

Why k6, Why Kubernetes

k6 is the sensible choice for modern load testing. The scripting model is TypeScript-flavored JavaScript, the binary is a single statically-linked Go executable, and the output options are genuinely good — Prometheus remote write, InfluxDB, Datadog, CloudWatch, stdout JSON. It doesn’t spin up a JVM, it doesn’t need a GUI, and it doesn’t require a license key to do anything useful.

The problem is single-node throughput. A single k6 instance can realistically push somewhere between 3,000 and 30,000 RPS depending on the test complexity and the machine. If you’re trying to simulate 200,000 concurrent users hammering your checkout service, you need a fleet.

Kubernetes is the natural orchestration layer. You already have it. The k6 Operator turns your cluster into a load generation platform with almost no extra infrastructure to manage.

Architecture in One Paragraph

The operator watches for TestRun custom resources. When you apply one, the controller creates a Job per configured instance (parallelism), each running a k6 pod that pulls the test script from a ConfigMap. All pods run the same script but receive a different segment of the virtual user workload via k6’s built-in execution segmentation. The operator waits for all pods to finish, aggregates exit codes, and updates the TestRun status. That’s it. Simple, auditable, GitOps-friendly.

Prerequisites

A working Kubernetes cluster (1.24+). k3s, EKS, GKE — all fine.
kubectl configured and pointing at it.
Helm 3.
Basic familiarity with k6 scripting. If you’ve never used k6, spend 20 minutes on the k6 quickstart first.

Installing the k6 Operator

The officially supported installation method is Helm. There’s also a bundle.yaml if you prefer raw manifests, but Helm gives you easier upgrades.

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install k6-operator grafana/k6-operator \
  --namespace k6-operator \
  --create-namespace

Verify the controller is running:

kubectl get pods -n k6-operator
# NAME                                   READY   STATUS    RESTARTS   AGE
# k6-operator-controller-manager-xyz-ab  2/2     Running   0          45s

The operator installs its CRDs automatically. Confirm:

kubectl get crd | grep k6
# testruns.k6.io
# privateloadzones.k6.io

Writing a k6 Script Worth Running

Before wiring up Kubernetes, write a script that actually tests something meaningful. Here’s a realistic e-commerce scenario — browsing a product list, adding to cart, and checking out:

// script.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Counter } from 'k6/metrics';

// Custom metrics — these will surface in any output backend
const checkoutDuration = new Trend('checkout_duration', true);
const failedRequests  = new Counter('failed_requests');

export const options = {
  // Ramp up, sustain, ramp down — classic shape
  stages: [
    { duration: '2m', target: 500 },   // ramp to 500 VUs
    { duration: '5m', target: 500 },   // hold
    { duration: '1m', target: 0   },   // drain
  ],
  thresholds: {
    http_req_failed:    ['rate<0.01'],         // <1% errors
    http_req_duration:  ['p(95)<400'],         // 95th percentile under 400ms
    checkout_duration:  ['p(99)<1000'],        // checkout 99th under 1s
  },
};

const BASE_URL = __ENV.BASE_URL || 'https://staging.myapp.internal';

export default function () {
  // Step 1 — product listing
  const listRes = http.get(`${BASE_URL}/api/products?page=1&limit=20`);
  check(listRes, { 'products 200': r => r.status === 200 });

  sleep(1);

  // Step 2 — add to cart
  const cartRes = http.post(
    `${BASE_URL}/api/cart`,
    JSON.stringify({ product_id: 42, qty: 1 }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(cartRes, { 'cart 201': r => r.status === 201 });

  sleep(0.5);

  // Step 3 — checkout (the expensive path)
  const start = Date.now();
  const checkoutRes = http.post(
    `${BASE_URL}/api/checkout`,
    JSON.stringify({ payment_method: 'card_test' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  checkoutDuration.add(Date.now() - start);

  if (!check(checkoutRes, { 'checkout 200': r => r.status === 200 })) {
    failedRequests.add(1);
  }

  sleep(2);
}

Store this in a ConfigMap:

kubectl create configmap k6-test-script \
  --from-file=script.js=script.js \
  --namespace default

The TestRun Resource

This is where it all comes together. The TestRun CRD is your test configuration:

# testrun.yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: ecommerce-load-test
  namespace: default
spec:
  parallelism: 10          # 10 k6 pods, each handles 1/10 of the VU workload
  script:
    configMap:
      name: k6-test-script
      file: script.js
  arguments: --out experimental-prometheus-rw  # push metrics to Prometheus
  runner:
    env:
      - name: BASE_URL
        value: "https://staging.myapp.internal"
      # Prometheus remote write endpoint
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: "http://prometheus-operated.monitoring.svc:9090/api/v1/write"
      - name: K6_PROMETHEUS_RW_TREND_STATS
        value: "p(50),p(90),p(95),p(99),max"
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "2"
        memory: "512Mi"
    # Spread pods across nodes to avoid co-location bottlenecks
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  k6_cr: ecommerce-load-test
              topologyKey: kubernetes.io/hostname

Apply it:

kubectl apply -f testrun.yaml

Watch the test lifecycle:

kubectl get testrun ecommerce-load-test -w
# NAME                    STAGE         AGE
# ecommerce-load-test     initialized   5s
# ecommerce-load-test     created       8s
# ecommerce-load-test     started       12s

Stream logs from all pods simultaneously (requires stern):

stern -n default -l k6_cr=ecommerce-load-test

Or with plain kubectl:

kubectl logs -n default -l k6_cr=ecommerce-load-test -f --max-log-requests=20

How Segmentation Works

This is the part most people don’t read the docs on and then wonder why their test produces weird numbers.

When parallelism: 10 is set, the operator injects execution segment arguments into each pod automatically:

Pod 0: --execution-segment=0:1/10 --execution-segment-sequence=0,1/10,2/10,3/10,4/10,5/10,6/10,7/10,8/10,9/10,1
Pod 1: --execution-segment=1/10:2/10 --execution-segment-sequence=0,1/10,2/10,3/10,4/10,5/10,6/10,7/10,8/10,9/10,1
…and so on.

Each pod runs a deterministic slice of the VU range. Total VUs across all pods equals the VUs you defined in options.stages. So with target: 500 and parallelism: 10, each pod handles 50 VUs. The total request rate seen by your system is the sum of all pods.
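To make the slicing concrete, here is a minimal sketch that builds the segment flags for each instance and the even VU split. The function names are illustrative and are not part of k6 or the operator. It only mirrors the from:to segment format k6 accepts, and the even split is exact only when the VU target divides cleanly by the number of pods.

```javascript
// Boundary i of `total` equal slices, written the way k6 accepts it.
function segmentPoint(i, total) {
  if (i === 0) return '0';
  if (i === total) return '1';
  return `${i}/${total}`;
}

// Flags for a zero-based instance index: instance 0 covers 0:1/total, and so on.
function segmentFlags(index, total) {
  const sequence = Array.from({ length: total + 1 }, (_, i) => segmentPoint(i, total)).join(',');
  const segment = `${segmentPoint(index, total)}:${segmentPoint(index + 1, total)}`;
  return `--execution-segment=${segment} --execution-segment-sequence=${sequence}`;
}

// Even split of a VU target across instances (assumes it divides cleanly).
function vusPerInstance(targetVUs, total) {
  return targetVUs / total;
}

console.log(segmentFlags(0, 10));
console.log(segmentFlags(9, 10));
console.log(vusPerInstance(500, 10)); // 50
```

Comparing this output with the arguments on a running pod (kubectl describe pod) is a quick way to confirm that segmentation is active.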

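Gotcha #1: If your script has global state (shared arrays, hardcoded user IDs, etc.), each pod starts from the same state. You can end up with 10 pods all logging in as user ID 1. __VU alone does not fix this, because it is numbered per instance and repeats on every pod. Use vu.idInTest from the k6/execution module, which is unique across the whole test run, and combine it with __ITER if you need a fresh value per iteration.

// Derive a unique user per VU across the entire fleet.
import exec from 'k6/execution';
// Read this inside the default function; VU data is not available in the init context.
const userId = exec.vu.idInTest;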
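If the test uses a fixed pool of accounts, a fleet-wide VU id can be mapped onto that pool. This is a minimal sketch: pickUser is a hypothetical helper, and it assumes the id passed in is 1-based and globally unique, such as exec.vu.idInTest from k6/execution.

```javascript
// Hypothetical helper: map a globally unique, 1-based VU id onto a fixed account pool.
// When there are more VUs than accounts, ids wrap around and accounts are reused.
function pickUser(users, globalVuId) {
  if (users.length === 0) {
    throw new Error('user pool is empty');
  }
  return users[(globalVuId - 1) % users.length];
}

// Made-up pool for illustration.
const pool = ['alice', 'bob', 'carol'];
console.log(pickUser(pool, 1)); // alice
console.log(pickUser(pool, 4)); // alice again: the pool wrapped around
```

Size the pool to at least the total VU count if accounts must never be shared between VUs.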

Output: Prometheus + Grafana

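Storing test output in Prometheus and visualizing in Grafana is the standard production setup. The experimental-prometheus-rw output ships in the standard k6 binary, so no custom xk6 build is needed. It still carries the experimental prefix, so check the k6 release notes for changes when you upgrade.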

Make sure your Prometheus has remote write receiver enabled. In a kube-prometheus-stack installation:

# values.yaml patch for kube-prometheus-stack
prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true

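Then import a k6 dashboard built for the Prometheus data source (Grafana publishes one as dashboard ID 19665) or build your own using the k6_* metric family. The widely linked dashboard ID 2587 targets InfluxDB, so it will not render remote write data.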

For the Grafana dashboard, the most useful panels are:

k6_http_req_duration_p95 and k6_http_req_duration_p99 (one series per stat listed in K6_PROMETHEUS_RW_TREND_STATS)
k6_vus over time (confirms VU ramp shape)
k6_http_req_failed_rate (your canary)
rate(k6_http_reqs_total[1m]) (actual RPS)

Gotcha #2: DNS Pressure Under Load

At high concurrency, your CoreDNS pods can become a bottleneck before your application does. 10 k6 pods sharing 500 VUs, all resolving staging.myapp.internal as they open new connections, can generate serious query volume, and with the Kubernetes default of ndots:5 each lookup is first tried against every cluster search domain.

Fix this by resolving the target once and passing an IP, or by lowering ndots on the k6 pods:

# in spec.runner
dnsConfig:
  options:
    - name: ndots
      value: "1"        # reduce search domain fallbacks
    - name: single-request-reopen

Or better: ensure your target URL uses a stable ClusterIP or an ingress that your k6 pods resolve once via an /etc/hosts override. You can also use k6’s hosts option, which applies to the whole script rather than per request, to pin a hostname to an IP.
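As a sketch of the pinning approach with k6's hosts option: the helper below builds the hostname-to-IP mapping only when an IP is supplied, so the same script still works with normal DNS. pinnedHosts and the TARGET_IP variable are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: build the value for k6's `hosts` option.
// Returns an empty mapping when no IP is given, which leaves DNS resolution untouched.
function pinnedHosts(hostname, ip) {
  return ip ? { [hostname]: ip } : {};
}

// In a k6 script this would feed the exported options, for example:
//   export const options = {
//     hosts: pinnedHosts('staging.myapp.internal', __ENV.TARGET_IP), // TARGET_IP is an assumed env var
//   };
console.log(pinnedHosts('staging.myapp.internal', '10.0.12.34'));
console.log(pinnedHosts('staging.myapp.internal', undefined));
```

The IP can then be injected through spec.runner.env in the TestRun, next to BASE_URL.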

Gotcha #3: Node Resource Starvation

k6 is CPU-bound at high RPS. At 10,000 RPS per pod, you can easily spike a 2-core node to 100% CPU, which then causes test irregularities — VUs start sleeping longer than specified, connection timeouts creep in, and your results look like the system is slower than it is.

Always set both requests and limits on runner pods and pre-provision dedicated load-generation nodes using a node label + nodeSelector:

kubectl label node worker-3 worker-4 role=loadgen

# in spec.runner
nodeSelector:
  role: loadgen
tolerations:
  - key: role
    operator: Equal
    value: loadgen
    effect: NoSchedule

Taint those nodes to keep other workloads off them during tests:

kubectl taint nodes worker-3 worker-4 role=loadgen:NoSchedule

Gotcha #4: Test Artifacts and Cleanup

By default, finished TestRun resources and their associated Job and Pod objects stick around. This is actually useful for post-test debugging, but if you run tests frequently you’ll accumulate zombie objects.

The operator respects ttlSecondsAfterFinished on the underlying Job, but you can also automate cleanup with a simple CronJob or a pipeline step:

# delete all finished test runs (note: this does not filter by age)
kubectl get testrun -n default \
  -o jsonpath='{range .items[?(@.status.stage=="finished")]}{.metadata.name}{"\n"}{end}' \
  | xargs -r kubectl delete testrun -n default
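If cleanup should only touch runs older than a given age, the filter needs a timestamp comparison that a jsonpath expression cannot do. Here is a minimal sketch over the parsed output of kubectl get testrun -o json. staleTestRuns is an illustrative name, and using metadata.creationTimestamp as the age reference is an assumption of this sketch, not operator behavior.

```javascript
// Select finished TestRuns created before (now - maxAgeHours).
function staleTestRuns(list, now, maxAgeHours = 24) {
  const cutoffMs = now.getTime() - maxAgeHours * 60 * 60 * 1000;
  return list.items
    .filter((tr) => tr.status && tr.status.stage === 'finished')
    // Assumption: creation time stands in for the age of the run.
    .filter((tr) => Date.parse(tr.metadata.creationTimestamp) < cutoffMs)
    .map((tr) => tr.metadata.name);
}

// Made-up sample data for illustration.
const sample = {
  items: [
    { metadata: { name: 'old-run', creationTimestamp: '2024-01-01T00:00:00Z' }, status: { stage: 'finished' } },
    { metadata: { name: 'fresh-run', creationTimestamp: '2024-01-02T23:00:00Z' }, status: { stage: 'finished' } },
    { metadata: { name: 'active-run', creationTimestamp: '2024-01-01T00:00:00Z' }, status: { stage: 'started' } },
  ],
};

console.log(staleTestRuns(sample, new Date('2024-01-03T00:00:00Z'))); // [ 'old-run' ]
```

The resulting names can be passed to kubectl delete testrun in the same pipeline step.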

Alternatively, set cleanup policy in the TestRun:

spec:
  cleanup: post  # operator deletes resources after the test finishes

Production-Ready: GitOps Flow

The real value here is treating load tests like code. Store your TestRun manifests in the same repository as the application, trigger them from CI on every release candidate, and fail the pipeline on threshold violations.

Here’s a GitHub Actions snippet that runs the test and waits for completion:

# .github/workflows/load-test.yml
- name: Apply TestRun
  run: kubectl apply -f k6/testrun.yaml

- name: Wait for test completion
  run: |
    kubectl wait testrun/ecommerce-load-test \
      --for=jsonpath='{.status.stage}'=finished \
      --timeout=20m

- name: Check result
  run: |
    RESULT=$(kubectl get testrun ecommerce-load-test \
      -o jsonpath='{.status.conditions[?(@.type=="TestRunFinished")].status}')
    if [ "$RESULT" != "True" ]; then
      echo "Load test failed or thresholds were breached"
      exit 1
    fi

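This gate assumes the operator sets the TestRunFinished condition to False when k6 exits with a non-zero code, which is what happens when your thresholds aren’t met. Before relying on it, inspect one passing run and one deliberately failing run with kubectl get testrun ecommerce-load-test -o yaml and confirm the stage and condition values your operator version reports. Keep in mind that kubectl wait only returns early when the stage is finished; if a failed run ends in a different stage, the step blocks until the 20m timeout and fails there.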

Scaling to Truly Massive Load

With the architecture above, hitting 100,000+ sustained RPS is straightforward math. If each pod comfortably generates 10,000 RPS on a 2-core node, 10 pods is the theoretical minimum; parallelism: 20 with 20 dedicated 2-core nodes gets you there with 2x headroom, so no generator runs near saturation.

The limiting factor at that scale is almost never k6 — it’s usually:

Egress bandwidth — 100,000 RPS of 1 KB responses is ~800 Mbps. Know your node NICs.
Connection tracking — kernel nf_conntrack_max on nodes that run iptables NAT. Check /proc/sys/net/netfilter/nf_conntrack_count during tests.
The target’s frontend — your load balancer, ingress controller, or CDN will hit limits before your app pods do.
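The sizing and bandwidth figures above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: the function names are illustrative, 1 KB is taken as 1,000 bytes, and the bandwidth figure is the aggregate across the fleet, counting response bodies only (no headers or TLS overhead).

```javascript
// Pods needed to reach a target RPS, with an optional headroom multiplier.
function podsNeeded(targetRps, rpsPerPod, headroom = 1) {
  return Math.ceil((targetRps * headroom) / rpsPerPod);
}

// Aggregate response traffic in megabits per second.
function responseMbps(rps, bytesPerResponse) {
  return (rps * bytesPerResponse * 8) / 1e6;
}

console.log(podsNeeded(100000, 10000));    // 10 pods at the theoretical minimum
console.log(podsNeeded(100000, 10000, 2)); // 20 pods with 2x headroom
console.log(responseMbps(100000, 1000));   // 800 Mbps across the whole fleet
```

Dividing the aggregate by the number of load-generation nodes gives the per-node share to compare against each NIC.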

For the conntrack issue specifically, if you’re running on nodes with iptables-based kube-proxy:

# check current conntrack table usage
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# bump the limit if needed (requires sudo)
sysctl -w net.netfilter.nf_conntrack_max=1048576

Switching kube-proxy to IPVS mode also helps at high concurrency, because Service lookup scales better than long iptables rule chains. IPVS still relies on conntrack for NAT, though, so it does not remove the need to size nf_conntrack_max.

Wrapping Up

The k6 Operator is the cleanest way to do distributed load testing on infrastructure you already own. You get a declarative, versionable, GitOps-compatible test harness with virtually no operational overhead — no Locust master/worker drama, no JMeter distributed mode nightmares, no cloud credits burned on a SaaS load platform for a test that runs twice a week.

The gotchas are real but all fixable. DNS pressure, resource starvation, conntrack exhaustion — these are the things that turn a "the test passed" green light into misleading data. Address them upfront and your k6 results will actually reflect what your system does.

Start with parallelism: 3 on a small test, verify the segmentation is working correctly by summing VU counts across pods, then scale up. The operator handles the orchestration; your only job is writing good test scripts.