Locust Load Testing at Scale: Patterns, Pitfalls, and Distributed Runs That Actually Work

Most load testing tools were built for a different era. JMeter ships with an XML DSL that looks like it survived the Java EE wars. Gatling needs Scala. Artillery is fine until you want to do anything non-trivial and discover it’s YAML turtles all the way down. Then someone showed you Locust, and your test file was 30 lines of Python you wrote in 15 minutes.

That feeling wears off the moment you need to push past 2000 users on a single machine, model realistic user journeys with shared state, or plug load generation into a CI pipeline that kills the build when p99 blows up. This article covers the patterns that keep working at scale — and the traps that will waste your afternoon.

Official repo: locustio/locust

Why Locust Works the Way It Does

Locust runs each simulated user in a greenlet, a lightweight coroutine provided by gevent. gevent is a cooperative multitasking library that monkey-patches the standard library so that blocking I/O yields to other greenlets instead of blocking the thread. This means one OS thread can juggle thousands of "users" without spawning threads or processes. The catch is that CPU-bound work blocks the whole event loop, but HTTP testing is I/O-bound almost by definition.

Each simulated user is an instance of a class that inherits from User. The framework calls tasks on that instance, waits between them (configurable wait_time), and repeats until the test ends. Tasks are just methods decorated with @task.

That’s the whole mental model. Everything else is layered on top.
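To make that loop concrete, here is a minimal sketch in plain Python, with no gevent or Locust involved. It is a simplified model, not Locust's scheduler: build_task_list and run_user are hypothetical names, waiting between tasks is left out, and the weighting approach (repeat each task by its weight, then pick uniformly) is an assumption about how weights turn into selection frequency.

```python
import random
from collections import Counter


def build_task_list(weights):
    """Repeat each task name by its weight so a uniform pick respects the weights."""
    tasks = []
    for name, weight in weights.items():
        tasks.extend([name] * weight)
    return tasks


def run_user(weights, iterations, rng):
    """Simulate one user: pick a task, count it as executed, repeat."""
    tasks = build_task_list(weights)
    executed = Counter()
    for _ in range(iterations):
        executed[rng.choice(tasks)] += 1
    return executed


if __name__ == "__main__":
    counts = run_user({"browse": 3, "view": 1, "search": 1}, 10_000, random.Random(0))
    for name, n in counts.most_common():
        print(f"{name}: {n / 10_000:.0%}")
```

With weights of 3, 1, and 1, the first task should account for roughly three fifths of all executions over a long run.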

Installing and Your First Meaningful Test

pip install locust

For production usage, pin your version. Locust moves fast and occasionally breaks the API between minor versions.

# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    # Each user waits 1-3 seconds between tasks — models human think time
    wait_time = between(1, 3)
    host = "https://api.yourapp.com"

    def on_start(self):
        """Called once per user, before tasks begin. Use for login/auth."""
        resp = self.client.post("/auth/login", json={
            "email": "loadtest@example.com",  # placeholder test account
            "password": "testpass"
        })
        self.token = resp.json()["access_token"]
        self.client.headers.update({"Authorization": f"Bearer {self.token}"})

    @task(3)  # weight 3 — runs 3x more often than weight-1 tasks
    def browse_listings(self):
        self.client.get("/listings?page=1&limit=20")

    @task(1)
    def view_single_listing(self):
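        # Avoid hardcoding IDs. Environment has no built-in shared_data
        # attribute; this assumes you attached a dict under that name yourself
        # (for example in an init listener) and falls back to 42 otherwise.
        shared = getattr(self.environment, "shared_data", {})
        listing_id = shared.get("popular_listing_id", 42)
        self.client.get(f"/listings/{listing_id}", name="/listings/[id]")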

    @task(1)
    def search(self):
        self.client.get("/search?q=apartment&city=Berlin",
                        name="/search")  # name groups requests in stats

Run it headless (no browser):

locust -f locustfile.py --headless -u 100 -r 10 --run-time 2m \
       --host https://api.yourapp.com

-u is total users, -r is spawn rate (users/second). The web UI at :8089 is great for demos; --headless is what you want in CI.
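Before picking values for -u and -r, it helps to estimate what they imply. The sketch below is back-of-the-envelope arithmetic, not anything Locust computes for you: ramp_seconds and estimated_rps are hypothetical helpers, and the 0.2 s average response time is an assumed figure. The throughput estimate is the usual closed-loop approximation in which each user completes one request per wait-plus-response cycle.

```python
def ramp_seconds(users, spawn_rate):
    """Time needed to reach the target user count at a constant spawn rate."""
    return users / spawn_rate


def estimated_rps(users, avg_wait_s, avg_response_s):
    """Closed-loop estimate: one request per user per (wait + response) cycle."""
    return users / (avg_wait_s + avg_response_s)


if __name__ == "__main__":
    # 100 users at 10 users/second: full load is reached after 10 seconds,
    # so a 2-minute run spends 110 seconds at the target user count.
    print(ramp_seconds(100, 10))
    # between(1, 3) averages 2 s of think time; 0.2 s response time is assumed.
    print(round(estimated_rps(100, 2.0, 0.2), 1))
```

If the estimate is far below the throughput you want to test, raise the user count or shorten the wait rather than expecting 100 users to produce thousands of requests per second.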

Task Patterns That Model Real Traffic

Sequential User Journeys with SequentialTaskSet

Random task selection works for simple APIs. E-commerce, onboarding flows, multi-step wizards — these need sequential steps where state flows from one request to the next.

from locust import HttpUser, SequentialTaskSet, task, between

class CheckoutFlow(SequentialTaskSet):
    """SequentialTaskSet executes tasks top-to-bottom in definition order."""

    @task
    def add_to_cart(self):
        resp = self.client.post("/cart/items", json={"product_id": 101, "qty": 1})
        self.cart_id = resp.json()["cart_id"]

    @task
    def view_cart(self):
        self.client.get(f"/cart/{self.cart_id}")

    @task
    def apply_coupon(self):
        self.client.post(f"/cart/{self.cart_id}/coupon",
                         json={"code": "TEST10"})

    @task
    def checkout(self):
        self.client.post(f"/cart/{self.cart_id}/checkout",
                         json={"payment_method": "test_card"})
        self.interrupt()  # exits the TaskSet, returns control to parent User

class ShopUser(HttpUser):
    wait_time = between(1, 2)
    tasks = [CheckoutFlow]

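self.interrupt() at the end of a flow is easy to forget. Without it, a SequentialTaskSet wraps around to its first task and the user never leaves the flow. With it, control returns to the parent User, which picks its next task. Here CheckoutFlow is the only entry in tasks, so the user simply starts a fresh checkout; add other tasks or TaskSets to the parent if the journey should continue differently.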

Parameterized Requests from a Data Pool

Hammering the same endpoint with identical parameters isn’t realistic. Use shared pools or generate synthetic data:

import itertools
from locust import HttpUser, task, between, events

SEARCH_TERMS = ["laptop", "phone", "headphones", "tablet", "monitor"]
USER_IDS = list(range(1000, 2000))

# next() on a shared iterator never yields to gevent, so greenlets in one
# process cannot interleave inside it. Each worker process has its own cycle.
term_cycle = itertools.cycle(SEARCH_TERMS)
user_id_cycle = itertools.cycle(USER_IDS)

class SearchUser(HttpUser):
    wait_time = between(0.5, 1.5)

    @task
    def search(self):
        term = next(term_cycle)
        self.client.get(f"/search?q={term}", name="/search?q=[term]")

    @task
    def profile(self):
        uid = next(user_id_cycle)
        with self.client.get(f"/users/{uid}", name="/users/[id]",
                              catch_response=True) as resp:
            if resp.status_code == 404:
                # Known data gap — don't pollute error stats
                resp.success()

The name= parameter on requests is non-optional at scale. Without it, /users/1, /users/2, /users/3 become three separate entries in your stats table instead of one aggregated /users/[id] line.
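Writing name= by hand on every call gets tedious, so one option is to derive it from the URL. The helper below is a sketch under assumptions: stats_name is a hypothetical function, it treats every purely numeric path segment as an ID, and it replaces each query value with a placeholder named after its key. Real APIs with UUIDs or slugs in the path would need a broader pattern.

```python
import re
from urllib.parse import parse_qsl, urlsplit

_NUMERIC_SEGMENT = re.compile(r"^\d+$")


def stats_name(url):
    """Collapse numeric path segments and query values into placeholders."""
    parts = urlsplit(url)
    segments = [
        "[id]" if _NUMERIC_SEGMENT.match(segment) else segment
        for segment in parts.path.split("/")
    ]
    name = "/".join(segments)
    keys = [key for key, _ in parse_qsl(parts.query, keep_blank_values=True)]
    if keys:
        name += "?" + "&".join(f"{key}=[{key}]" for key in keys)
    return name


if __name__ == "__main__":
    for url in ("/users/1234", "/cart/77/checkout", "/search?q=laptop"):
        print(url, "->", stats_name(url))
```

In a task this would be used as self.client.get(url, name=stats_name(url)), so every variant of the same route lands on one stats line.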

Custom Load Shapes

The default behavior ramps users up linearly and holds. Real production traffic looks nothing like that. Locust’s LoadTestShape class lets you define arbitrary load curves:

from locust import LoadTestShape

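class SpikeShape(LoadTestShape):
    """
    Baseline → spike → back to baseline → ramp down.
    Good for testing autoscale behavior and cache warming.

    Each stage's "duration" is the cumulative run time in seconds at which
    that stage ends, not the length of the stage.
    """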

    stages = [
        {"duration": 60,  "users": 50,  "spawn_rate": 5},   # warm up
        {"duration": 90,  "users": 50,  "spawn_rate": 5},   # steady baseline
        {"duration": 120, "users": 500, "spawn_rate": 100},  # spike
        {"duration": 180, "users": 500, "spawn_rate": 100},  # hold spike
        {"duration": 210, "users": 50,  "spawn_rate": 50},   # recover
        {"duration": 270, "users": 50,  "spawn_rate": 5},    # steady again
        {"duration": 300, "users": 0,   "spawn_rate": 50},   # ramp down
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return stage["users"], stage["spawn_rate"]
        return None  # None stops the test

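Drop this class into the same locustfile and the framework auto-detects it. No CLI flags are needed; when a shape class is present it controls user count and spawn rate, so -u and -r on the command line are ignored.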
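A stage table is easy to get wrong, so it is worth dry-running the lookup before pointing it at a real target. The sketch below copies the stage list into a plain module-level constant and reimplements the tick() lookup as a standalone function; STAGES and stage_at are names chosen for this example, and nothing here touches Locust.

```python
STAGES = [
    {"duration": 60,  "users": 50,  "spawn_rate": 5},
    {"duration": 90,  "users": 50,  "spawn_rate": 5},
    {"duration": 120, "users": 500, "spawn_rate": 100},
    {"duration": 180, "users": 500, "spawn_rate": 100},
    {"duration": 210, "users": 50,  "spawn_rate": 50},
    {"duration": 270, "users": 50,  "spawn_rate": 5},
    {"duration": 300, "users": 0,   "spawn_rate": 50},
]


def stage_at(run_time, stages):
    """Return (users, spawn_rate) for the first stage that has not ended yet."""
    for stage in stages:
        if run_time < stage["duration"]:
            return stage["users"], stage["spawn_rate"]
    return None  # past the last stage: the test would stop


if __name__ == "__main__":
    # Print the target every 30 seconds to eyeball the curve.
    for t in range(0, 301, 30):
        print(f"t={t:>3}s -> {stage_at(t, STAGES)}")
```

Because each "duration" is a cumulative end time, the comparison is a strict less-than: at exactly 90 seconds the lookup has already moved on to the spike stage.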

Going Distributed: Master and Workers

A single Locust process is confined to one CPU core and tops out around 500-1000 users, depending on your hardware, how much work each task does, and the target’s latency. Beyond that you need distributed mode: one master process aggregates stats and coordinates, multiple worker processes do the actual HTTP work.

Docker Compose Setup

# docker-compose.yml
version: "3.9"

x-locust-common: &locust-common
  image: locustio/locust:2.33.0
  volumes:
    - ./locustfiles:/mnt/locust
  networks:
    - locust-net

services:
  master:
    <<: *locust-common
    ports:
      - "8089:8089"   # web UI
    command: >
      -f /mnt/locust/locustfile.py
      --master
      --expect-workers 4
      --host https://api.yourapp.com
    environment:
      - LOCUST_USERS=2000
      - LOCUST_SPAWN_RATE=50

  worker:
    <<: *locust-common
    command: >
      -f /mnt/locust/locustfile.py
      --worker
      --master-host master
    deploy:
      replicas: 4   # docker compose up --scale worker=8 overrides this

networks:
  locust-net:
    driver: bridge

docker compose up --scale worker=8

Each worker runs its own set of greenlets. The master aggregates request stats from all workers in real time. The web UI only lives on the master.

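--expect-workers tells the master to wait until that many workers have connected before starting the run. It only takes effect when the master runs with --headless or --autostart; with the web UI, as in the Compose file above, nothing starts until you click the button, so check the connected worker count in the UI first. Without the wait, a headless master begins with whichever workers have connected, and those early workers carry the whole load until the rest join.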

Kubernetes for Serious Scale

For cloud-native test runs where you need 50+ workers on demand:

# locust-master.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: locust-master
  template:
    metadata:
      labels:
        app: locust-master
    spec:
      containers:
        - name: locust
          image: locustio/locust:2.33.0
          args:
            - -f
            - /mnt/locust/locustfile.py
            - --master
            - --expect-workers
            - "20"
          ports:
            - containerPort: 8089  # UI
            - containerPort: 5557  # worker comms
          volumeMounts:
            - name: locustfile
              mountPath: /mnt/locust
      volumes:
        - name: locustfile
          configMap:
            name: locustfile
---
apiVersion: v1
kind: Service
metadata:
  name: locust-master
spec:
  selector:
    app: locust-master
  ports:
    - name: web
      port: 8089
    - name: worker
      port: 5557
  type: LoadBalancer
---
# locust-worker.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-worker
spec:
  replicas: 20
  selector:
    matchLabels:
      app: locust-worker
  template:
    metadata:
      labels:
        app: locust-worker
    spec:
      containers:
        - name: locust
          image: locustio/locust:2.33.0
          args:
            - -f
            - /mnt/locust/locustfile.py
            - --worker
            - --master-host
            - locust-master
          volumeMounts:
            - name: locustfile
              mountPath: /mnt/locust
      volumes:
        - name: locustfile
          configMap:
            name: locustfile

Scale workers on the fly: kubectl scale deployment locust-worker --replicas=50.

CI Integration and Automated Pass/Fail

A load test that doesn’t break the build is a decoration. Wire it up properly:

# locustfile.py: add an event listener for threshold checking
from locust import events

@events.quitting.add_listener
def assert_thresholds(environment, **kwargs):
    stats = environment.stats.total

    # (description, violated?) pairs, evaluated once at shutdown
    checks = [
        ("Failure rate > 1%",
         stats.fail_ratio > 0.01),
        ("p95 latency > 500ms",
         stats.get_response_time_percentile(0.95) > 500),
        ("p99 latency > 1000ms",
         stats.get_response_time_percentile(0.99) > 1000),
        # total_rps averages over the whole run; current_rps only covers the
        # last few seconds, when users may already have been stopped
        ("Avg RPS < 100",
         stats.total_rps < 100),
    ]

    failed = [name for name, violated in checks if violated]
    for name in failed:
        print(f"[THRESHOLD FAILED] {name}")
    if failed:
        environment.process_exit_code = 1  # non-zero exit fails the CI job

In your CI pipeline (GitHub Actions example):

- name: Run load test
  run: |
    locust -f locustfile.py \
      --headless \
      -u 200 -r 20 \
      --run-time 3m \
      --host ${{ secrets.STAGING_HOST }} \
      --html report.html \
      --csv results

- name: Upload report
  uses: actions/upload-artifact@v4
  if: always()
  with:
    name: locust-report
    path: |
      report.html
      results*.csv

--html generates a self-contained HTML report. --csv produces raw data you can diff between runs.
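Threshold logic is easier to trust when it can be unit-tested without running a load test. One way to get there is to keep the comparison in a pure function and feed it plain numbers. The sketch below assumes the same four limits as the listener above; failed_thresholds and its parameter names are hypothetical.

```python
def failed_thresholds(fail_ratio, p95_ms, p99_ms, total_rps,
                      max_fail_ratio=0.01, max_p95_ms=500,
                      max_p99_ms=1000, min_rps=100):
    """Return a description of every threshold the run violated."""
    checks = [
        (f"failure rate > {max_fail_ratio:.0%}", fail_ratio > max_fail_ratio),
        (f"p95 latency > {max_p95_ms} ms", p95_ms > max_p95_ms),
        (f"p99 latency > {max_p99_ms} ms", p99_ms > max_p99_ms),
        (f"avg RPS < {min_rps}", total_rps < min_rps),
    ]
    return [description for description, violated in checks if violated]


if __name__ == "__main__":
    # A run with 2% failures and a slow p95, but healthy p99 and throughput.
    for description in failed_thresholds(0.02, 600, 800, 150):
        print(f"[THRESHOLD FAILED] {description}")
```

Inside a quitting listener, the inputs would come from environment.stats.total (fail_ratio, get_response_time_percentile, total_rps), and a non-empty result would set environment.process_exit_code to 1.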

FastHttpUser: When You Need Raw Throughput

HttpUser uses the requests library under gevent monkey-patching. It’s compatible with everything but has overhead. FastHttpUser uses a custom gevent-native HTTP client:

from locust import task, between
from locust.contrib.fasthttp import FastHttpUser

class HighThroughputUser(FastHttpUser):
    wait_time = between(0, 0.1)  # minimal wait for throughput testing

    @task
    def ping(self):
        with self.client.get("/health", catch_response=True) as resp:
            if resp.status_code != 200:
                resp.failure(f"Unexpected status: {resp.status_code}")

As a rough guide, on a modern laptop FastHttpUser typically achieves 2-3x the RPS of HttpUser per worker process for simple requests; the exact gain depends on payload size and on the target. Use it when you’re trying to saturate the target, not when your test depends on requests-specific behavior for session replay or complex auth flows.

Non-HTTP Protocols

Locust isn’t HTTP-only. The User base class is protocol-agnostic — you just need to report timings manually:

import time
import psycopg2
from locust import User, task, between, events

class PostgresUser(User):
    wait_time = between(0.5, 2)
    abstract = True  # prevents Locust from instantiating this directly

    def on_start(self):
        self.conn = psycopg2.connect(
            host="localhost", dbname="testdb",
            user="tester", password="testpass"
        )
        self.cursor = self.conn.cursor()

    def on_stop(self):
        self.conn.close()

    def query(self, sql, name=None):
        start = time.perf_counter()
        try:
            self.cursor.execute(sql)
            self.conn.commit()
            elapsed = (time.perf_counter() - start) * 1000
            events.request.fire(
                request_type="PSQL",
                name=name or sql[:50],
                response_time=elapsed,
                response_length=0,
                exception=None,
                context=self.context(),
            )
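        except Exception as e:
            elapsed = (time.perf_counter() - start) * 1000
            events.request.fire(
                request_type="PSQL",
                name=name or sql[:50],
                response_time=elapsed,
                response_length=0,
                exception=e,
                context=self.context(),
            )
            # psycopg2 leaves the connection in an aborted transaction after
            # an error; roll back so this user's later queries can still run
            self.conn.rollback()
            raise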

class DBLoadUser(PostgresUser):
    @task
    def heavy_read(self):
        self.query(
            "SELECT * FROM orders WHERE status='pending' ORDER BY created_at LIMIT 100",
            name="pending-orders"
        )

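This pattern works for Redis, gRPC, WebSockets, AMQP, or anything else you can drive from Python, with one caveat: the client library has to cooperate with gevent. Pure-Python clients built on the standard socket module are covered by the monkey-patching, but C-based drivers such as psycopg2 block the whole worker for the duration of a query unless you make them gevent-aware (psycogreen exists for that purpose).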
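The success and failure branches of a custom client tend to duplicate the same timing and reporting code. A context manager can fold them together. This is a sketch, not part of Locust: timed_request is a hypothetical helper, and report stands in for whatever callable records the result, which in a real locustfile would be the request event's fire method.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_request(report, request_type, name):
    """Time the wrapped block and pass the outcome to `report` either way."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = exc
        raise  # the caller still sees the original exception
    finally:
        report(
            request_type=request_type,
            name=name,
            response_time=(time.perf_counter() - start) * 1000,  # milliseconds
            response_length=0,
            exception=error,
        )


if __name__ == "__main__":
    def print_report(**fields):
        print(fields["name"], "failed" if fields["exception"] else "ok")

    with timed_request(print_report, "PSQL", "pending-orders"):
        time.sleep(0.01)  # stands in for cursor.execute(...)
```

A query method then shrinks to a with block around the driver call, and the timing code exists in exactly one place.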

Gotchas

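CPU-bound tasks kill your throughput. Parsing large JSON payloads, crypto operations, or image manipulation in task code blocks the gevent event loop, and every other user on that worker stalls with it. Keep heavy processing out of the hot path. If you can’t, be aware that gevent.spawn_later only delays a greenlet (it still runs on the same loop), and that monkey-patching generally turns ordinary threads, including those behind concurrent.futures.ThreadPoolExecutor, into greenlets as well. gevent’s own thread pool (gevent.get_hub().threadpool) runs work on real OS threads, which helps for C code that releases the GIL; for pure-Python CPU work the more reliable fix is adding workers.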

The master doesn’t run users. In distributed mode, the master process handles coordination and the web UI only. In a headless run, if you configure --expect-workers 4 and only 3 connect, the test never starts. Set a reasonable --expect-workers-max-wait or script your worker startup.

Shared mutable state between users. Python’s GIL doesn’t save you here — gevent’s cooperative scheduling means a yield point (any I/O call) can let another greenlet modify shared data structures. Use gevent.lock.RLock or limit sharing to immutable objects and queues.

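wait_time = constant(0) is a trap. With zero wait, every user fires its next request the instant the previous one returns, so the worker’s CPU often saturates before the target does and you end up measuring the load generator. Use between(0, 0.01) as a minimum and watch worker CPU, or use FastHttpUser with the understanding that you’re doing a throughput ceiling test, not a concurrency test. If what you want is a fixed request rate per user, constant_throughput or constant_pacing is usually a better fit than shaving the wait to zero.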

Workers and the locustfile must be identical. In Docker/Kubernetes setups, rebuilding the image without restarting the workers means old workers with old code run alongside a new master. Pin image tags and roll workers as a deployment, not ad-hoc.

--csv stats reset on test restart. If you run multiple test runs in one Locust session (via the UI), the CSV output only captures the last run. For multi-run comparison, restart the whole process each time.

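Percentile reporting is approximate. To keep memory bounded and make worker stats cheap to merge, Locust rounds each response time before counting it: values under 100 ms are kept to the nearest millisecond, values from 100 ms up to 1 s are rounded to the nearest 10 ms, values from 1 s up to 10 s to the nearest 100 ms, and anything slower to the nearest second. Fast services therefore get millisecond resolution, but a p99 that lands above 1 s is only accurate to about 100 ms. There is no setting for this that I can point to; if you need exact tail latencies, log raw timings from a request event listener and compute the percentiles yourself.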
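To see how much precision percentile figures can carry, it helps to look at how response times are rounded before they are counted. The function below mirrors the rounding that Locust's stats module applies as far as I can tell from its source; treat the exact cut-offs as an assumption and check them against the version you run. round_for_stats is a name made up for this sketch.

```python
def round_for_stats(response_time_ms):
    """Round a response time with a resolution that coarsens as values grow."""
    if response_time_ms < 100:
        return round(response_time_ms)        # nearest millisecond
    if response_time_ms < 1000:
        return round(response_time_ms, -1)    # nearest 10 ms
    if response_time_ms < 10000:
        return round(response_time_ms, -2)    # nearest 100 ms
    return round(response_time_ms, -3)        # nearest second


if __name__ == "__main__":
    for sample in (47, 147, 1234, 12345):
        print(sample, "->", round_for_stats(sample))
```

The practical reading: percentile differences of a few milliseconds are meaningful for fast endpoints, while two runs whose p99 sits above a second can only be compared to roughly the nearest 100 ms.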

Production-Ready Patterns

Separate test scenarios into multiple User classes. Locust supports multiple User classes in one file and distributes the total user count among them by weight. This models traffic mix accurately:

class BrowseUser(HttpUser):
    weight = 70  # 70% of users
    ...

class CheckoutUser(HttpUser):
    weight = 20  # 20% of users
    ...

class AdminUser(HttpUser):
    weight = 10  # 10% of users
    ...
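To sanity-check a traffic mix before a run, the split can be worked out by hand. The sketch below allocates users in proportion to the weights and hands out any remainder by largest fraction. It illustrates the arithmetic only; split_users is a hypothetical helper, and Locust's own distribution logic may round differently at the margins.

```python
def split_users(total, weights):
    """Allocate `total` users across classes in proportion to their weights."""
    weight_sum = sum(weights.values())
    exact = {name: total * weight / weight_sum for name, weight in weights.items()}
    counts = {name: int(share) for name, share in exact.items()}
    # Hand out the users lost to truncation, largest fractional part first.
    leftover = total - sum(counts.values())
    by_fraction = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    mix = {"BrowseUser": 70, "CheckoutUser": 20, "AdminUser": 10}
    print(split_users(200, mix))
```

For 200 users and weights of 70, 20, and 10, this works out to 140, 40, and 20 users per class.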

Store your locustfiles in the same repo as the service. Load tests that drift from the API contract are worse than no tests. Treat them like integration tests — same repo, same CI, same version.

Use environment variables for targets. Never hardcode staging/production hostnames. --host on the CLI or LOCUST_HOST env var keeps configs clean and prevents accidentally hammering prod from a CI run.

Baseline before every deploy. Run a 5-minute 200-user test against staging pre- and post-deploy. Diff the CSV outputs. Regressions in p95/p99 that don’t show in unit tests are real bugs.

Don’t ignore the failures line. The Locust UI charts both RPS and failures per second over time. A test that shows constant RPS but a climbing failure rate means your target is degrading under load by rejecting requests rather than slowing down. Runs like this can look fine in average latency charts.
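The pre- and post-deploy comparison described in this section can be automated with a few lines of standard-library Python. The sketch below assumes the stats CSV has a "Name" column and a "95%" column holding numeric values; check the header of the file your Locust version writes before relying on it. load_p95 and regressions are hypothetical helpers, and the 10% tolerance is an arbitrary starting point.

```python
import csv
import io


def load_p95(csv_text):
    """Map request name to its 95th percentile in ms from a stats CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Name"]: float(row["95%"]) for row in reader}


def regressions(before_csv, after_csv, tolerance=0.10):
    """Return {name: (before, after)} where p95 grew by more than `tolerance`."""
    before, after = load_p95(before_csv), load_p95(after_csv)
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] > 0 and after[name] > before[name] * (1 + tolerance)
    }


if __name__ == "__main__":
    baseline = "Type,Name,95%\nGET,/search,120\nGET,/users/[id],80\n"
    candidate = "Type,Name,95%\nGET,/search,180\nGET,/users/[id],82\n"
    for name, (old, new) in regressions(baseline, candidate).items():
        print(f"{name}: p95 {old:.0f} ms -> {new:.0f} ms")
```

In CI the two strings would come from reading the baseline and candidate stats files, and a non-empty result would fail the job.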

Closing Thoughts

Locust earns its place because the gap between "I know Python" and "I have a useful load test" is genuinely small. The distributed mode is solid, the extensibility is real, and the LoadTestShape class finally killed the excuse that load testers couldn’t model realistic traffic curves.

The ceiling is CPU on the workers — you’re buying greenlets, not threads, so plan your worker count accordingly. A rule of thumb: one worker core per 1000 sustained users, adjusted down if your tasks have any computational weight. For most teams running on Kubernetes, spinning up 10-20 worker pods is trivially cheap compared to the cost of discovering capacity limits in production.

Get your CI thresholds in place early. A load test that always passes is either wrong or testing nothing.