Stop Over-Engineering Your Messaging: NATS JetStream Is the Kafka You Can Actually Run

Kafka is a masterpiece of engineering. It’s also 800MB of JVM, requires ZooKeeper (or KRaft, which has its own learning curve), needs a minimum of three brokers for anything resembling production, and will happily consume 4GB of RAM on a quiet weekend. For a startup, a homelab, or a microservices project that processes a few thousand events per second, that’s brutal overkill.

NATS JetStream exists in a completely different weight class. The entire server binary is under 20MB. A single node boots in milliseconds. It handles millions of messages per second on commodity hardware. And it ships with persistent streams, durable consumers, a distributed KV store, and object storage — all baked in, no plugins, no external dependencies.

This article is a practical guide. We’ll stand up NATS JetStream, build real streams and consumers, poke at the KV API, and cover the operational side: monitoring, clustering, and the gotchas that bite you in production.

The official repo: https://github.com/nats-io/nats-server. Client libraries exist for Go, Python, Rust, Java, .NET, Node.js, and more.


The Core Mental Model

Before touching a command line, one conceptual shift: NATS core is fire-and-forget pub/sub. A message published to a subject is delivered only to subscribers alive at that exact moment. Gone is gone.

JetStream layers persistence on top. A Stream captures messages published to matching subjects and stores them on disk. A Consumer is a cursor into that stream — it tracks where a particular subscriber (or group) is up to, handles redelivery on failure, and manages acknowledgements. This maps roughly to Kafka’s topics and consumer groups, but the model is cleaner and the configuration surface is smaller.

One important difference: in Kafka, a topic’s partition count caps consumer parallelism, and it can be increased later but never decreased (increasing it also changes which partition a given key maps to). In NATS, you can have multiple consumers on the same stream with different filter subjects, delivery policies, and ack modes. Parallelism comes from the consumer side, by running several workers against one consumer, not from the stream configuration.


Running NATS JetStream

The simplest possible deployment: one Docker container, JetStream enabled, data persisted to a volume.

# docker-compose.yml
services:
  nats:
    image: nats:2.10-alpine
    container_name: nats
    restart: unless-stopped
    ports:
      - "4222:4222"   # client connections
      - "8222:8222"   # HTTP monitoring endpoint
      - "6222:6222"   # cluster route port (needed for multi-node later)
    volumes:
      - ./nats-data:/data
      - ./nats.conf:/etc/nats/nats.conf:ro
    command: ["-c", "/etc/nats/nats.conf"]

# nats.conf
server_name: nats-1

jetstream {
  store_dir: /data
  max_memory_store: 1GB
  max_file_store: 10GB
}

http_port: 8222

# Optional: require auth
# accounts {
#   $SYS { users: [{user: "sys", password: "changeme"}] }
# }

docker compose up -d

Hit http://localhost:8222/healthz and you should get {"status":"ok"}. Hit http://localhost:8222/jsz to see JetStream stats.

Install the nats CLI for everything that follows:

# Linux/macOS via script, or grab the binary from GitHub releases
curl -sf https://binaries.nats.dev/nats-io/natscli/nats@latest | sh
# or on macOS
brew install nats-io/nats-tools/nats

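Verify the connection. nats account info works for any user; nats server info only answers when you connect as a system-account user:

nats account info --server nats://localhost:4222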

Streams: Creating and Configuring

Create a stream that captures all messages on the orders.> subject hierarchy:

nats stream add ORDERS \
  --subjects "orders.>" \
  --storage file \
  --retention limits \
  --max-msgs 1000000 \
  --max-bytes 5GB \
  --max-age 7d \
  --replicas 1 \
  --discard old \
  --dupe-window 2m

A few flags worth understanding:

  • --retention limits: messages are kept until size/age/count limits hit. Other options: interest (keep until all consumers have acked) and workqueue (delete on ack).
  • --discard old: when the stream is full, drop the oldest messages. Use new if you want to reject new publishes instead.
  • --dupe-window 2m: JetStream tracks message IDs within this window. If you publish with a Nats-Msg-Id header and the same ID arrives twice in this window, the duplicate is not stored again and the publisher’s ack comes back flagged as a duplicate. This makes publish retries idempotent, which is the publish half of exactly-once semantics; it only holds for retries that arrive inside the window.
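
To make the dupe-window rule concrete, here is a toy in-memory model of it in Go. It is a sketch of the behaviour described above, not the server’s implementation; dedupWindow, accept, window, and start are names made up for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative values: a 2-minute window, matching --dupe-window 2m.
const window = 2 * time.Minute

var start = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

// dedupWindow is a toy model of duplicate tracking by message ID.
// It is NOT how nats-server stores this state.
type dedupWindow struct {
	window time.Duration
	seen   map[string]time.Time // message ID -> time it was first accepted
}

func newDedupWindow(w time.Duration) *dedupWindow {
	return &dedupWindow{window: w, seen: make(map[string]time.Time)}
}

// accept reports whether a publish carrying this ID at time now would be
// stored (true) or dropped as a duplicate (false).
func (d *dedupWindow) accept(id string, now time.Time) bool {
	// Forget IDs whose window has elapsed.
	for k, t := range d.seen {
		if now.Sub(t) >= d.window {
			delete(d.seen, k)
		}
	}
	if _, dup := d.seen[id]; dup {
		return false
	}
	d.seen[id] = now
	return true
}

func main() {
	d := newDedupWindow(window)
	fmt.Println(d.accept("ord-001-created", start))                      // first publish
	fmt.Println(d.accept("ord-001-created", start.Add(30*time.Second))) // retry inside the window
	fmt.Println(d.accept("ord-001-created", start.Add(3*time.Minute)))  // retry after the window
}
```

The third call is the one to remember: a retry that arrives after the window has closed is stored as a brand-new message.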

Inspect the stream:

nats stream info ORDERS
nats stream ls

Publish some test messages:

nats pub orders.created '{"id":"ord-001","amount":49.99}'
nats pub orders.shipped '{"id":"ord-001","carrier":"DHL"}'
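
Both test subjects land in ORDERS because they match orders.>. As a rough sketch of the wildcard rule (a * token matches exactly one token, > matches one or more trailing tokens), here is a small Go function. subjectMatches is a hypothetical helper written for this article, not part of any NATS library, and it does not validate malformed subjects.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a concrete subject matches a pattern:
// "*" matches exactly one token, ">" matches one or more trailing tokens
// and is only meaningful as the last token of the pattern.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be last and needs at least one token left to consume.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false // pattern has more tokens than the subject
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	for _, subj := range []string{"orders.created", "orders.shipped", "orders.eu.created", "orders", "payments.created"} {
		fmt.Printf("%-18s orders.>=%v orders.*=%v\n",
			subj, subjectMatches("orders.>", subj), subjectMatches("orders.*", subj))
	}
}
```

Note that orders.> does not match the bare subject orders, because > needs at least one token to consume.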

View messages stored in the stream:

nats stream view ORDERS

Consumers: Push vs Pull

NATS has two consumer delivery modes and the choice matters.

Push consumers have the server deliver messages to a subscribed subject. Great for low-latency, fire-and-forget pipelines where you trust your subscribers to keep up.

Pull consumers require subscribers to explicitly request batches. This is almost always what you want in production — it gives you flow control, lets you size batches to your processing capacity, and means a slow or dead consumer doesn’t create a buffering problem on the server.

Creating a Pull Consumer

nats consumer add ORDERS order-processor \
  --pull \
  --deliver all \
  --ack explicit \
  --max-deliver 5 \
  --ack-wait 30s \
  --filter "orders.>"

  • --deliver all: start from the beginning of the stream. Other policies: new, last, last-per-subject, or by-start-time.
  • --ack explicit: every message must be individually acknowledged. The alternatives are all (acking one message also acks everything before it) and none, which suits monitoring/auditing consumers that don’t need delivery guarantees.
  • --max-deliver 5: after 5 delivery attempts without an ack, the server stops redelivering the message. JetStream has no built-in dead-letter queue: the message stays in the stream, and the server publishes an advisory on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer> that you can subscribe to and use to route the failure yourself.
  • --ack-wait 30s: if the consumer doesn’t ack within 30 seconds, the message is redelivered.
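
How long can a message bounce around before redelivery stops? A back-of-the-envelope sketch, assuming the worker never acks, no backoff is configured, and redelivery fires exactly when ack-wait expires (real timing is only approximate). The helper and constant names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// Values from the consumer created above.
const (
	ackWait    = 30 * time.Second
	maxDeliver = 5
)

// deliveryOffsets returns when each delivery attempt starts, relative to the
// first delivery, under the assumptions stated in the text.
func deliveryOffsets(wait time.Duration, attempts int) []time.Duration {
	offsets := make([]time.Duration, 0, attempts)
	for i := 0; i < attempts; i++ {
		offsets = append(offsets, time.Duration(i)*wait)
	}
	return offsets
}

// giveUpAfter is the moment the final attempt's ack-wait runs out.
func giveUpAfter(wait time.Duration, attempts int) time.Duration {
	return time.Duration(attempts) * wait
}

func main() {
	for i, off := range deliveryOffsets(ackWait, maxDeliver) {
		fmt.Printf("attempt %d delivered at +%s\n", i+1, off)
	}
	fmt.Printf("redelivery stops at +%s\n", giveUpAfter(ackWait, maxDeliver))
}
```

With these settings a poisoned message occupies a worker slot five times over roughly two and a half minutes, which is worth knowing before you raise either number.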

Pull messages manually to verify:

nats consumer next ORDERS order-processor --count 5

Queue Consumers (Parallel Processing)

For horizontal scaling, run multiple worker instances against the same durable pull consumer. Every fetch is served from that consumer’s shared cursor, so each message goes to exactly one worker and no queue group is needed. (Queue groups, called deliver groups in JetStream, apply to push consumers.)

# Create one durable pull consumer for all workers to share
nats consumer add ORDERS order-processor-queue \
  --pull \
  --deliver all \
  --ack explicit \
  --max-deliver 5 \
  --filter "orders.>"

# Every worker binds to the same consumer by name and pulls from it
# In Go (jetstream package):
# cons, err := js.Consumer(ctx, "ORDERS", "order-processor-queue")
# cc, err := cons.Consume(handler) // call cc.Stop() on shutdown

This is the Kafka consumer group equivalent. The difference: you don’t need to think about partition count when you create the stream.
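
The property that matters is that each message is handled by exactly one worker. Here is a toy model of that, using a plain Go channel as a stand-in for the shared consumer. It contains no NATS code at all; distribute is a made-up name, and the sketch only illustrates the competing-workers pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// distribute hands every message to exactly one of n workers and returns how
// many times each message was processed. The channel plays the role of the
// shared consumer: a receive by one worker removes the message for the rest.
func distribute(messages []string, workers int) map[string]int {
	queue := make(chan string)
	processed := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range queue {
				mu.Lock()
				processed[m]++
				mu.Unlock()
			}
		}()
	}
	for _, m := range messages {
		queue <- m
	}
	close(queue) // no more work: workers drain and exit
	wg.Wait()
	return processed
}

func main() {
	msgs := []string{"ord-001", "ord-002", "ord-003", "ord-004"}
	counts := distribute(msgs, 3)
	for _, m := range msgs {
		fmt.Printf("%s processed %d time(s)\n", m, counts[m])
	}
}
```

Adding workers changes throughput, not the result: every message is still counted once, which is the behaviour you rely on when you scale a consumer horizontally.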


Working with JetStream in Go

Here’s a realistic producer/consumer pattern. Not toy code — something you’d actually use.

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

type Order struct {
    ID     string  `json:"id"`
    Amount float64 `json:"amount"`
}

func main() {
    nc, err := nats.Connect("nats://localhost:4222",
        nats.RetryOnFailedConnect(true),
        nats.MaxReconnects(-1),
        nats.ReconnectWait(2*time.Second),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Ensure the stream exists (idempotent)
    _, err = js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
        Name:       "ORDERS",
        Subjects:   []string{"orders.>"},
        Storage:    jetstream.FileStorage,
        Retention:  jetstream.LimitsPolicy,
        MaxAge:     7 * 24 * time.Hour,
        MaxBytes:   5 * 1024 * 1024 * 1024,
        Duplicates: 2 * time.Minute,
    })
    if err != nil {
        log.Fatal(err)
    }

    // Publish with deduplication ID
    order := Order{ID: "ord-002", Amount: 129.99}
    data, _ := json.Marshal(order)

    ack, err := js.Publish(ctx, "orders.created", data,
        jetstream.WithMsgID("ord-002-created"), // dedup key
    )
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Published: seq=%d duplicate=%v\n", ack.Sequence, ack.Duplicate)

    // Pull consumer
    cons, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Durable:       "order-processor", // durable, so the cursor outlives this process
        FilterSubject: "orders.>",
        AckPolicy:     jetstream.AckExplicitPolicy,
        AckWait:       30 * time.Second,
        MaxDeliver:    5,
    })
    if err != nil {
        log.Fatal(err)
    }

    // Fetch a batch
    msgs, err := cons.Fetch(10, jetstream.FetchMaxWait(5*time.Second))
    if err != nil {
        log.Fatal(err)
    }

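    for msg := range msgs.Messages() {
        var o Order
        if err := json.Unmarshal(msg.Data(), &o); err != nil {
            msg.Term() // malformed payload: a retry cannot succeed, so stop redelivery
            continue
        }
        fmt.Printf("Processing order %s for $%.2f\n", o.ID, o.Amount)
        if err := msg.Ack(); err != nil {
            log.Printf("ack failed: %v", err)
        }
    }
    // Messages() closes when the batch is done; surface any fetch error.
    if err := msgs.Error(); err != nil {
        log.Printf("fetch ended with error: %v", err)
    }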
}

The jetstream package (github.com/nats-io/nats.go/jetstream, entered through jetstream.New()) is significantly cleaner than the older nc.JetStream() call and its nats.JetStreamContext interface. Use it for any new project.


KV Store: Redis Without the Redis

JetStream’s Key-Value store is built on streams under the hood: each bucket is a stream in which every key is a subject, and the stream keeps a bounded history of values per key (one by default). You get versioned values, watches, TTLs, and compare-and-swap operations. It’s a legit Redis replacement for config distribution, feature flags, service discovery state, and leader election.

# Create a KV bucket
nats kv add config --ttl 1h --replicas 1

# CRUD
nats kv put config database.host "pg.internal"
nats kv put config database.port "5432"
nats kv get config database.host
nats kv del config database.host

# Watch for changes (streams in real time)
nats kv watch config

In Go:

kv, err := js.CreateOrUpdateKeyValue(ctx, jetstream.KeyValueConfig{
    Bucket: "config",
    TTL:    time.Hour,
})
if err != nil {
    log.Fatal(err)
}

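// Put
if _, err := kv.Put(ctx, "database.host", []byte("pg.internal")); err != nil {
    log.Fatal(err)
}

// Get
entry, err := kv.Get(ctx, "database.host")
if err != nil {
    log.Fatal(err) // includes jetstream.ErrKeyNotFound
}
fmt.Println(string(entry.Value()), entry.Revision())

// Compare-and-swap (optimistic locking): fails if the key changed since entry was read
if _, err := kv.Update(ctx, "database.host", []byte("pg2.internal"), entry.Revision()); err != nil {
    log.Printf("update lost the race: %v", err)
}

// Watch every key matching a wildcard
watcher, err := kv.Watch(ctx, "database.*")
if err != nil {
    log.Fatal(err)
}
defer watcher.Stop()
for update := range watcher.Updates() {
    if update == nil {
        continue // nil marks the end of the initial values; live changes follow
    }
    fmt.Printf("key=%s value=%s op=%v\n", update.Key(), update.Value(), update.Operation())
}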

The watcher delivers the current state on startup, then streams changes. This is exactly what you’d use to build live config reload in a service.
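
If compare-and-swap is new to you, this toy model shows the rule the Update call enforces: a write only succeeds when the caller names the key’s current revision. It is a single-process sketch with made-up names (bucket, put, update, errWrongRevision) and no locking, not the NATS client or server code.

```go
package main

import (
	"errors"
	"fmt"
)

var errWrongRevision = errors.New("wrong last revision")

type entry struct {
	value    string
	revision uint64
}

// bucket is a toy stand-in for a KV bucket. Revisions come from one
// bucket-wide counter, so they only ever increase.
type bucket struct {
	data map[string]entry
	seq  uint64
}

func newBucket() *bucket { return &bucket{data: make(map[string]entry)} }

// put writes unconditionally and returns the new revision.
func (b *bucket) put(key, value string) uint64 {
	b.seq++
	b.data[key] = entry{value: value, revision: b.seq}
	return b.seq
}

// update writes only if last is still the key's current revision.
func (b *bucket) update(key, value string, last uint64) (uint64, error) {
	cur, ok := b.data[key]
	if !ok || cur.revision != last {
		return 0, errWrongRevision
	}
	return b.put(key, value), nil
}

func main() {
	b := newBucket()
	rev := b.put("database.host", "pg.internal")

	// Two writers both read revision rev, then race to update.
	_, errA := b.update("database.host", "pg2.internal", rev)
	_, errB := b.update("database.host", "pg3.internal", rev)

	fmt.Println("writer A:", errA)
	fmt.Println("writer B:", errB)
	fmt.Println("final value:", b.data["database.host"].value)
}
```

The losing writer has to re-read the key and decide whether its change still makes sense, which is exactly the retry loop you would write around a failed Update.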


Gotchas

Gotcha #1: The AckWait trap. If your consumer’s ack-wait is shorter than your processing time, messages get redelivered while they’re still being processed. You end up with duplicate processing even with explicit acks. Either set ack-wait generously, or call msg.InProgress() periodically to reset the timer during long-running work.
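
One way to structure the InProgress() approach is a small wrapper that heartbeats on a ticker while the real work runs. The sketch below is written against a one-method interface so it runs without a server; jetstream.Msg has an InProgress() error method that fits it. withHeartbeat, progresser, fakeMsg, slowWork, and the two durations are illustrative names and values, not library API.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative timings; in practice pick an interval well below ack-wait.
const (
	beatEvery = 20 * time.Millisecond
	workTime  = 110 * time.Millisecond
)

// progresser is the only capability this sketch needs from a message.
type progresser interface {
	InProgress() error
}

// withHeartbeat runs work while calling msg.InProgress() every interval,
// and stops the heartbeat as soon as work returns.
func withHeartbeat(msg progresser, interval time.Duration, work func() error) error {
	done := make(chan struct{})
	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				_ = msg.InProgress() // best effort: a missed beat only risks a redelivery
			}
		}
	}()
	err := work()
	close(done)
	<-stopped // make sure no heartbeat fires after we return
	return err
}

// fakeMsg counts heartbeats so the example runs without NATS.
type fakeMsg struct{ beats int }

func (f *fakeMsg) InProgress() error { f.beats++; return nil }

// slowWork simulates processing that takes d to finish.
func slowWork(d time.Duration) func() error {
	return func() error { time.Sleep(d); return nil }
}

func main() {
	msg := &fakeMsg{}
	err := withHeartbeat(msg, beatEvery, slowWork(workTime))
	fmt.Println("work error:", err, "heartbeats sent:", msg.beats)
}
```

The caller still acks (or naks) after withHeartbeat returns; the wrapper only keeps the ack-wait timer from expiring in the meantime.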

Gotcha #2: Dupe window vs storage limits. The deduplication window only works while the message ID is within the window AND the original message is still in the stream. If your stream fills up and old messages are dropped, the dupe tracking goes with them. Size your stream and dupe window together.

Gotcha #3: Stream subjects are exclusive within an account. A subject can only be captured by one stream. If you try to add a second stream with overlapping subjects, NATS will refuse. Plan your subject hierarchy before you start creating streams.

Gotcha #4: Pull consumers and idle heartbeats. If you’re fetching in a loop and the stream is idle, Fetch() will block until FetchMaxWait expires. This is fine, but set a reasonable max wait and handle the timeout gracefully — don’t treat an empty fetch as an error.

Gotcha #5: The nats CLI consumer next is sequential, not parallel. It’s great for debugging. For load testing, use the nats bench command or write a proper consumer loop.


Production Setup: Clustering

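A single NATS node is a single point of failure. For production, you want at least a three-node cluster. Add more nodes to your Compose stack or separate VMs, and give every node, nats-1 included, a cluster block with the same cluster name and routes to its peers: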

# nats-2.conf
server_name: nats-2

jetstream {
  store_dir: /data
}

cluster {
  name: production
  listen: 0.0.0.0:6222
  routes: [
    "nats-route://nats-1:6222"
    "nats-route://nats-3:6222"
  ]
}

http_port: 8222

With three nodes, set --replicas 3 on your streams and consumers. JetStream uses the Raft protocol internally for leader election and replication. Losing one node is survivable; losing two out of three is not.
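
The survivability rule is plain majority arithmetic. A quick sketch, assuming a replicated stream needs a strict majority of its replicas online to make progress; quorum and tolerated are illustrative function names.

```go
package main

import "fmt"

// quorum is the smallest strict majority of a replica set.
func quorum(replicas int) int { return replicas/2 + 1 }

// tolerated is how many replicas can be lost while a majority remains.
func tolerated(replicas int) int { return replicas - quorum(replicas) }

func main() {
	for _, r := range []int{1, 3, 5} {
		fmt.Printf("replicas=%d quorum=%d tolerated failures=%d\n", r, quorum(r), tolerated(r))
	}
}
```

This is also why an even replica count buys nothing: four replicas need three for a majority, so they tolerate one failure, the same as three.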

Check cluster health:

nats server report jetstream
nats server check jetstream --expected 3

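The nats server check command is designed for Nagios/Prometheus-style monitoring integrations. Use it in your healthcheck scripts. Two caveats: the nats server subcommands need credentials for a system-account user (the $SYS block commented out in nats.conf above), and the check flags differ between CLI versions, so confirm them with nats server check jetstream --help before wiring them into automation.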


Monitoring

The HTTP monitoring endpoint at :8222 exposes JSON for everything:

  • /varz — server vitals
  • /jsz — JetStream stats (streams, consumers, bytes)
  • /jsz?accounts=true — per-account stats
  • /healthz — liveness probe

For Prometheus, the prometheus-nats-exporter scrapes these endpoints and exposes metrics. The Docker image is natsio/prometheus-nats-exporter. A Grafana dashboard (ID 2279) covers the basics out of the box.

Key metrics to alert on:

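  • gnatsd_varz_slow_consumers: clients the server has flagged as slow consumers
  • jetstream_server_total_streams: stream count drift (unexpected streams appearing)
  • Consumer num_pending growing without bound: a consumer is stuck

Exact metric names depend on the exporter version and the flags it runs with, so confirm them against the exporter’s own /metrics output.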
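
If you would rather poll the monitoring endpoint directly, for example from a small health script, a minimal Go sketch follows. It assumes the server from the Compose file is listening on localhost:8222 and that /jsz reports top-level streams, consumers, messages, and bytes fields; jszSummary and parseJSZ are names invented here, and only those four fields are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// jszSummary holds the handful of /jsz fields this sketch cares about.
// The field names are an assumption; compare them with your server's output.
type jszSummary struct {
	Streams   int    `json:"streams"`
	Consumers int    `json:"consumers"`
	Messages  uint64 `json:"messages"`
	Bytes     uint64 `json:"bytes"`
}

// parseJSZ decodes a /jsz response body, ignoring fields it does not know.
func parseJSZ(body []byte) (jszSummary, error) {
	var s jszSummary
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://localhost:8222/jsz") // assumed local dev address
	if err != nil {
		fmt.Println("monitoring endpoint not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response failed:", err)
		return
	}
	s, err := parseJSZ(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	fmt.Printf("streams=%d consumers=%d messages=%d bytes=%d\n",
		s.Streams, s.Consumers, s.Messages, s.Bytes)
}
```

For anything beyond a quick check, the exporter plus alert rules is the better tool; this is just the shortest path from the endpoint to a number.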

NATS vs Kafka: Where Each Wins

NATS JetStream is the right call when: you control the full stack, you don’t need Kafka’s ecosystem (Kafka Connect, ksqlDB, Debezium), your throughput fits in single-node or small cluster territory, and you value operational simplicity over feature breadth.

Kafka wins when: you need multi-datacenter replication with MirrorMaker, you have a large existing Kafka ecosystem investment, you need very long retention (weeks/months at petabyte scale), or you’re doing stream processing with Kafka Streams or Flink.

For most projects — especially anything self-hosted or at startup scale — NATS JetStream removes an enormous amount of operational burden without sacrificing the messaging guarantees you actually need.


Quick Reference

# Stream management
nats stream ls
nats stream info ORDERS
nats stream purge ORDERS          # delete all messages, keep stream
nats stream rm ORDERS             # delete stream and all data

# Consumer management
nats consumer ls ORDERS
nats consumer info ORDERS order-processor
nats consumer rm ORDERS order-processor

# Message inspection
nats stream view ORDERS           # browse messages
nats stream get ORDERS 42         # get by sequence number

# KV
nats kv ls
nats kv info config
nats kv history config database.host   # all revisions for a key

# Server
nats server info
nats server report jetstream
nats server check jetstream --expected 1

NATS JetStream is the kind of infrastructure that makes you wonder why you were running Kafka in the first place. It’s not a toy — Synadia (the commercial backer) runs it at serious scale, and the protocol is used in some genuinely demanding financial and IoT environments. But the entry cost is near zero, the operational overhead is minimal, and for most workloads it simply works without the ceremony.

Stand it up, create a stream, and pull some messages. You’ll have a feel for it in twenty minutes.
