Stop Letting PostgreSQL Bloat Kill Your Performance: Autovacuum Tuning for High-Write Workloads

Your PostgreSQL queries were fast six months ago. Now they crawl. You’ve added indexes, bumped work_mem, checked your query plans — and still nothing clicks. If this sounds familiar, there’s a decent chance the culprit is sitting right there in pg_stat_user_tables: millions of dead tuples that autovacuum never got around to cleaning up.

This is the article I wish existed when I first ran a write-heavy Postgres workload in production. We’re going deep: MVCC internals, how autovacuum actually decides when to wake up, what knobs matter and why, and how to set sane per-table overrides without turning your postgresql.conf into a graveyard of commented-out guesses.


Why VACUUM Exists at All

PostgreSQL uses MVCC — Multi-Version Concurrency Control. When you UPDATE a row, Postgres doesn’t overwrite it in place. It writes a new version of the row and marks the old one as dead. Same for DELETE — the row stays physically on disk, just flagged invisible to new transactions.

This is what makes Postgres’s concurrency model elegant: readers never block writers. But there’s a cost. Dead row versions ("dead tuples") accumulate. If nothing cleans them up, three bad things happen:

  1. Table bloat. Your 2 GB table is physically 14 GB on disk because 85% of it is dead rows. Every sequential scan reads all that garbage.
  2. Index bloat. Indexes point to dead heap rows. They grow fat and slow.
  3. Transaction ID wraparound. This one can take your database offline entirely. More on that shortly.

VACUUM is the garbage collector. It marks dead tuples as reusable space (but doesn’t return it to the OS unless you run VACUUM FULL, which locks the table). autovacuum is the background daemon that runs VACUUM automatically.

The default autovacuum configuration ships tuned for a modest database on modest hardware. For anything write-heavy, you need to retune it from scratch.


Diagnosing the Problem First

Before touching any config, confirm you actually have a bloat problem. Blind tuning is how you end up with autovacuum hammering your disk for no reason.

Check dead tuple accumulation:

SELECT
    schemaname,
    relname AS table_name,
    n_dead_tup,
    n_live_tup,
    ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
    last_autovacuum,
    last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

If dead_pct is consistently above 10-20% on busy tables, autovacuum is losing the race.

Check table and index bloat:

SELECT
    current_database() AS db,
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;

Cross-reference this with your actual row counts. A table with 500k rows that’s 8 GB is screaming for help.

Check if autovacuum is running at all:

SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'autovacuum%';

If you never see autovacuum workers here during write-heavy periods, something is throttling them out of existence — or they finish too fast to catch.


How Autovacuum Decides When to Act

Autovacuum triggers a vacuum on a table when:

dead_tuples > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * table_row_count

The defaults:

  • autovacuum_vacuum_threshold = 50 (rows)
  • autovacuum_vacuum_scale_factor = 0.2 (20% of table)

So for a table with 10 million rows, autovacuum won’t kick in until there are 2,000,050 dead tuples. On a high-write workload, that’s a lot of garbage to let pile up.

Same math applies to ANALYZE triggers:

modified_rows > autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * table_row_count

Defaults: threshold=50, scale_factor=0.1 (10%). Also too conservative for large tables.

The scale factor model made sense when "large" meant tens of thousands of rows. At millions of rows, percentage-based triggers mean you’re always playing catch-up.


The Core Tuning Parameters

Global settings in postgresql.conf

# --- Autovacuum workers ---
# Default is 3. On busy multi-table workloads, this is a bottleneck.
# Each worker handles one table at a time. Raise this if you see a vacuum queue.
autovacuum_max_workers = 6

# --- Trigger thresholds ---
# For large tables, scale_factor of 0.2 is absurd. Drop it to 0.01-0.02.
autovacuum_vacuum_scale_factor = 0.02
autovacuum_vacuum_threshold = 100

autovacuum_analyze_scale_factor = 0.01
autovacuum_analyze_threshold = 50

# --- Vacuum cost delay (throttling) ---
# Autovacuum is deliberately throttled to avoid hammering I/O.
# The default cost_delay=2ms with cost_limit=200 is way too conservative
# for modern SSDs. SSDs can handle much more.
autovacuum_vacuum_cost_delay = 2ms        # default; lower = less throttling
autovacuum_vacuum_cost_limit = 400        # default 200; raise for faster SSD storage

# --- Wraparound prevention ---
# When XID age hits this, autovacuum runs regardless of dead tuple count.
# Default 200M. Keep it. Don't lower it without good reason.
autovacuum_freeze_max_age = 200000000

The cost-based throttling deserves special attention. Every vacuum operation has a cost: reading a page costs vacuum_cost_page_hit or vacuum_cost_page_miss, modifying it costs vacuum_cost_page_dirty. When the running total hits autovacuum_vacuum_cost_limit, the worker sleeps for autovacuum_vacuum_cost_delay milliseconds before continuing.

On spinning disks, this throttling protects you. On NVMe, you’re leaving a lot of vacuum throughput on the table. A cost_limit of 800 and cost_delay of 1ms is a reasonable starting point for fast storage.

Worker count gotcha

More workers sounds better. But each autovacuum worker holds locks and does I/O. Throwing 10 workers at a system with 4 CPUs and a single HDD will make everything worse. Scale autovacuum_max_workers proportionally to your I/O capacity, not just your CPU count.


Per-Table Overrides: The Right Way to Do It

Global defaults are just that — defaults. Your events table with 50 million rows and 50k inserts per minute needs different treatment than your config table with 200 rows that changes twice a week.

PostgreSQL lets you set autovacuum parameters per table via storage parameters:

-- Aggressive autovacuum for a high-write events table
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.005,    -- trigger at 0.5% dead tuples
    autovacuum_vacuum_threshold = 500,
    autovacuum_analyze_scale_factor = 0.002,
    autovacuum_analyze_threshold = 100,
    autovacuum_vacuum_cost_delay = 2,          -- ms; override global throttle
    autovacuum_vacuum_cost_limit = 800
);

-- Near-static lookup table — don't waste vacuum cycles on it
ALTER TABLE country_codes SET (
    autovacuum_vacuum_scale_factor = 0.5,
    autovacuum_analyze_scale_factor = 0.3
);

This is the correct approach for production. Set aggressive per-table overrides on your hot tables. Don’t lower global defaults so far that autovacuum thrashes everything — including tables that don’t need it.


Transaction ID Wraparound: The One That Bites

Postgres transaction IDs are 32-bit integers. After about 2.1 billion transactions, they wrap around. When that happens, every old row looks like it’s from the "future" and becomes invisible — effectively corrupting your database. Postgres will refuse to accept new transactions before it reaches that state, which is its way of protecting you.

Autovacuum’s secondary job is "freezing" old row versions — replacing their XID with a special frozen marker that’s always considered in the past. This is controlled by:

autovacuum_freeze_max_age = 200000000  -- 200M transactions
vacuum_freeze_min_age = 50000000       -- don't freeze rows younger than 50M txns
vacuum_freeze_table_age = 150000000    -- force full-table freeze scan at 150M

Monitor your oldest XIDs:

SELECT
    datname,
    age(datfrozenxid) AS xid_age,
    pg_size_pretty(pg_database_size(datname)) AS db_size
FROM pg_database
ORDER BY xid_age DESC;

If xid_age is above 1.5 billion on any database, you’re in danger territory. If autovacuum can’t keep up because it’s being throttled or there aren’t enough workers, Postgres will eventually start logging warnings and then hard-refuse connections. This is a production outage, not a performance blip.

Gotcha: VACUUM FULL does not help with wraparound. It’s VACUUM (plain or auto) that freezes tuples. Running VACUUM FULL on a bloated table to reclaim space is fine, but don’t mistake it for fixing wraparound risk.


Monitoring Autovacuum Activity

You shouldn’t tune and forget. Set up ongoing visibility.

Tables with the most accumulated dead tuples right now:

SELECT
    relname,
    n_dead_tup,
    n_live_tup,
    last_autovacuum,
    last_autoanalyze,
    autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

Currently running autovacuum workers:

SELECT
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';

Tables that autovacuum can’t reach because of long-running transactions:

SELECT
    pid,
    usename,
    now() - xact_start AS txn_age,
    state,
    left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start ASC
LIMIT 10;

This last one is critical. A single long-running transaction blocks vacuum from reclaiming dead tuples that predate it. A stuck application transaction from 6 hours ago can cause catastrophic bloat. Your monitoring should alert on transactions older than, say, 1 hour.


Gotchas

Gotcha: Autovacuum won’t run on tables with active long transactions.
If your application has transactions that run for hours (batch jobs, analytics queries on OLTP tables), dead tuples from before those transactions started cannot be vacuumed. The fix is architectural: isolate long-running reads to a replica, or use explicit transaction boundaries with shorter windows.

Gotcha: VACUUM FULL locks the table.
When someone says "I ran VACUUM FULL and now it’s worse," they usually mean they did it during business hours and locked users out. VACUUM FULL rewrites the table entirely, reclaims disk space back to the OS, but holds an exclusive lock the entire time. Use it only in maintenance windows, only on tables you’ve confirmed are massively bloated, and only after verifying you have enough free disk to create a full copy.

Gotcha: Autovacuum is disabled on temp tables.
Temp tables don’t get autovacuumed. If you’re using long-lived temp tables with heavy writes inside a session, you need to run VACUUM manually or ensure session churn clears them.

Gotcha: Setting autovacuum_vacuum_cost_limit too high on shared storage.
If multiple Postgres instances share the same disk (containerized environments, cloud VMs with shared EBS), cranking cost limits up on all instances means they’ll compete for I/O and hurt each other. Profile your I/O headroom first.

Gotcha: pg_stat_user_tables resets on service restart.
The last_autovacuum and row counts reset when Postgres restarts. Don’t rely on them for historical trending — ship these metrics to Prometheus or your monitoring stack.


A Production-Ready Baseline Config

Here’s a starting point for a write-heavy OLTP workload on modern server hardware (NVMe SSDs, 16+ cores, 64GB+ RAM). Adjust from here based on your monitoring data.

# postgresql.conf — autovacuum section

# More workers to handle concurrent table vacuuming
autovacuum_max_workers = 8

# Tighter triggers — don't wait for 20% dead tuples on large tables
autovacuum_vacuum_scale_factor = 0.02
autovacuum_vacuum_threshold = 200
autovacuum_analyze_scale_factor = 0.01
autovacuum_analyze_threshold = 100

# Less throttling on fast storage
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 600

# Reasonable freeze settings — don't touch these unless you know why
autovacuum_freeze_max_age = 200000000
vacuum_freeze_min_age = 50000000
vacuum_freeze_table_age = 150000000

# Let autovacuum see more memory for sorting
autovacuum_work_mem = 256MB   # default -1 uses maintenance_work_mem

Then immediately identify your 5 hottest tables and apply per-table overrides:

-- Run this for each high-write table after identifying them
ALTER TABLE your_hot_table SET (
    autovacuum_vacuum_scale_factor = 0.005,
    autovacuum_vacuum_threshold = 500,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_limit = 1000,
    autovacuum_vacuum_cost_delay = 1
);

Reload config without a restart:

psql -U postgres -c "SELECT pg_reload_conf();"

When Manual VACUUM Still Has a Place

Autovacuum handles the steady state. But there are times you want to trigger vacuum manually:

# Verbose output — see exactly what it's doing
vacuumdb -U postgres -d mydb -t events --verbose

# Analyze only (update planner statistics)
vacuumdb -U postgres -d mydb --analyze-only

# Full vacuum with analyze on a specific table (locks table, use in maintenance window)
vacuumdb -U postgres -d mydb -t events --full --analyze --verbose

Manual vacuums also bypass the cost-delay throttling by default — they run at full speed. This is what you want after a bulk delete or a schema migration that torched half the rows in a table. Kick off a manual VACUUM ANALYZE immediately after large data operations rather than waiting for autovacuum to notice.


Connecting the Dots

The default autovacuum config that ships with PostgreSQL is deliberately conservative. It works fine for low-write databases. For anything serious — high-frequency inserts, updates on hot rows, event tables, log tables, queues — you need to tune it.

The process isn’t complicated once you understand the math: autovacuum fires based on percentage thresholds, those percentages are too high for large tables, workers are throttled more than necessary for modern storage. Fix those three things, add per-table overrides for your hottest tables, and monitor dead tuple accumulation over time.

What you shouldn’t do: blindly copy someone else’s config (including this one). Profile your workload, look at pg_stat_user_tables, check your XID ages, and tune from evidence. Autovacuum is not magic — it’s a daemon with knobs, and those knobs need to match your actual write rate and storage speed.

Leave a comment

👁 Views: 2,289 · Unique visitors: 1,646