Your database server has 256 GB of RAM, a fast NVMe array, and a 32-core CPU, and it can still burn 15% of its CPU time doing nothing but walking page tables. Hugepages fix that. Not hypothetically. This is one of the highest-ROI kernel tuning knobs you can pull on a memory-hungry workload, and it takes about ten minutes to configure correctly.
Most guides stop at "set vm.nr_hugepages and restart your app." That’s incomplete. The 1 GB page variant realistically has to be reserved at boot time. NUMA-aware allocation is almost never mentioned. And if you over-allocate, you silently waste physical RAM that nothing can reclaim. Let’s cover this properly.
Why the Default 4 KB Page Size Becomes a Problem
Every memory access on x86_64 goes through a virtual-to-physical address translation. The CPU keeps a cache of recent translations called the TLB (Translation Lookaside Buffer). It’s fast but tiny — typically 64 to 1536 entries depending on the level and CPU generation.
With 4 KB pages, a 128 GB working set needs over 33 million page table entries. The TLB covers a tiny fraction of that. Cache misses mean a full page table walk: up to four memory accesses per miss on a four-level page table. Under load, TLB miss rates spike, latency climbs, and a growing share of CPU cycles goes into working out where data lives before it can be read.
Hugepages solve this by making each page larger, so a single TLB entry covers far more address space. A 2 MB page covers 512× as much as a 4 KB page. A 1 GB page covers 262,144× as much. Same TLB, dramatically better coverage.
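To make the coverage numbers concrete, here is a small bash sketch of the arithmetic. The function names are made up for illustration, and the 1536-entry TLB size is only the top of the range quoted above, not a measurement of any particular CPU:

```shell
# Illustrative helpers (hypothetical names) for the page-size arithmetic.

# Page table entries needed to map a working set of $1 GB with $2 kB pages.
ptes_needed() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# TLB reach in kB: number of TLB entries ($1) times page size in kB ($2).
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

# Reach for each x86_64 page size, in whole kB, MB, or GB (integer division).
tlb_reach_table() {
    local entries=$1 size_kb reach
    for size_kb in 4 2048 1048576; do
        reach=$(tlb_reach_kb "$entries" "$size_kb")
        if (( reach >= 1048576 )); then
            echo "${size_kb} kB pages: $(( reach / 1048576 )) GB"
        elif (( reach >= 1024 )); then
            echo "${size_kb} kB pages: $(( reach / 1024 )) MB"
        else
            echo "${size_kb} kB pages: ${reach} kB"
        fi
    done
}
```

`tlb_reach_table 1536` prints a reach of 6 MB, 3 GB, and 1536 GB for 4 KB, 2 MB, and 1 GB pages, and `ptes_needed 128 4` reproduces the 33,554,432 entries for a 128 GB working set.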
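TLB pressure doesn’t show up as its own line in top or sar: page walks are done by the hardware and billed as ordinary CPU time to whatever code triggered them. Hardware counters are the direct measure. If perf stat shows a high ratio of dTLB-load-misses to dTLB-loads, and LLC-load-misses are high too (page walks themselves miss cache), hugepages are likely worth testing.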
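One way to get that ratio, sketched in bash under a couple of assumptions: perf is installed with access to hardware counters (root or a permissive perf_event_paranoid), and the CPU exposes the generic dTLB-loads and dTLB-load-misses events. The parsing function is hypothetical and assumes perf’s default CSV field layout:

```shell
# Sketch: dTLB load-miss rate (%) from `perf stat -x,` output.
# Assumes the default CSV layout without aggregation flags: value,unit,event,...
# Field positions shift if you add options such as -A or --per-socket.
dtlb_miss_pct() {
    awk -F, '
        $1 !~ /^[0-9.]+$/       { next }              # skip "<not supported>", blanks
        $3 ~ /dTLB-load-misses/ { misses += $1; next }
        $3 ~ /dTLB-loads/       { loads += $1 }
        END {
            if (loads == 0) { print "n/a"; exit 1 }
            printf "%.2f\n", 100 * misses / loads
        }
    ' "${1:-/dev/stdin}"
}

# Usage (perf writes its counters to stderr):
#   perf stat -x, -e dTLB-loads,dTLB-load-misses -p "$PID" -- sleep 10 2> perf.csv
#   dtlb_miss_pct perf.csv
```

There’s no universal threshold here; the useful comparison is the same workload measured before and after enabling hugepages.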
Two Sizes, Two Different Beasts
2 MB hugepages are the standard flavor. They’re available on any x86_64 system and can be allocated dynamically at runtime — though fragmentation is a real concern after the system has been running for a while.
1 GB hugepages (also called "gigantic pages") require CPU support (pdpe1gb flag in /proc/cpuinfo). The catch: in practice they must be reserved at boot via kernel parameters. Recent kernels will attempt a runtime allocation through sysfs, but each page needs 1 GB of physically contiguous free memory, which a system that has been up for a while rarely has. Plan on the boot-time route.
# Check if your CPU supports 1 GB pages
grep -c pdpe1gb /proc/cpuinfo
A non-zero result means you’re good. Any modern server-class Intel or AMD processor since roughly 2010 has this.
Reading the Current State
Before changing anything, see what you have:
grep -i huge /proc/meminfo
The key fields:
AnonHugePages: 2048 kB # Transparent Huge Pages in use (THP, not static)
ShmemHugePages: 0 kB
HugePages_Total: 0 # Static hugepages allocated
HugePages_Free: 0 # How many are unused
HugePages_Rsvd: 0 # Reserved but not yet faulted in
HugePages_Surp: 0 # Over-quota surplus pages
Hugepagesize: 2048 kB # Default hugepage size (2 MB on x86_64)
Hugetlb: 0 kB
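The raw counters are easier to read as sizes. A bash sketch (hypothetical function name) that converts the default-size pool fields into MB. Note that HugePages_Free still includes pages already reserved by existing mappings, so what a new mapping can actually get is Free minus Rsvd:

```shell
# Sketch: summarize the default-size hugepage pool from /proc/meminfo-style input.
# Only the default page size (Hugepagesize) is covered; other sizes live
# under /sys/kernel/mm/hugepages/.
hugepool_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^HugePages_Rsvd:/  { rsvd = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            used  = total - free     # pages already faulted in
            avail = free - rsvd      # free pages not promised to any mapping
            printf "pool=%d MB in_use=%d MB reserved=%d MB available=%d MB\n",
                total * size_kb / 1024, used * size_kb / 1024,
                rsvd * size_kb / 1024, avail * size_kb / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```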
You can also look at the full hugepage subsystem per size:
ls /sys/kernel/mm/hugepages/
# hugepages-1048576kB hugepages-2048kB
Each directory there has nr_hugepages, free_hugepages, resv_hugepages, surplus_hugepages, and nr_overcommit_hugepages counters.
Allocating 2 MB Hugepages at Runtime
The simplest path. You can do this on a live system without a reboot.
# Check the current size of the 2 MB pool
# (how many more you can add depends on fragmentation on a loaded system)
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# Allocate 2048 x 2 MB = 4 GB of hugepages
echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# Verify
grep HugePages /proc/meminfo
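Because the kernel grants as many pages as it can find and does not fail the write when it falls short, it’s worth reading the count back. A minimal bash sketch, with a hypothetical function name and the 2 MB sysfs knob as the default target (run as root):

```shell
# Sketch: request a pool size, then read back what the kernel actually granted.
# A short allocation is not reported as a write error, so the read-back is
# the only reliable check.
set_hugepages() {
    local want=$1
    local knob=${2:-/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages}
    local got
    echo "$want" > "$knob" || return 1
    got=$(cat "$knob")
    if (( got < want )); then
        echo "only $got of $want pages allocated" >&2
        return 2
    fi
    echo "$got pages allocated"
}
```

`set_hugepages 2048` returns status 2 and prints a warning to stderr if the pool came up short.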
To make this survive reboots, use sysctl:
# /etc/sysctl.d/90-hugepages.conf
# Reserve 4 GB as 2 MB hugepages
vm.nr_hugepages = 2048
# Allow up to 512 additional pages to be allocated on-demand (optional safety net)
vm.nr_overcommit_hugepages = 512
Apply without rebooting:
sysctl -p /etc/sysctl.d/90-hugepages.conf
One practical note: allocate hugepages either at boot or as early as possible in the system’s lifecycle. On a system that has been running for hours under load, physical memory becomes fragmented. The kernel may not be able to find 2 MB physically contiguous chunks even if MemFree looks promising.
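To gauge fragmentation before asking for pages, /proc/buddyinfo lists free blocks per allocation order. The sketch below (hypothetical function, assuming 4 KB base pages so that a 2 MB page is an order-9 block) turns that into a rough upper bound on immediately allocatable 2 MB pages. It ignores zone watermarks, so treat the number as optimistic:

```shell
# Rough upper bound on 2 MB pages available without compaction.
# /proc/buddyinfo lines look like: "Node 0, zone Normal c0 c1 c2 ... c10",
# where cN is the number of free blocks of order N. Field 5 is order 0,
# so order 9 (2 MB with 4 KB pages) is field 14; an order-10 block holds two.
free_2m_blocks() {
    awk '
        {
            for (i = 14; i <= NF; i++) total += $i * 2 ^ (i - 14)
        }
        END { print total + 0 }
    ' "${1:-/proc/buddyinfo}"
}
```

If the estimate is well below what you need, `echo 1 > /proc/sys/vm/compact_memory` asks the kernel to compact memory (on kernels built with compaction support) before you retry the allocation.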
Allocating 1 GB Hugepages — Boot Time Only
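Treat this as boot-time configuration. A runtime request for 1 GB pages is accepted on recent kernels but usually comes back short once memory is fragmented, so don’t rely on it.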
Edit your GRUB configuration:
# /etc/default/grub
GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=16 default_hugepagesz=1G"
Breaking those parameters down:
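hugepagesz=1G: declares that the following hugepages= count applies to 1 GB pages
hugepages=16: reserve 16 of them (= 16 GB)
default_hugepagesz=1G: makes 1 GB the default hugepage size (optional; it also redirects vm.nr_hugepages and the HugePages_* lines in /proc/meminfo to the 1 GB pool, so don’t combine it with a vm.nr_hugepages value sized for 2 MB pages)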
If you want both 2 MB and 1 GB pages, you need to be explicit:
GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=8 hugepagesz=2M hugepages=512"
Then rebuild and reboot:
# Debian/Ubuntu
update-grub && reboot
# RHEL/Rocky/AlmaLinux
grub2-mkconfig -o /boot/grub2/grub.cfg && reboot
After reboot, confirm:
grep -i huge /proc/meminfo
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
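Since a short 1 GB reservation doesn’t stop the boot, a scripted comparison of what the command line asked for against what the kernel reserved can catch it. A bash sketch with hypothetical function names; it only understands the explicit hugepagesz=1G hugepages=N form used above:

```shell
# Sketch: extract the 1 GB page count requested on a kernel command line.
# Tracks the most recent hugepagesz= so each hugepages=N is attributed to the
# right size. A bare hugepages= with no preceding hugepagesz= is ignored.
requested_1g_pages() {
    local -a toks
    local tok size="" n=0
    read -r -a toks <<< "$1"      # split on whitespace without globbing
    for tok in "${toks[@]}"; do
        case $tok in
            hugepagesz=*) size=${tok#hugepagesz=} ;;
            hugepages=*)
                case $size in
                    1G|1g) n=${tok#hugepages=} ;;
                esac ;;
        esac
    done
    echo "$n"
}

# Compare the request with the boot-time reservation; non-zero status if short.
check_1g_pool() {
    local want got
    want=$(requested_1g_pages "$(cat "${1:-/proc/cmdline}")")
    got=$(cat "${2:-/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages}" 2>/dev/null) || got=0
    echo "requested=$want allocated=$got"
    [ "$got" -ge "$want" ]
}
```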
NUMA: The Part Everyone Skips
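On multi-socket servers, memory is NUMA-local. Writing to the global counter spreads pages round-robin across every node that has memory. That’s reasonable for a workload spread over all sockets and wrong for one pinned to a single socket, which then reaches across the interconnect for part of its memory.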
Check your topology:
numactl --hardware
# or
ls /sys/devices/system/node/
Allocate per-node explicitly:
# Allocate 1024 x 2 MB hugepages on NUMA node 0
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# Same on node 1
echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
This matters most for KVM guests and DPDK apps where you know which cores and memory will handle the workload. For databases that span NUMA nodes, even distribution is usually fine.
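For more than two nodes, or a total that doesn’t divide evenly, a small loop saves the arithmetic. A bash sketch with hypothetical function names, assuming 2 MB pages and nodes numbered contiguously from 0:

```shell
# Sketch: split a total page count evenly across NUMA nodes, giving any
# remainder to the lowest-numbered nodes. Prints "nodeN COUNT" per node.
numa_split() {
    local total=$1 nodes=$2 i
    for (( i = 0; i < nodes; i++ )); do
        echo "node$i $(( total / nodes + (i < total % nodes ? 1 : 0) ))"
    done
}

# Apply the split through each node's 2 MB sysfs knob (run as root).
# Assumes node directories are node0..nodeN-1 with no gaps.
apply_numa_split() {
    local total=$1 root=${2:-/sys/devices/system/node}
    local nodes node count
    nodes=$(ls -d "$root"/node[0-9]* | wc -l)
    numa_split "$total" "$nodes" | while read -r node count; do
        echo "$count" > "$root/$node/hugepages/hugepages-2048kB/nr_hugepages"
    done
}
```

`apply_numa_split 4096` on a two-node box writes 2048 to each node’s knob. Read the values back afterwards, since each node can come up short independently.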
The hugetlbfs Mount
Most applications that use hugepages directly (DPDK, KVM, some JVMs) need a hugetlbfs filesystem mount. Applications using mmap() with MAP_HUGETLB don’t need it, but the mount is still a good idea to have.
# /etc/fstab — add this line
hugetlbfs /dev/hugepages hugetlbfs defaults,pagesize=2M 0 0
# For 1 GB pages, a separate mount:
hugetlbfs /dev/hugepages1G hugetlbfs defaults,pagesize=1G 0 0
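# /dev is rebuilt at every boot: systemd creates missing fstab mount points
# on its own, but other init systems need this mkdir to run at boot as well
mkdir -p /dev/hugepages /dev/hugepages1G
mount -a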
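To confirm the mounts came up with the page size you intended, /proc/mounts lists each hugetlbfs instance and its options. A bash sketch (hypothetical function name); the kernel typically prints a 1 GB mount as pagesize=1024M:

```shell
# Sketch: list hugetlbfs mounts and their page size from /proc/mounts.
# A mount without an explicit pagesize= option uses the default hugepage size.
hugetlbfs_mounts() {
    awk '
        $3 == "hugetlbfs" {
            size = "default"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^pagesize=/) size = substr(opts[i], 10)
            print $2, size
        }
    ' "${1:-/proc/mounts}"
}
```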
Workloads That Actually Benefit
PostgreSQL
PostgreSQL has a huge_pages setting and it works well. With a large shared_buffers (say, 64 GB on a 128 GB machine), the reduction in TLB pressure is measurable. Enable in postgresql.conf:
# postgresql.conf
huge_pages = on # 'try' is the default — 'on' makes it fail-fast if unavailable
shared_buffers = 64GB
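Before enabling, make sure your hugepages reservation covers shared_buffers plus some headroom: the main shared memory segment also holds WAL buffers, lock tables, and other structures. That segment is an anonymous mmap() (shared_memory_type = mmap is the default), and PostgreSQL requests it with MAP_HUGETLB when huge_pages is on or try; with try it falls back to normal pages without complaint if the request fails. On PostgreSQL 15 and later, postgres -D <datadir> -C shared_memory_size_in_huge_pages, run against a stopped cluster, reports the exact number of pages needed.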
Verify it actually worked after restart:
# Find the postmaster PID
head -1 /var/lib/postgresql/*/main/postmaster.pid
# Check its hugepage usage
grep -i huge /proc/<PID>/smaps_rollup
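The smaps_rollup output mixes THP and static-hugepage counters: static hugepages appear as Shared_Hugetlb and Private_Hugetlb, while AnonHugePages is THP. A small bash sketch (hypothetical function name) that sums only the static ones:

```shell
# Sketch: total static-hugepage memory (kB) mapped by a process, from its
# smaps_rollup file. AnonHugePages (THP) is deliberately excluded.
hugetlb_kb() {
    awk '/^(Shared|Private)_Hugetlb:/ { kb += $2 } END { print kb + 0 }' "$1"
}
```

For example, `hugetlb_kb "/proc/$(head -1 /var/lib/postgresql/*/main/postmaster.pid)/smaps_rollup"`. A non-zero result means the segment is backed by hugepages; it grows toward the size of shared_buffers as buffers get touched.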
KVM / QEMU Virtual Machines
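This is one of the biggest wins. With hardware-assisted paging, a guest TLB miss walks both the guest’s page tables and the host’s EPT/NPT tables: up to 24 memory accesses with four levels on each side, instead of four. Backing a 32 GB guest with 4 KB pages means about 8.4 million host-side mappings; with 1 GB hugepages it’s 32, and those walks get much shorter.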
In libvirt XML:
<memoryBacking>
<hugepages>
<page size="1" unit="GiB"/>
</hugepages>
<locked/>
</memoryBacking>
Or directly in QEMU:
qemu-system-x86_64 \
-m 32768 \
-mem-path /dev/hugepages1G \
-mem-prealloc \
...
-mem-prealloc makes QEMU fault in all guest memory at VM startup rather than lazily. This adds startup latency but removes page-fault stalls during early workload ramp-up. It doesn’t change TLB behavior; the 1 GB page size is what does that. Worth it for production VMs.
Redis
Redis is the exception in this list. It has no hugetlbfs or MAP_HUGETLB support and no redis.conf setting for it, so a static hugepage reservation does nothing for Redis, and the RAM held in the pool is simply unavailable to it.
What matters for Redis is THP. Redis logs a warning at startup when THP is enabled, because its fork-based persistence (BGSAVE, AOF rewrite) combined with THP inflates copy-on-write memory and latency: a one-byte write after fork copies a whole 2 MB page. Disable THP (see Gotchas below) and leave the static pool to workloads that can use it.
DPDK (Data Plane Development Kit)
DPDK is the canonical hugepages use case. It backs its memory pools (packet buffers, rings, and other shared structures) with hugepages. Both 2 MB and 1 GB pages work; 1 GB pages are a common choice for high-throughput deployments.
# DPDK hugepage setup script (from DPDK's own tooling)
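dpdk-hugepages.py --setup 2G          # reserve 2 GB of default-size (2 MB) pages and mount them
dpdk-hugepages.py -p 1G --setup 2G    # reserve 2 GB as 1 GB pages
# Point an application at a specific hugetlbfs mount with the EAL option --huge-dir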
Outside of the --no-huge EAL test mode, DPDK won’t even initialize its memory subsystem without hugepages. This is the one workload where hugepages aren’t a tweak; they’re a hard requirement.
Java Applications
The JVM has supported large pages since Java 6. Enable with:
java -XX:+UseLargePages -XX:LargePageSizeInBytes=2m -Xmx48g -jar app.jar
With -XX:+UseLargePages on Linux, HotSpot reserves the heap at startup with mmap() and MAP_HUGETLB. If the system doesn’t have enough hugepages reserved, the JVM prints a warning and falls back to 4 KB pages, which is easy to miss in a busy log. -XX:+UseTransparentHugePages is a different mechanism (THP via madvise; see Gotchas). On JDK 9 and later, adding -Xlog:pagesize reports the page size the heap actually got.
Kafka, Cassandra, and Elasticsearch run on the JVM too and can see lower latency at high throughput on large heaps. Measure before and after: their page-cache I/O gets nothing from a static pool.
In-Memory Databases and Analytics (ClickHouse, Spark)
ClickHouse, and Spark with large executor heaps, fit the same pattern: large contiguous allocations, sequential scans, heavy aggregation. Anything that touches hundreds of gigabytes in predictable patterns is a good candidate. Check each project’s own guidance first; ClickHouse’s usage recommendations, for example, focus on transparent huge page settings rather than a static pool.
Workloads That Don’t Benefit (Or Get Worse)
Small services with sub-1 GB working sets. A Node.js API or a Go microservice handling web traffic doesn’t need hugepages: TLB misses are a small slice of its runtime, and a reserved pool would mostly sit idle.
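Pointer-heavy workloads on a general-purpose allocator: graph algorithms, linked structures, hash maps over many small objects. Scattered access across a large heap is actually where bigger pages help most, but a stock malloc() never draws from the static hugetlb pool, so these programs see large pages only through THP or a hugepage-aware allocator. A static reservation just takes RAM away from them.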
File-backed memory, including tmpfs. The static pool only serves MAP_HUGETLB, SHM_HUGETLB, and hugetlbfs mappings. Regular file-backed mmap goes through the page cache regardless of your hugepage settings, and tmpfs can use huge pages only through THP and its huge= mount option, never from the reserved pool.
Gotchas
THP and static hugepages are not the same thing. Transparent Huge Pages (THP) are applied automatically without a reservation, and khugepaged collapses regular pages into huge ones in the background. That can hurt latency-sensitive workloads: page faults may stall on memory compaction, and background collapsing adds jitter. Several databases recommend turning it off. Disable THP for latency-sensitive workloads:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
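Persist it across reboots with the transparent_hugepage=never kernel boot parameter (which covers enabled; defrag still needs the sysfs write), or with a systemd oneshot unit or /etc/rc.local that writes both files at boot. A sysctl won’t work for THP; it’s controlled only through sysfs.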
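Whatever boot-time hook you choose can run the same few lines. A bash sketch (hypothetical function names) that writes never to both knobs and reads the active mode back; in the real sysfs files the active value is the bracketed entry, as in "always madvise [never]":

```shell
# Sketch: print the active THP mode from a sysfs file. Falls back to the first
# word when there are no brackets (e.g. a plain copy of the file for testing).
thp_mode() {
    awk '{
        if (match($0, /\[[a-z_+]+\]/)) print substr($0, RSTART + 1, RLENGTH - 2)
        else print $1
    }' "$1"
}

# Set enabled and defrag to "never" and confirm each took effect.
# The root defaults to the real sysfs directory (run as root).
thp_disable() {
    local root=${1:-/sys/kernel/mm/transparent_hugepage} f
    for f in enabled defrag; do
        echo never > "$root/$f" || return 1
        [ "$(thp_mode "$root/$f")" = "never" ] || return 1
    done
}
```

Call `thp_disable` from the hook, and `thp_mode /sys/kernel/mm/transparent_hugepage/enabled` to check the state later.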
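Hugepages are not swappable. They’re locked in RAM, and the kernel never shrinks the pool under memory pressure, so even pages nobody is using stay off-limits to everything else. Reserve 128 GB of hugepages on a 128 GB machine and the rest of the system has nothing left; the OOM killer will start killing processes long before any pool memory comes back. Size with headroom for the OS and every other process.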
1 GB pages fail quietly if the kernel can’t allocate them at boot: the system comes up normally, with at most a warning in the kernel log. Verify after booting before treating the configuration as live. Check /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages. If the file doesn’t exist, the CPU (or the VM’s virtual CPU) lacks pdpe1gb support. If it shows fewer pages than you requested, often 0, there wasn’t enough contiguous physical memory at boot (rare, but it happens on systems where BIOS memory reservations carve up the address space).
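Locked-memory limits matter for some setups. Anonymous mmap() with MAP_HUGETLB needs no special privileges, but System V shared memory with SHM_HUGETLB requires CAP_IPC_LOCK or membership in the group set by vm.hugetlb_shm_group (the kernel falls back to RLIMIT_MEMLOCK, with a deprecation warning). Applications that also mlock() their memory need RLIMIT_MEMLOCK raised for their user (postgres is shown here only as an example account):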
# /etc/security/limits.d/hugepages.conf
postgres soft memlock unlimited
postgres hard memlock unlimited
Or in a systemd unit:
[Service]
LimitMEMLOCK=infinity
Normal memory overcommit doesn’t apply to hugepages. mmap() reserves hugepages up front: if a process requests 100 and only 80 are free and unreserved, the call fails unless vm.nr_overcommit_hugepages allows surplus pages and the kernel can actually allocate them at that moment. Plan your reservations to cover peak demand, not average demand.
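Before starting a large consumer, you can check whether the pool can cover it. A bash sketch (hypothetical function name) reading the per-size sysfs counters; it treats free minus reserved pages as guaranteed and reports overcommit headroom separately, because surplus pages are allocated only if the kernel finds the memory at that moment:

```shell
# Sketch: can a new mapping of $1 pages be satisfied from the pool in $2?
# free_hugepages includes pages already promised to existing mappings
# (resv_hugepages), so the guaranteed capacity is free - resv.
pool_fits() {
    local want=$1 dir=${2:-/sys/kernel/mm/hugepages/hugepages-2048kB}
    local free resv over surp
    free=$(cat "$dir/free_hugepages")
    resv=$(cat "$dir/resv_hugepages")
    over=$(cat "$dir/nr_overcommit_hugepages")
    surp=$(cat "$dir/surplus_hugepages")
    echo "guaranteed=$(( free - resv )) best_effort_surplus=$(( over - surp ))"
    (( want <= free - resv ))
}
```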
Docker and Kubernetes need explicit configuration. Containers draw from the host’s pool, but they only get access and limits when you set them up. In Kubernetes, hugepages are a schedulable resource, and hugepage requests must equal limits:
resources:
requests:
hugepages-2Mi: 4Gi
limits:
hugepages-2Mi: 4Gi
The node must have hugepages pre-allocated; the kubelet discovers them and reports them as schedulable capacity. No feature gate is needed on current releases: hugepage support has been GA since Kubernetes 1.14.
Sizing in Practice
A rough formula for static hugepage reservation:
nr_hugepages = ceil(target_memory_GB * 1024 / 2) + 10% overhead
For PostgreSQL shared_buffers = 32 GB using 2 MB pages:
ceil(32 * 1024 / 2) = 16384 pages + 1638 overhead = ~18000 pages
For KVM hosts, sum the RAM of all VMs you plan to run concurrently, then divide by your page size.
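The formula translates directly to integer shell arithmetic. A bash sketch with a hypothetical function name; it rounds both the page count and the headroom up, so the 32 GB PostgreSQL example comes out at 18023 pages, in line with the ~18000 above:

```shell
# Sketch of the sizing formula: pages = ceil(target / page size), then add a
# headroom percentage, also rounded up. Integer math only.
#   $1 target memory in MB
#   $2 page size in kB (2048 for 2 MB pages, 1048576 for 1 GB pages)
#   $3 headroom in percent (default 10)
hugepages_needed() {
    local target_mb=$1 page_kb=$2 headroom=${3:-10}
    local base=$(( (target_mb * 1024 + page_kb - 1) / page_kb ))
    echo $(( (base * (100 + headroom) + 99) / 100 ))
}
```

For a KVM host running, say, two 32 GB guests on 1 GB pages with no extra headroom, `hugepages_needed 65536 1048576 0` prints 64.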
For NUMA systems: split the total evenly across nodes unless you know which node hosts your workload.
When deciding between 1 GB and 2 MB: use 1 GB pages for VM guests and DPDK where you’re handing entire gigabyte-scale buffers to a single workload. Use 2 MB pages for databases and JVMs where the allocator carves out regions in predictable but sub-GB chunks. Mixing both on the same host is fine; the kernel treats them as separate pools.
A Quick Validation Script
After configuration, use this to confirm your setup:
#!/bin/bash
# hugepage-check.sh: summarize hugepage pools, NUMA placement, and THP mode
echo "=== Hugepage Summary ==="
grep -E "HugePages|Hugepagesize|Hugetlb" /proc/meminfo
echo ""
echo "=== Per-size allocation ==="
for dir in /sys/kernel/mm/hugepages/hugepages-*; do
    [ -d "$dir" ] || continue    # no hugepage support: the glob stays literal
    size=$(basename "$dir")
    total=$(cat "$dir/nr_hugepages")
    free=$(cat "$dir/free_hugepages")
    echo "$size: total=$total free=$free in-use=$((total - free))"
done
echo ""
echo "=== NUMA distribution ==="
for node in /sys/devices/system/node/node*/hugepages; do
    [ -d "$node" ] || continue
    nodename=$(basename "$(dirname "$node")")    # e.g. node0, without GNU-only grep -P
    echo "Node: $nodename"
    for hpdir in "$node"/hugepages-*; do
        size=$(basename "$hpdir")
        total=$(cat "$hpdir/nr_hugepages")
        free=$(cat "$hpdir/free_hugepages")
        echo "  $size: $total pages ($free free)"
    done
done
echo ""
echo "=== THP status ==="
echo "enabled: $(cat /sys/kernel/mm/transparent_hugepage/enabled)"
echo "defrag:  $(cat /sys/kernel/mm/transparent_hugepage/defrag)"
Run it after every configuration change. Takes two seconds and saves you from chasing phantom performance issues.
Hugepages are one of the few Linux kernel knobs where the performance difference usually shows up in perf stat output after a single test run. For any workload that keeps more than ~4 GB hot in memory and accesses it heavily (a database, a VM host, a high-throughput network application), this is not optional tuning. It’s baseline configuration. The hard part isn’t enabling them; it’s knowing your workload’s memory shape well enough to size them correctly without wasting RAM or starving other processes.